Group and Hierarchical
Variable Selection
Hai Nguyen
Bioinformatics center, Kyoto University
hai@kuicr.kyoto-u.ac.jp
haidnguyen0909@gmail.com
Introduction
q Response: 𝑦 = (𝑦$, 𝑦&, … , 𝑦()
*
q predictors : 𝑥, = (𝑥,$, 𝑥,&, …, 𝑥,-)
*
, 𝑖 = 1, . . , 𝑛
qLinear model: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥66 + 𝜀
-
68$
q2-way interaction model: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥,6 + ∑ 𝜃6: 𝑥,6 𝑥,: + 𝜀6;:
-
68$
1) ∑ 𝛽6 𝑥,6
-
68$ :	
  main	
  effect	
  term,	
   𝛽 ∈ ℝ-
2) ∑ 𝜃6: 𝑥,6 𝑥,:6;: : interaction term, 𝜃 ∈ ℝ-J-
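As a minimal sketch of the 2-way interaction model, the pairwise product columns $x_{ij} x_{ik}$ ($j < k$) can be built explicitly. The function name `interaction_design` and the toy matrix are illustrative assumptions, not part of the original slides; the sketch assumes at least two predictors.

```python
import numpy as np
from itertools import combinations

def interaction_design(X):
    """Return the pairwise product columns x_ij * x_ik for all j < k.

    Hypothetical helper for illustration; assumes X has at least 2 columns.
    """
    n, p = X.shape
    pairs = list(combinations(range(p), 2))  # all index pairs (j, k) with j < k
    Z = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    return Z, pairs

# Toy data: n = 2 observations, p = 3 predictors
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
Z, pairs = interaction_design(X)
```

With p predictors there are p(p-1)/2 interaction columns, which is why sparsity assumptions become important as p grows.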
Introduction
q Problems to be addressed in high dimensional data:
1) Predictive performance
2) Interpretability
3) Highly correlated variables
Sparsity assumption: # of nonzero coeffs 𝛽6
K
𝑠 and/or interaction 𝜃6:
K
𝑠 is very
few.
Introduction
Variable selection: LASSO
Group selection: GROUP LASSO
Hierarchical selection: HIERARCHICAL LASSO
Introduction
q Shrinkage methods based on regularization
𝛽M = 𝑎𝑟𝑔𝑚𝑖𝑛R 	
   𝑙 𝛽 + 𝜆 U
|𝛽|$, 	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
   𝐿 𝑎𝑠𝑠𝑜
||𝛽||&
&
, 	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
   𝑅 𝑖𝑑𝑔𝑒
Where 𝑙 𝛽 is the loss function wr.t. 𝛽, e.g., square, logistic, hinge
losses
1) Ridge: prevent overfitting but not variable selection
2) Lasso: variable selection but only select one for each group of
correlated variables.
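The contrast between the two penalties can be sketched through their proximal operators, i.e. the minimizers of $\tfrac{1}{2}\|x - b\|_2^2 + \lambda\,\mathrm{pen}(x)$. This is an illustrative assumption about how one might compare them, not a method taken from the slides; the names `prox_l1` and `prox_ridge` are hypothetical.

```python
import numpy as np

def prox_l1(b, lam):
    """Soft-thresholding: minimizer of 0.5*||x - b||^2 + lam*||x||_1."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def prox_ridge(b, lam):
    """Minimizer of 0.5*||x - b||^2 + lam*||x||_2^2, i.e. b / (1 + 2*lam)."""
    return b / (1.0 + 2.0 * lam)

b = np.array([3.0, -0.5, 1.0])
lasso_step = prox_l1(b, 1.0)     # small entries are set exactly to zero
ridge_step = prox_ridge(b, 1.0)  # every entry shrinks but stays nonzero
```

The L1 step zeroes out entries whose magnitude is at most the threshold, which is the mechanism behind variable selection; the ridge step only rescales.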
Group selection
q Group Lasso (Yuan et al., 2006)
Coefficients are organized into K groups (known in advance):
𝑔$, 𝑔&, …, 𝑔 	
  ⊆ 1,2,… , 𝑝 , disjoint and then the Group-Lasso pelnaty:
𝜆 ∑ 𝑑:||𝛽_`
||&: , 	
  	
  	
  	
  	
  	
  	
  where ||𝛽_`
||& = ∑ 𝛽,
&
,∈_`
q Properties:
1) Group-size = 1 -> LASSO
2) Convex penalty
3) Encourage to select or remove the entire group
How	
  to	
  do	
  group	
  selection	
  without	
  prior	
  knowledge	
  of	
  group	
  structures?
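A minimal sketch of the Group-Lasso penalty and the associated block soft-thresholding step follows. The function names, the example groups, and the choice of unit weights $d_k = 1$ in the proximal step are assumptions made for illustration.

```python
import numpy as np

def group_lasso_penalty(beta, groups, lam, d=None):
    """lam * sum_k d_k * ||beta_{g_k}||_2 for disjoint index groups."""
    d = np.ones(len(groups)) if d is None else np.asarray(d)
    return lam * sum(dk * np.linalg.norm(beta[g]) for dk, g in zip(d, groups))

def prox_group(beta, groups, lam):
    """Block soft-thresholding with unit weights (assumption: d_k = 1)."""
    out = np.zeros_like(beta)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        if norm > lam:  # otherwise the whole group is set to zero
            out[g] = (1.0 - lam / norm) * beta[g]
    return out

beta = np.array([3.0, 4.0, 0.3, 0.4])
groups = [[0, 1], [2, 3]]
pen = group_lasso_penalty(beta, groups, lam=1.0)
shrunk = prox_group(beta, groups, lam=1.0)
```

In this toy example the second group has Euclidean norm 0.5, below the threshold, so both of its coefficients are removed together, while the first group is shrunk but kept as a whole.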
Group selection: automatic feature group
q Elastic Net (Zou et al., 2005)
A linear combination of ridge and LASSO penalties for group selection
via the penalty:
	
  	
  	
  	
  	
  	
  	
   𝛼 c |𝛽6|
-
68$
+ (1 − 𝛼) c 𝛽6
&
-
68$
q Properties:
1) L1 term leads to a sparse solution
2) L2 term forces highly correlated variables to be averaged
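The elastic net penalty can be evaluated directly, as in this small sketch. The function name `elastic_net_penalty` and the example vector are illustrative assumptions; note that software packages may parameterize the same penalty differently.

```python
import numpy as np

def elastic_net_penalty(beta, alpha):
    """alpha * sum|beta_j| + (1 - alpha) * sum beta_j^2."""
    return alpha * np.sum(np.abs(beta)) + (1.0 - alpha) * np.sum(beta ** 2)

beta = np.array([1.0, -2.0, 0.0])
lasso_only = elastic_net_penalty(beta, 1.0)  # pure L1 term
ridge_only = elastic_net_penalty(beta, 0.0)  # pure squared L2 term
mixed = elastic_net_penalty(beta, 0.5)       # equal mix of both
```

Setting alpha to 1 or 0 recovers the LASSO or ridge penalty respectively, so alpha controls the trade-off between sparsity and averaging.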
Group selection: automatic feature group
(cont. )
q OSCAR (Bondell et al., 2008)
A combination of LASSO penalties and 𝐿e for	
  each	
  pair	
  of	
  vars
c |𝛽6|
-
68$
+ 𝑐 c max	
  {|𝛽6|, |𝛽:|}
6;:
q Properties:
1) Encourage equality of coeffs
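To see why the pairwise maximum term favours equal coefficients, here is a small sketch comparing two vectors with the same L1 norm. The function name `oscar_penalty` and the test vectors are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def oscar_penalty(beta, c):
    """sum_j |beta_j| + c * sum_{j<k} max(|beta_j|, |beta_k|)."""
    a = np.abs(beta)
    pairwise = sum(max(a[j], a[k]) for j, k in combinations(range(len(a)), 2))
    return a.sum() + c * pairwise

equal = oscar_penalty(np.array([2.0, 2.0]), c=1.0)    # L1 = 4, max term = 2
unequal = oscar_penalty(np.array([1.0, 3.0]), c=1.0)  # L1 = 4, max term = 3
```

Both vectors have the same L1 norm, but the one with equal magnitudes pays a smaller pairwise term, which is the sense in which the penalty pushes coefficients toward equality.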
Group selection: automatic feature group
(cont.)
q Fused LASSO (Friedman et al., 2007)
A lasso term + fused penalty
	
  	
  	
  	
  	
  	
  	
   𝛼 c |𝛽6|
-
68$
+ (1 − 𝛼) c |𝛽6 − 𝛽6o$|
-
68&
q Properties:
1) Encourage sparsity in the differences of coffs.
2) Introduced to account for 1-d correlation of predictors
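A short sketch of the fused penalty shows how it rewards piecewise-constant coefficient profiles. The function name `fused_lasso_penalty` and the two example vectors are illustrative assumptions.

```python
import numpy as np

def fused_lasso_penalty(beta, alpha):
    """alpha * sum|beta_j| + (1 - alpha) * sum_{j>=2} |beta_j - beta_{j-1}|."""
    l1 = np.sum(np.abs(beta))
    fused = np.sum(np.abs(np.diff(beta)))  # differences of neighbouring coefficients
    return alpha * l1 + (1.0 - alpha) * fused

piecewise = np.array([1.0, 1.0, 1.0, 0.0, 0.0])  # a single jump
wiggly = np.array([1.0, 0.0, 1.0, 0.0, 1.0])     # a jump at every step
p_const = fused_lasso_penalty(piecewise, 0.5)
p_wiggly = fused_lasso_penalty(wiggly, 0.5)
```

Both vectors have the same L1 norm, yet the piecewise-constant one is penalized less because only one successive difference is nonzero.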
Group selection: automatic feature group
(cont.)
q HORSE (Friedman et al., 2007)
Extension of fused LASSO
	
  	
  	
  	
  	
  	
  	
   𝛼 c |𝛽6|
-
68$
+ (1 − 𝛼) c |𝛽6 − 𝛽6o$|
6;:
q Properties:
1) Encourage sparsity in the differences of coffs.
2) Fused lasso for pairs of vars
Hierarchy selection
q Hierarchy restriction for interaction models
1) Strong hierarchy: 𝜃6: ≠ 0 → 𝛽6 ≠ 0 and 𝛽: ≠ 0 (SH)
2) Weak hierarchy: 𝜃6: ≠ 0 → 𝛽6 ≠ 0 or	
   𝛽: ≠ 0 (WH)
𝛽6 𝛽:
𝜃6:
𝛽s
𝜃:s
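The two hierarchy conditions can be checked mechanically on a fitted model, as in this sketch. The function name `satisfies_hierarchy`, the tolerance argument, and the example coefficients are assumptions for illustration.

```python
import numpy as np

def satisfies_hierarchy(beta, theta, strong=True, tol=0.0):
    """Check strong (AND) or weak (OR) hierarchy for every nonzero theta_jk, j < k."""
    p = len(beta)
    main = np.abs(beta) > tol  # which main effects are active
    for j in range(p):
        for k in range(j + 1, p):
            if abs(theta[j, k]) > tol:
                ok = (main[j] and main[k]) if strong else (main[j] or main[k])
                if not ok:
                    return False
    return True

beta = np.array([1.0, 0.0, 2.0])      # second main effect is zero
theta = np.zeros((3, 3))
theta[0, 1] = theta[1, 0] = 0.5       # interaction between variables 1 and 2
is_strong = satisfies_hierarchy(beta, theta, strong=True)
is_weak = satisfies_hierarchy(beta, theta, strong=False)
```

Here the interaction has only one active parent, so the model satisfies weak hierarchy but violates strong hierarchy.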
Hierarchy selection
q SHIM (Choi et al., 2010)
Simply reparameterize the coeffsof 2-way interaction model:
𝑦, = 𝛽3 + c 𝛽6 𝑥,6 + c 𝜃6: 𝑥,6 𝑥,: + 𝜀
6;:
-
68$
become: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥,6 + ∑ 𝛾6: 𝛽6 𝛽: 𝑥,6 𝑥,: + 𝜀6;:
-
68$
q Properties:
1) satisfy “strong hierarchy”
2) but “Non-convex”, alternative minimization strategy for optimization.
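The effect of the reparameterization $\theta_{jk} = \gamma_{jk} \beta_j \beta_k$ can be sketched numerically. The values of `beta` and `gamma` below are made up for illustration and do not come from the slides.

```python
import numpy as np

beta = np.array([1.5, 0.0, -2.0])          # second main effect is exactly zero
gamma = np.array([[0.0, 0.7, 0.3],
                  [0.7, 0.0, 1.1],
                  [0.3, 1.1, 0.0]])        # hypothetical interaction scalings

# theta_jk = gamma_jk * beta_j * beta_k for every pair (j, k)
theta = gamma * np.outer(beta, beta)
```

Every interaction involving the second variable vanishes regardless of `gamma`, so strong hierarchy holds by construction; the product of unknowns is also what makes the problem non-convex.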
Hierarchy selection
q Composite Absolute Penalties (CAP) (Zhao et al., 2009)
Use overlapping group selection to induce hierarchy selection.
Consider X1, X2. Hierarchy X1->X2 can be induced by:
𝑇 𝛽 = ||(𝛽$, 𝛽&)||vw
+ ||(𝛽&)||vx
Hierarchy selection
q Composite Absolute Penalties (Zhao et al., 2009)
Hiearchical structured sparsity for 2-way interaction model can be
obtained by:
𝑇(𝛽, 𝜃) = ∑ {|𝜃6:| + ||(𝛽6, 𝛽:, 𝜃6:)||vy`
}6z:
𝛽6 𝛽:
𝜃6:
𝛽s
𝜃:s
Hierarchy selection
q Hierarchicalinteraction LASSO (Bien et al., 2013)
Addition of convex constraints to the lasso to produce sparse interaction
models inducing hierarchicalconditions. Start with the following:
	
  	
  	
  	
  	
  	
  	
   𝑚𝑖𝑛R,{ 𝑙 𝛽, 𝜃 + 𝜆||𝛽||$ +
𝜆
2
||𝜃||$
s.t. |
𝜃 = 𝜃*
||𝜃6||$ ≤ |𝛽6|
q Properties:
1) Automatically satisfy “strong hierarchy” (𝜃,6 ≠ 0 −> 𝛽, ≠ 0	
  & 𝛽6 ≠ 0)
2) But “Non-convex”
Hierarchy selection
q Hierarchical interaction LASSO (Bien et al., 2013)
Convex relaxation: replace 𝛽 by 𝛽€
− 𝛽o
(𝛽€
, 𝛽o
≥ 0), then:
	
  	
  	
  	
  	
  	
  	
   𝑚 𝑖𝑛R‚
,Rƒ
,{ 𝑙 𝛽€
− 𝛽o
, 𝜃 + 𝜆1*
(𝛽€
+ 𝛽o
) +
𝜆
2
||𝜃||$
s.t.
𝜃 = 𝜃*
||𝜃6||$ ≤ 𝛽6
€
+ 𝛽6
o
𝛽6
€
, 𝛽6
o
≥ 0
q Properties:
1) Still satisfy “strong hierarchy” (𝜃6: ≠ 0 −> 𝛽6 ≠ 0	
  & 𝛽: ≠ 0)
2) Equivalent to : 𝜆	
   ∑ 𝑚𝑎𝑥( 𝛽6 ,|𝜃6|)
-
68$ +
„
&
||𝜃||$
3) Optimization is bit hard due to symmetry constraint, but can use AMMD
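The equivalent penalty form can be evaluated directly, as in the sketch below. It assumes $\theta_j$ denotes the j-th row of $\theta$ and $\|\theta\|_1$ the entrywise L1 norm; the function name `hier_lasso_penalty` and the toy values are illustrative.

```python
import numpy as np

def hier_lasso_penalty(beta, theta, lam):
    """lam * sum_j max(|beta_j|, ||theta_j||_1) + (lam / 2) * ||theta||_1."""
    row_l1 = np.abs(theta).sum(axis=1)               # ||theta_j||_1 per row j
    coupled = np.maximum(np.abs(beta), row_l1).sum()  # main effect vs. its interactions
    return lam * coupled + 0.5 * lam * np.abs(theta).sum()

beta = np.array([1.0, 0.5])
theta = np.array([[0.0, 2.0],
                  [2.0, 0.0]])
value = hier_lasso_penalty(beta, theta, lam=1.0)
```

In this toy case each row's interaction mass exceeds the main effect, so the max term is driven by the interactions: raising a main effect up to that level costs nothing extra, which is how the penalty favours hierarchical solutions.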
Hierarchy selection
q Hierarchicalinteraction LASSO (Bien et al., 2013)
Removing symmetry constraint, then:
	
  	
  	
  	
  	
  	
  	
   𝑚 𝑖𝑛R,{ 𝑙 𝛽€
− 𝛽o
, 𝜃 + 𝜆1*
(𝛽€
+ 𝛽o
) +
𝜆
2
||𝜃||$
s.t. …
||𝜃6||$ ≤ 𝛽6
€
+ 𝛽6
o
𝛽6
€
, 𝛽6
o
≥ 0
q Properties:
1) Now only satisfy “weak hierarchy” (𝜃,6 ≠ 0 −> 𝛽, ≠ 0	
  & 𝛽6 ≠ 0)
2) “convex”
3) Optimization is easy because of separate 𝛽6
€
+ 𝛽6
o
	
  (Proximal Operator)
Hierarchy selection
q VANISH (Zhao et al., 2009)
1) Linear model: 𝑌 = ∑ 𝛽6 𝑋6 + ∑ 𝜃6: 𝑋6 ∘ 𝑋: +6;: 𝜀
-
68$
2) Nonlinear: 𝑌 = ∑ 𝑓6 + ∑ 𝑓6: +6;: 𝜀
-
68$
3) penalty: 𝑃 𝑓 = 𝜆$ ∑ (||𝑓6||&
+ ∑ ||𝑓6:||&
:z6 )
w
x+𝜆&
-
68$
∑ ||𝑓6:||6;:
Remark: if 𝑓6 = 𝑤6 𝑋6	
  , 𝑗 = 1, … , 𝑝, and X is normalized, then penalty
becomes:
𝑃 𝑤, 𝜃 = 𝜆$ c ||(𝛽6, 𝜃6)||& + 𝜆&
-
68$
c |𝜃6:|
6;:
Hierarchy selection
q GRESH (She et al., 2013)
Proposed a general model of previously mentioned regularization of the
following form:
min
•8[R,{]
𝑙 𝛽, 𝜃 + 𝜆$|𝜃|$ + 𝜆& c ||𝛽6, 𝑧(𝜃6)||‘
-
68$
s.t. 𝜃*
= 𝜃
q Remark:
1) If 𝑧 𝜃6 = 𝜃6
*
and 𝑞 = 2, 𝑡ℎ𝑒𝑛	
  it becomes VANISH
2) If 𝑧 𝜃6 = |𝜃6|$ and 𝑞 = ∞, 𝑡ℎ𝑒𝑛	
  it becomes HiLASSO
Conclusion
• Group selection
• Hierarchical selection

Mais conteúdo relacionado

Mais procurados

Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile LocaleEtude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile LocaleRaouf Alsaytara
 
近似ベイズ計算によるベイズ推定
近似ベイズ計算によるベイズ推定近似ベイズ計算によるベイズ推定
近似ベイズ計算によるベイズ推定Kosei ABE
 
Double integration final
Double integration finalDouble integration final
Double integration finalroypark31
 
Some properties of two-fuzzy Nor med spaces
Some properties of two-fuzzy Nor med spacesSome properties of two-fuzzy Nor med spaces
Some properties of two-fuzzy Nor med spacesIOSR Journals
 
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed SpacesStrongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed SpacesIOSR Journals
 
Options Portfolio Selection
Options Portfolio SelectionOptions Portfolio Selection
Options Portfolio Selectionguasoni
 
Numerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis methodNumerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis methodAlexander Decker
 
Lecture 3: Stochastic Hydrology
Lecture 3: Stochastic HydrologyLecture 3: Stochastic Hydrology
Lecture 3: Stochastic HydrologyAmro Elfeki
 
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...Joe Suzuki
 
(α ψ)- Construction with q- function for coupled fixed point
(α   ψ)-  Construction with q- function for coupled fixed point(α   ψ)-  Construction with q- function for coupled fixed point
(α ψ)- Construction with q- function for coupled fixed pointAlexander Decker
 
Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology Amro Elfeki
 
A Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent LearningA Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent LearningJoe Suzuki
 
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...Joe Suzuki
 
11.a focus on a common fixed point theorem using weakly compatible mappings
11.a focus on a common fixed point theorem using weakly compatible mappings11.a focus on a common fixed point theorem using weakly compatible mappings
11.a focus on a common fixed point theorem using weakly compatible mappingsAlexander Decker
 
A focus on a common fixed point theorem using weakly compatible mappings
A focus on a common fixed point theorem using weakly compatible mappingsA focus on a common fixed point theorem using weakly compatible mappings
A focus on a common fixed point theorem using weakly compatible mappingsAlexander Decker
 
Some Other Properties of Fuzzy Filters on Lattice Implication Algebras
Some Other Properties of Fuzzy Filters on Lattice Implication AlgebrasSome Other Properties of Fuzzy Filters on Lattice Implication Algebras
Some Other Properties of Fuzzy Filters on Lattice Implication Algebrasijceronline
 

Mais procurados (20)

Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile LocaleEtude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
Etude Des Proprietes Physicochimiques Et Caracterisation Dune Argile Locale
 
近似ベイズ計算によるベイズ推定
近似ベイズ計算によるベイズ推定近似ベイズ計算によるベイズ推定
近似ベイズ計算によるベイズ推定
 
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
 
Double integration final
Double integration finalDouble integration final
Double integration final
 
A
AA
A
 
Some properties of two-fuzzy Nor med spaces
Some properties of two-fuzzy Nor med spacesSome properties of two-fuzzy Nor med spaces
Some properties of two-fuzzy Nor med spaces
 
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed SpacesStrongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
Strongly Unique Best Simultaneous Coapproximation in Linear 2-Normed Spaces
 
Options Portfolio Selection
Options Portfolio SelectionOptions Portfolio Selection
Options Portfolio Selection
 
Numerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis methodNumerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis method
 
Galerkin method
Galerkin methodGalerkin method
Galerkin method
 
Lecture 3: Stochastic Hydrology
Lecture 3: Stochastic HydrologyLecture 3: Stochastic Hydrology
Lecture 3: Stochastic Hydrology
 
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
 
(α ψ)- Construction with q- function for coupled fixed point
(α   ψ)-  Construction with q- function for coupled fixed point(α   ψ)-  Construction with q- function for coupled fixed point
(α ψ)- Construction with q- function for coupled fixed point
 
Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology Lecture 2: Stochastic Hydrology
Lecture 2: Stochastic Hydrology
 
A Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent LearningA Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent Learning
 
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
 
Boolean algebra laws
Boolean algebra lawsBoolean algebra laws
Boolean algebra laws
 
11.a focus on a common fixed point theorem using weakly compatible mappings
11.a focus on a common fixed point theorem using weakly compatible mappings11.a focus on a common fixed point theorem using weakly compatible mappings
11.a focus on a common fixed point theorem using weakly compatible mappings
 
A focus on a common fixed point theorem using weakly compatible mappings
A focus on a common fixed point theorem using weakly compatible mappingsA focus on a common fixed point theorem using weakly compatible mappings
A focus on a common fixed point theorem using weakly compatible mappings
 
Some Other Properties of Fuzzy Filters on Lattice Implication Algebras
Some Other Properties of Fuzzy Filters on Lattice Implication AlgebrasSome Other Properties of Fuzzy Filters on Lattice Implication Algebras
Some Other Properties of Fuzzy Filters on Lattice Implication Algebras
 

Semelhante a Hierarchical selection

Learning a nonlinear embedding by preserving class neibourhood structure 최종
Learning a nonlinear embedding by preserving class neibourhood structure   최종Learning a nonlinear embedding by preserving class neibourhood structure   최종
Learning a nonlinear embedding by preserving class neibourhood structure 최종WooSung Choi
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelineChenYiHuang5
 
Mimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmMimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmCemal Ardil
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfJunghyun Lee
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Regularisation & Auxiliary Information in OOD Detection
Regularisation & Auxiliary Information in OOD DetectionRegularisation & Auxiliary Information in OOD Detection
Regularisation & Auxiliary Information in OOD Detectionkirk68
 
Support Vector Machine Classifiers
Support Vector Machine ClassifiersSupport Vector Machine Classifiers
Support Vector Machine ClassifiersAerofoil Kite
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfssusera1eccd
 
Regularization and variable selection via elastic net
Regularization and variable selection via elastic netRegularization and variable selection via elastic net
Regularization and variable selection via elastic netKyusonLim
 
Optimization of positive linear systems via geometric programming
Optimization of positive linear systems via geometric programmingOptimization of positive linear systems via geometric programming
Optimization of positive linear systems via geometric programmingMasaki Ogura
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics JCMwave
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsA machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsJCMwave
 
Face verification techniques: how to speed up dataset creation
Face verification techniques: how to speed up dataset creationFace verification techniques: how to speed up dataset creation
Face verification techniques: how to speed up dataset creationDeep Learning Italia
 

Semelhante a Hierarchical selection (20)

Learning a nonlinear embedding by preserving class neibourhood structure 최종
Learning a nonlinear embedding by preserving class neibourhood structure   최종Learning a nonlinear embedding by preserving class neibourhood structure   최종
Learning a nonlinear embedding by preserving class neibourhood structure 최종
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Mimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmMimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithm
 
Modifed my_poster
Modifed my_posterModifed my_poster
Modifed my_poster
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdf
 
硕士论文
硕士论文硕士论文
硕士论文
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Deepa seminar
Deepa seminarDeepa seminar
Deepa seminar
 
Regression
RegressionRegression
Regression
 
Regularisation & Auxiliary Information in OOD Detection
Regularisation & Auxiliary Information in OOD DetectionRegularisation & Auxiliary Information in OOD Detection
Regularisation & Auxiliary Information in OOD Detection
 
Support Vector Machine Classifiers
Support Vector Machine ClassifiersSupport Vector Machine Classifiers
Support Vector Machine Classifiers
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdf
 
Exponential decay for the solution of the nonlinear equation induced by the m...
Exponential decay for the solution of the nonlinear equation induced by the m...Exponential decay for the solution of the nonlinear equation induced by the m...
Exponential decay for the solution of the nonlinear equation induced by the m...
 
Four Point Gauss Quadrature Runge – Kuta Method Of Order 8 For Ordinary Diffe...
Four Point Gauss Quadrature Runge – Kuta Method Of Order 8 For Ordinary Diffe...Four Point Gauss Quadrature Runge – Kuta Method Of Order 8 For Ordinary Diffe...
Four Point Gauss Quadrature Runge – Kuta Method Of Order 8 For Ordinary Diffe...
 
Regularization and variable selection via elastic net
Regularization and variable selection via elastic netRegularization and variable selection via elastic net
Regularization and variable selection via elastic net
 
Optimization of positive linear systems via geometric programming
Optimization of positive linear systems via geometric programmingOptimization of positive linear systems via geometric programming
Optimization of positive linear systems via geometric programming
 
Linkedin_PowerPoint
Linkedin_PowerPointLinkedin_PowerPoint
Linkedin_PowerPoint
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsA machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics
 
Face verification techniques: how to speed up dataset creation
Face verification techniques: how to speed up dataset creationFace verification techniques: how to speed up dataset creation
Face verification techniques: how to speed up dataset creation
 

Mais de Dai-Hai Nguyen

Advanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationAdvanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationDai-Hai Nguyen
 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodelsDai-Hai Nguyen
 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GANDai-Hai Nguyen
 
Semi-supervised learning model for molecular property prediction
Semi-supervised learning model for molecular property predictionSemi-supervised learning model for molecular property prediction
Semi-supervised learning model for molecular property predictionDai-Hai Nguyen
 

Mais de Dai-Hai Nguyen (8)

Advanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationAdvanced machine learning for metabolite identification
Advanced machine learning for metabolite identification
 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodels
 
IBSB tutorial
IBSB tutorialIBSB tutorial
IBSB tutorial
 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GAN
 
Semi-supervised learning model for molecular property prediction
Semi-supervised learning model for molecular property predictionSemi-supervised learning model for molecular property prediction
Semi-supervised learning model for molecular property prediction
 
DL for molecules
DL for moleculesDL for molecules
DL for molecules
 
Seminar
SeminarSeminar
Seminar
 
Collaborative DL
Collaborative DLCollaborative DL
Collaborative DL
 

Último

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Último (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service

Hierarchical selection

  • 1. Group and Hierarchical Variable Selection Hai Nguyen Bioinformatics center, Kyoto University hai@kuicr.kyoto-u.ac.jp haidnguyen0909@gmail.com
  • 2. Introduction q Response: 𝑦 = (𝑦$, 𝑦&, … , 𝑦() * q predictors : 𝑥, = (𝑥,$, 𝑥,&, …, 𝑥,-) * , 𝑖 = 1, . . , 𝑛 qLinear model: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥66 + 𝜀 - 68$ q2-way interaction model: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥,6 + ∑ 𝜃6: 𝑥,6 𝑥,: + 𝜀6;: - 68$ 1) ∑ 𝛽6 𝑥,6 - 68$ :  main  effect  term,   𝛽 ∈ ℝ- 2) ∑ 𝜃6: 𝑥,6 𝑥,:6;: : interaction term, 𝜃 ∈ ℝ-J-
  • 3. Introduction q Problems to be addressed in high dimensional data: 1) Predictive performance 2) Interpretability 3) Highly correlated variables Sparsity assumption: # of nonzero coeffs 𝛽6 K 𝑠 and/or interaction 𝜃6: K 𝑠 is very few.
  • 4. Introduction
      Variable selection: LASSO
      Group selection: Group LASSO
      Hierarchical selection: Hierarchical LASSO
  • 5. Introduction q Shrinkage methods based on regularization 𝛽M = 𝑎𝑟𝑔𝑚𝑖𝑛R   𝑙 𝛽 + 𝜆 U |𝛽|$,                               𝐿 𝑎𝑠𝑠𝑜 ||𝛽||& & ,                               𝑅 𝑖𝑑𝑔𝑒 Where 𝑙 𝛽 is the loss function wr.t. 𝛽, e.g., square, logistic, hinge losses 1) Ridge: prevent overfitting but not variable selection 2) Lasso: variable selection but only select one for each group of correlated variables.
  • 6. Group selection q Group Lasso (Yuan et al., 2006) Coefficients are organized into K groups (known in advance): 𝑔$, 𝑔&, …, 𝑔  ⊆ 1,2,… , 𝑝 , disjoint and then the Group-Lasso pelnaty: 𝜆 ∑ 𝑑:||𝛽_` ||&: ,              where ||𝛽_` ||& = ∑ 𝛽, & ,∈_` q Properties: 1) Group-size = 1 -> LASSO 2) Convex penalty 3) Encourage to select or remove the entire group How  to  do  group  selection  without  prior  knowledge  of  group  structures?
  • 7. Group selection: automatic feature group q Elastic Net (Zou et al., 2005) A linear combination of ridge and LASSO penalties for group selection via the penalty:               𝛼 c |𝛽6| - 68$ + (1 − 𝛼) c 𝛽6 & - 68$ q Properties: 1) L1 term leads to a sparse solution 2) L2 term forces highly correlated variables to be averaged
  • 8. Group selection: automatic feature group (cont.)
      OSCAR (Bondell et al., 2008)
      A combination of the LASSO penalty and an $L_\infty$ penalty on each pair of variables:
      $$\sum_{j=1}^{p} |\beta_j| + c \sum_{j<k} \max\{|\beta_j|, |\beta_k|\}$$
      Properties:
      1) Encourages equality of coefficients.
  • 9. Group selection: automatic feature group (cont.) q Fused LASSO (Friedman et al., 2007) A lasso term + fused penalty               𝛼 c |𝛽6| - 68$ + (1 − 𝛼) c |𝛽6 − 𝛽6o$| - 68& q Properties: 1) Encourage sparsity in the differences of coffs. 2) Introduced to account for 1-d correlation of predictors
  • 10. Group selection: automatic feature group (cont.) q HORSE (Friedman et al., 2007) Extension of fused LASSO               𝛼 c |𝛽6| - 68$ + (1 − 𝛼) c |𝛽6 − 𝛽6o$| 6;: q Properties: 1) Encourage sparsity in the differences of coffs. 2) Fused lasso for pairs of vars
  • 11. Hierarchy selection q Hierarchy restriction for interaction models 1) Strong hierarchy: 𝜃6: ≠ 0 → 𝛽6 ≠ 0 and 𝛽: ≠ 0 (SH) 2) Weak hierarchy: 𝜃6: ≠ 0 → 𝛽6 ≠ 0 or   𝛽: ≠ 0 (WH) 𝛽6 𝛽: 𝜃6: 𝛽s 𝜃:s
  • 12. Hierarchy selection q SHIM (Choi et al., 2010) Simply reparameterize the coeffsof 2-way interaction model: 𝑦, = 𝛽3 + c 𝛽6 𝑥,6 + c 𝜃6: 𝑥,6 𝑥,: + 𝜀 6;: - 68$ become: 𝑦, = 𝛽3 + ∑ 𝛽6 𝑥,6 + ∑ 𝛾6: 𝛽6 𝛽: 𝑥,6 𝑥,: + 𝜀6;: - 68$ q Properties: 1) satisfy “strong hierarchy” 2) but “Non-convex”, alternative minimization strategy for optimization.
  • 13. Hierarchy selection q Composite Absolute Penalties (CAP) (Zhao et al., 2009) Use overlapping group selection to induce hierarchy selection. Consider X1, X2. Hierarchy X1->X2 can be induced by: 𝑇 𝛽 = ||(𝛽$, 𝛽&)||vw + ||(𝛽&)||vx
  • 14. Hierarchy selection q Composite Absolute Penalties (Zhao et al., 2009) Hiearchical structured sparsity for 2-way interaction model can be obtained by: 𝑇(𝛽, 𝜃) = ∑ {|𝜃6:| + ||(𝛽6, 𝛽:, 𝜃6:)||vy` }6z: 𝛽6 𝛽: 𝜃6: 𝛽s 𝜃:s
  • 15. Hierarchy selection q Hierarchicalinteraction LASSO (Bien et al., 2013) Addition of convex constraints to the lasso to produce sparse interaction models inducing hierarchicalconditions. Start with the following:               𝑚𝑖𝑛R,{ 𝑙 𝛽, 𝜃 + 𝜆||𝛽||$ + 𝜆 2 ||𝜃||$ s.t. | 𝜃 = 𝜃* ||𝜃6||$ ≤ |𝛽6| q Properties: 1) Automatically satisfy “strong hierarchy” (𝜃,6 ≠ 0 −> 𝛽, ≠ 0  & 𝛽6 ≠ 0) 2) But “Non-convex”
  • 16. Hierarchy selection q Hierarchical interaction LASSO (Bien et al., 2013) Convex relaxation: replace 𝛽 by 𝛽€ − 𝛽o (𝛽€ , 𝛽o ≥ 0), then:               𝑚 𝑖𝑛R‚ ,Rƒ ,{ 𝑙 𝛽€ − 𝛽o , 𝜃 + 𝜆1* (𝛽€ + 𝛽o ) + 𝜆 2 ||𝜃||$ s.t. 𝜃 = 𝜃* ||𝜃6||$ ≤ 𝛽6 € + 𝛽6 o 𝛽6 € , 𝛽6 o ≥ 0 q Properties: 1) Still satisfy “strong hierarchy” (𝜃6: ≠ 0 −> 𝛽6 ≠ 0  & 𝛽: ≠ 0) 2) Equivalent to : 𝜆   ∑ 𝑚𝑎𝑥( 𝛽6 ,|𝜃6|) - 68$ + „ & ||𝜃||$ 3) Optimization is bit hard due to symmetry constraint, but can use AMMD
  • 17. Hierarchy selection q Hierarchicalinteraction LASSO (Bien et al., 2013) Removing symmetry constraint, then:               𝑚 𝑖𝑛R,{ 𝑙 𝛽€ − 𝛽o , 𝜃 + 𝜆1* (𝛽€ + 𝛽o ) + 𝜆 2 ||𝜃||$ s.t. … ||𝜃6||$ ≤ 𝛽6 € + 𝛽6 o 𝛽6 € , 𝛽6 o ≥ 0 q Properties: 1) Now only satisfy “weak hierarchy” (𝜃,6 ≠ 0 −> 𝛽, ≠ 0  & 𝛽6 ≠ 0) 2) “convex” 3) Optimization is easy because of separate 𝛽6 € + 𝛽6 o  (Proximal Operator)
  • 18. Hierarchy selection q VANISH (Zhao et al., 2009) 1) Linear model: 𝑌 = ∑ 𝛽6 𝑋6 + ∑ 𝜃6: 𝑋6 ∘ 𝑋: +6;: 𝜀 - 68$ 2) Nonlinear: 𝑌 = ∑ 𝑓6 + ∑ 𝑓6: +6;: 𝜀 - 68$ 3) penalty: 𝑃 𝑓 = 𝜆$ ∑ (||𝑓6||& + ∑ ||𝑓6:||& :z6 ) w x+𝜆& - 68$ ∑ ||𝑓6:||6;: Remark: if 𝑓6 = 𝑤6 𝑋6  , 𝑗 = 1, … , 𝑝, and X is normalized, then penalty becomes: 𝑃 𝑤, 𝜃 = 𝜆$ c ||(𝛽6, 𝜃6)||& + 𝜆& - 68$ c |𝜃6:| 6;:
  • 19. Hierarchy selection q GRESH (She et al., 2013) Proposed a general model of previously mentioned regularization of the following form: min •8[R,{] 𝑙 𝛽, 𝜃 + 𝜆$|𝜃|$ + 𝜆& c ||𝛽6, 𝑧(𝜃6)||‘ - 68$ s.t. 𝜃* = 𝜃 q Remark: 1) If 𝑧 𝜃6 = 𝜃6 * and 𝑞 = 2, 𝑡ℎ𝑒𝑛  it becomes VANISH 2) If 𝑧 𝜃6 = |𝜃6|$ and 𝑞 = ∞, 𝑡ℎ𝑒𝑛  it becomes HiLASSO
  • 20. Conclusion
      Group selection
      Hierarchical selection