3. Contents
• Introduction to sparse modeling
• Review of OSCAR
• Proposal : OscarGKPath
• Results
• Summary and Discussion
5. Sparse Modeling
• A machine learning method for high-dimensional data
(genetic data, medical images, ...)
• Important task : feature selection
In sparse modeling, features which have non-zero coefficients are called "selected"
6. Feature Selection
• Conventional feature selection
AIC/BIC with stepwise selection
Equivalent to $L_0$-norm regularization:
$$\min_{\beta}\ \sum_{i=1}^{l}\left(y_i - x_i^{T}\beta\right)^2 + \lambda \sum_{j} I\left(\beta_j \neq 0\right)$$
• Lasso
$L_1$-norm : the convex envelope of the $L_0$-norm on $[-1, 1]$
$$\min_{\beta}\ \sum_{i=1}^{l}\left(y_i - x_i^{T}\beta\right)^2 + \lambda \|\beta\|_1$$
[Figure: the $L_0$ penalty (# of selected features) is discontinuous and non-convex; the $L_1$ penalty is a continuous, convex approximation]
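As a concrete illustration (not part of the slides), a minimal scikit-learn sketch of Lasso-based feature selection on synthetic data; here `alpha` plays the role of $\lambda$, and all names and values are my own:

```python
# Minimal sketch of Lasso-based feature selection (illustrative only;
# scikit-learn and the synthetic data are assumptions, not the slides').
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))           # 100 samples, 20 features
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]         # only 3 features are truly relevant
y = X @ beta_true + 0.1 * rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)       # L1-regularized least squares
selected = np.flatnonzero(model.coef_)   # "selected" = non-zero coefficients
print("selected features:", selected)
```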
7. Variation of Sparse Modeling
• Lasso
• Elastic net
• SCAD
• Adaptive Lasso
• Fused Lasso
• Generalized Lasso
• (Non-overlapping/Overlapping) Group Lasso
• Clustered Lasso
• OSCAR
[Diagram: Lasso as the basic method; the other variants classified as generalizations of Lasso, as respecting the consistency of feature selection, or as extracting group structure; OSCAR extracts group structure]
8. Contents
• Introduction to sparse modeling
• Review of OSCAR
• Proposal : OscarGKPath
• Results
• Summary and Discussion
9. OSCAR (Octagonal Shrinkage and Clustering Algorithm for Regression)
• Formulation
The method of Lagrange multipliers connects the constrained form
$$\min_{\beta}\ \frac{1}{2}\sum_{i=1}^{l}\left(y_i - x_i^{T}\beta\right)^2 \quad \text{s.t.}\quad \|\beta\|_1 + c\sum_{j>k}\max\{|\beta_j|,\,|\beta_k|\} \le t$$
where $c \ge 0$ and $t \ge 0$ are tuning parameters, with the penalized form
$$\min_{\beta}\ F(\beta, \lambda_1, \lambda_2) = \min_{\beta}\ \frac{1}{2}\sum_{i=1}^{l}\left(y_i - x_i^{T}\beta\right)^2 + \lambda_1\|\beta\|_1 + \lambda_2\sum_{j>k}\max\{|\beta_j|,\,|\beta_k|\}$$
where $\lambda_1 \ge 0$ and $\lambda_2 \ge 0$ are regularization parameters.
The penalty combines an $L_1$-norm term with a pairwise $L_\infty$-norm term; the variables are assumed normalized and/or standardized.
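A minimal numpy transcription of the penalized objective $F(\beta, \lambda_1, \lambda_2)$ above may help fix the notation; the function name and interface are my own, and no optimization is attempted:

```python
# Direct transcription of the OSCAR objective above (illustration only).
import numpy as np

def oscar_objective(beta, X, y, lam1, lam2):
    loss = 0.5 * np.sum((y - X @ beta) ** 2)      # squared-error term
    l1 = lam1 * np.sum(np.abs(beta))              # L1-norm term
    a = np.abs(beta)                              # pairwise L-infinity term
    pairwise = sum(max(a[j], a[k])
                   for j in range(len(a)) for k in range(j))
    return loss + l1 + lam2 * pairwise
```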
10. Pictorial Image
• Solutions for correlated data (e.g., two features) as the regularization parameters $\lambda_1, \lambda_2$ vary [Zeng and Figueiredo, 2013]
[Figure: constraint balls of the $L_1$-norm, the $L_\infty$-norm, and the combined $L_1 + L_\infty$ (octagonal) penalty]
11. Lasso vs OSCAR
• OSCAR : grouping structure is built into the formulation
Data : Facebook Comment Dataset
17. Input parameters
• Direction of the change of $\lambda_1, \lambda_2$ :
$$d = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}, \qquad \Delta\lambda = d\,\Delta\eta$$
($\Delta\eta$ is determined by the algorithm)
• Accuracy $\epsilon$
(the proposed algorithm is an approximate one)
• Interval of $\eta$ : $[\underline{\eta}, \overline{\eta}]$
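A tiny illustration (all values made up) of the parametrization $\Delta\lambda = d\,\Delta\eta$:

```python
# The pair (lambda1, lambda2) moves along a fixed direction d as eta grows.
import numpy as np

d = np.array([1.0, 0.5])        # direction (d1, d2)
lam = np.array([0.2, 0.1])      # current (lambda1, lambda2)
delta_eta = 0.05                # step size chosen by the algorithm
lam = lam + d * delta_eta       # Delta-lambda = d * Delta-eta
```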
18. OscarGKPath : Algorithm
using the optimality condition for OSCAR (details omitted)
using the "termination condition"
using the dual problem
It is proved that any solution in the solution path satisfies the duality-gap bound
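A hedged outline of the path-following loop this slide describes; `update_solution`, `max_step`, and `duality_gap` are hypothetical stand-ins for the optimality-condition update, the termination condition (next slide), and the dual-problem gap check:

```python
# Sketch of the OscarGKPath loop, based only on the slide text; the
# helper callables are hypothetical, not the paper's actual routines.
def oscar_gk_path(beta, eta, eta_bar, update_solution, max_step, duality_gap, eps):
    path = [(eta, beta)]
    while eta < eta_bar:
        delta = max_step(beta, eta, eta_bar)       # largest safe step
        beta = update_solution(beta, eta, delta)   # follow optimality condition
        eta += delta
        assert duality_gap(beta, eta) <= eps       # gap bound holds on the path
        path.append((eta, beta))
    return path
```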
19. Termination condition
1. A regression coefficient becomes zero : $\Delta\eta_A$
2. The order of the regression coefficients changes : $\Delta\eta_O$
※ The optimality condition of OSCAR is based on a given order of the coefficients
3. $\eta$ reaches $\overline{\eta}$ : $\overline{\eta} - \eta$
$$\Delta\eta_{\max} = \min\left\{\Delta\eta_A,\ \Delta\eta_O,\ \overline{\eta} - \eta\right\}$$
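A standalone numpy sketch of this step-size rule, under my own assumption that the coefficients move linearly in $\eta$ within the current piece with slope `dbeta`; this is an illustration, not the paper's code:

```python
# Compute Delta-eta_max = min(Delta-eta_A, Delta-eta_O, eta_bar - eta).
import numpy as np

def max_step(beta, dbeta, eta, eta_bar):
    # 1. Delta-eta_A: smallest positive step at which a coefficient hits zero
    with np.errstate(divide="ignore", invalid="ignore"):
        t_zero = np.where(dbeta != 0, -beta / dbeta, np.inf)
    delta_a = np.min(np.where(t_zero > 0, t_zero, np.inf))
    # 2. Delta-eta_O: smallest positive step at which two |coefficients| swap order
    a, da = np.abs(beta), np.sign(beta) * dbeta
    crossings = [(a[k] - a[j]) / (da[j] - da[k])
                 for j in range(len(a)) for k in range(j)
                 if da[j] != da[k]]
    positive = [t for t in crossings if t > 0]
    delta_o = min(positive) if positive else np.inf
    # 3. remaining length of the eta interval
    return min(delta_a, delta_o, eta_bar - eta)
```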
20. Contents
• Introduction to sparse modeling
• Review of OSCAR
• Proposal : OscarGKPath
• Results
• Summary and Discussion
21. Setup
• Data Sets :
• 5-fold cross-validation
• direction : $d \in \left\{\begin{pmatrix}1\\0.5\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}, \begin{pmatrix}1\\2\end{pmatrix}\right\}$
• $\eta$ : $-4 \le \log_2 \eta \le 15$ (i.e., $\log_2 \underline{\eta} = -4$, $\log_2 \overline{\eta} = 15$)
OscarGKPath : 10 trials
"Batch Search" (search on a uniform grid of 20 points, linearly spaced by 0.1) ×400×5
• Duality gap : $G(\theta(\eta), d_1\eta, d_2\eta) \le \varepsilon = 0.1 \times F(\beta^*, d_1, d_2)$
In a single trial, "Batch Search" produces only a limited solution path
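A one-line sketch of this acceptance criterion; both arguments are hypothetical inputs rather than quantities computed here:

```python
# A path solution is accepted while its duality gap stays within 10%
# of the optimal objective value, i.e., epsilon = 0.1 * F(beta*, d1, d2).
def within_gap(gap, optimal_objective, ratio=0.1):
    return gap <= ratio * optimal_objective
```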
22. Batch Search vs OscarGKPath
• Shorter computation time
• Accuracy is maintained
Data : Right Ventricle Dataset
[Figures: computation time and accuracy, Grid Search vs. the proposal]
23. Contents
• Introduction to sparse modeling
• Review of OSCAR
• Proposal : OscarGKPath
• Results
• Summary and Discussion