Introduction Mixed-Norm-Elasticnet-MKL Mini-max Lp-MKL Conclusion References
MKL (Multiple Kernel Learning)

Single kernel learning:
$$\hat f \;\leftarrow\; \min_{f \in H_k} \frac{1}{n}\sum_{i=1}^n \ell(y_i, f(x_i)) + C\,\|f\|_{H_k}$$

Multiple kernel learning (Lanckriet et al., 2004; Bach et al., 2004):
$$\hat f = \sum_{m=1}^M \hat f_m \;\leftarrow\; \min_{f_m \in H_m} \frac{1}{n}\sum_{i=1}^n \ell\Big(y_i,\, \sum_{m=1}^M f_m(x_i)\Big) + C \sum_{m=1}^M \|f_m\|_{H_m}$$
($H_m$: the RKHS of kernel $k_m$)

Group Lasso (Sonnenburg et al., 2006; Rakotomamonjy et al., 2008; Suzuki & Tomioka, 2009)
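As a concrete illustration of the objective above, one can evaluate it for candidate components $f_m(\cdot)=\sum_i \alpha_{m,i} k_m(x_i,\cdot)$, using $\|f_m\|_{H_m}^2 = \alpha_m^\top K_m \alpha_m$. The sketch below is hypothetical (Gaussian kernels, synthetic data, squared loss as $\ell$), not code from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 20, 3

# Toy 1-D data and M candidate Gram matrices (hypothetical Gaussian widths).
X = rng.normal(size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
sq_dists = (X - X.T) ** 2
kernels = [np.exp(-sq_dists / (2.0 * w ** 2)) for w in (0.5, 1.0, 2.0)]

def mkl_objective(alphas, C=0.1):
    """(1/n) sum_i (y_i - f(x_i))^2 + C sum_m ||f_m||_{H_m},
    with f_m = K_m @ alpha_m and ||f_m||_{H_m}^2 = alpha_m' K_m alpha_m."""
    f = sum(K @ a for K, a in zip(kernels, alphas))
    loss = np.mean((y - f) ** 2)
    penalty = sum(np.sqrt(a @ K @ a) for K, a in zip(kernels, alphas))
    return loss + C * penalty

alphas = [rng.normal(scale=0.1, size=n) for _ in range(M)]
print(mkl_objective(alphas))
```

Minimizing over the $\alpha_m$ (not shown) is what MKL solvers such as SimpleMKL and SpicyMKL do; here only the objective evaluation is sketched.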
MKL

L1-MKL (Lanckriet et al., 2004; Bach et al., 2004):
$$\min_{f_m \in H_m} L\Big(\sum_{m=1}^M f_m\Big) + C \sum_{m=1}^M \|f_m\|_{H_m}$$

L2-MKL:
$$\min_{f_m \in H_m} L\Big(\sum_{m=1}^M f_m\Big) + C \sum_{m=1}^M \|f_m\|_{H_m}^2$$

Elasticnet-MKL (Tomioka & Suzuki, 2009):
$$\min_{f_m \in H_m} L\Big(\sum_{m=1}^M f_m\Big) + C_1 \sum_{m=1}^M \|f_m\|_{H_m} + C_2 \sum_{m=1}^M \|f_m\|_{H_m}^2$$

Mixed-Norm-Elasticnet-MKL (Meier et al., 2009):
$$\min_{f_m \in H_m} L\Big(\sum_{m=1}^M f_m\Big) + C_1 \sum_{m=1}^M \sqrt{\|f_m\|_n^2 + C_2\,\|f_m\|_{H_m}^2} + C_3 \sum_{m=1}^M \|f_m\|_{H_m}^2$$

where $\|f\|_n^2 := \frac{1}{n}\sum_{i=1}^n f(x_i)^2$.
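To make the regularizers concrete, the snippet below evaluates each penalty on the same set of block norms (the norm values and constants are hypothetical, not from the slides). Components with zero norm drop out of every penalty, but only the L1 and mixed-norm terms are non-differentiable at zero, which is what forces exact kernel sparsity:

```python
import numpy as np

# Hypothetical per-kernel norms: h[m] = ||f_m||_{H_m}, e[m] = ||f_m||_n.
h = np.array([0.0, 0.5, 2.0])
e = np.array([0.0, 0.4, 1.5])
C1, C2, C3 = 1.0, 0.5, 0.5

l1_pen   = C1 * h.sum()                          # L1-MKL
l2_pen   = C1 * (h ** 2).sum()                   # L2-MKL
enet_pen = C1 * h.sum() + C2 * (h ** 2).sum()    # Elasticnet-MKL
mixed_pen = (C1 * np.sqrt(e ** 2 + C2 * h ** 2).sum()  # Mixed-Norm-
             + C3 * (h ** 2).sum())                    # Elasticnet-MKL

print(l1_pen, l2_pen, enet_pen, mixed_pen)
```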
Mixed-Norm-Elasticnet-MKL

Regression setting:
$$L(f) = \frac{1}{n}\sum_{i=1}^n (f(x_i) - y_i)^2$$

True function: $f^*(x) = \sum_{m=1}^M f_m^*(x) \; (= E[Y \mid x])$
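The empirical loss $L(f)$ and the empirical norm $\|\cdot\|_n$ can be written down directly. A minimal sketch with synthetic data, where $f^*(x) = \sin(\pi x)$ is a hypothetical stand-in for $E[Y \mid x]$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
f_star = lambda t: np.sin(np.pi * t)       # plays the role of E[Y|x]
y = f_star(x) + 0.1 * rng.normal(size=n)   # noisy regression targets

def L(f):
    """Empirical squared loss L(f) = (1/n) sum_i (f(x_i) - y_i)^2."""
    return np.mean((f(x) - y) ** 2)

def sq_norm_n(f):
    """Squared empirical norm ||f||_n^2 = (1/n) sum_i f(x_i)^2."""
    return np.mean(f(x) ** 2)

print(L(f_star), sq_norm_n(f_star))
```

At the true function, $L(f^*)$ concentrates around the noise variance ($0.01$ here), while $\|f^*\|_n^2$ estimates $E[f^*(X)^2]$.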
Convergence rate of $\|\hat f - f^*\|_{L_2}^2$, where $d = |\{m : \|f_m^*\|_{H_m} \neq 0\}|$.

L1-MKL (Koltchinskii & Yuan, 2008):
$$O_p\Big( d^{\frac{1-s}{1+s}}\, n^{-\frac{1}{1+s}} + \frac{d \log(M)}{n} \Big)$$

Mixed-Norm-Elasticnet-MKL (Meier et al., 2009): mini-max
$$O_p\Big( d \Big(\frac{\log(M)}{n}\Big)^{\frac{1}{1+s}} \Big)$$

Mixed-Norm-L1-MKL (Koltchinskii & Yuan, 2010), with penalty $\sum_m (C_1 \|f_m\|_n + C_2 \|f_m\|_{H_m})$: mini-max
$$O_p\Big( d\, n^{-\frac{1}{1+s}} + \frac{d \log(M)}{n} \Big)$$

Mini-max (Raskutti et al., 2009):
$$O_p\Big( d\, n^{-\frac{1}{1+s}} + \frac{d \log(M/d)}{n} \Big)$$
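A quick numerical comparison of two of these rates, plugging in hypothetical problem sizes ($d$ active kernels out of $M$, spectral-decay exponent $s$; the expressions follow Meier et al. (2009) and Raskutti et al. (2009) as cited above):

```python
import numpy as np

# Hypothetical sizes: d active kernels out of M, decay exponent s in (0,1).
d, M, s = 10, 1000, 0.5

def meier_rate(n):
    # Mixed-Norm-Elasticnet-MKL: d * (log(M) / n)^{1/(1+s)}
    return d * (np.log(M) / n) ** (1.0 / (1.0 + s))

def minimax_rate(n):
    # Raskutti et al. lower bound: d * n^{-1/(1+s)} + d * log(M/d) / n
    return d * n ** (-1.0 / (1.0 + s)) + d * np.log(M / d) / n

for n in (10 ** 3, 10 ** 5, 10 ** 7):
    print(n, meier_rate(n), minimax_rate(n))
```

The gap between the two is the extra $(\log M)^{\frac{1}{1+s}}$ factor in the first expression relative to the $n^{-\frac{1}{1+s}}$ term of the lower bound.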
Bach, F., Lanckriet, G., & Jordan, M. (2004). Multiple kernel learning,
conic duality, and the SMO algorithm. Proceedings of the 21st
International Conference on Machine Learning (pp. 41–48).
Caponnetto, A., & de Vito, E. (2007). Optimal rates for regularized
least-squares algorithm. Foundations of Computational Mathematics,
7, 331–368.
Kloft, M., Brefeld, U., Sonnenburg, S., Laskov, P., Müller, K.-R., & Zien,
A. (2009). Efficient and accurate ℓp-norm multiple kernel learning.
Advances in Neural Information Processing Systems 22 (pp.
997–1005). Cambridge, MA: MIT Press.
Koltchinskii, V., & Yuan, M. (2008). Sparse recovery in large ensembles
of kernel machines. Proceedings of the Annual Conference on Learning
Theory (pp. 229–238).
Koltchinskii, V., & Yuan, M. (2010). Sparsity in multiple kernel learning.
The Annals of Statistics, 38, 3660–3695.
Lanckriet, G., Cristianini, N., Ghaoui, L. E., Bartlett, P., & Jordan, M.
(2004). Learning the kernel matrix with semi-definite programming.
Journal of Machine Learning Research, 5, 27–72.
Meier, L., van de Geer, S., & Bühlmann, P. (2009). High-dimensional
additive modeling. The Annals of Statistics, 37, 3779–3821.
Mendelson, S., & Neeman, J. (2010). Regularization in kernel learning.
The Annals of Statistics, 38, 526–565.
Rakotomamonjy, A., Bach, F., Canu, S., & Grandvalet, Y. (2008). SimpleMKL.
Journal of Machine Learning Research, 9, 2491–2521.
Raskutti, G., Wainwright, M., & Yu, B. (2009). Lower bounds on
minimax rates for nonparametric regression with additive sparsity and
smoothness. Advances in Neural Information Processing Systems 22
(pp. 1563–1570). Cambridge, MA: MIT Press.
Sonnenburg, S., Rätsch, G., Schäfer, C., & Schölkopf, B. (2006). Large
scale multiple kernel learning. Journal of Machine Learning Research,
7, 1531–1565.
Steinwart, I., Hush, D., & Scovel, C. (2009). Optimal rates for
regularized least squares regression. Proceedings of the Annual
Conference on Learning Theory (pp. 79–93).
Suzuki, T., & Tomioka, R. (2009). SpicyMKL. arXiv:0909.5026.
Suzuki, T., Tomioka, R., & Sugiyama, M. (2011). Fast convergence rate
of multiple kernel learning with elastic-net regularization.
arXiv:1103.0431.