Introduction Mixed-Norm-Elasticnet-MKL Mini-max Lp-MKL Conclusion References
MKL (Multiple Kernel Learning)

Single kernel learning:
$$\hat f \;\leftarrow\; \min_{f \in H_k} \frac{1}{n}\sum_{i=1}^n \ell(y_i, f(x_i)) + C\,\|f\|_{H_k}$$

Multiple kernel learning (Lanckriet et al., 2004; Bach et al., 2004):
$$\hat f = \sum_{m=1}^M \hat f_m \;\leftarrow\; \min_{f_m \in H_m} \frac{1}{n}\sum_{i=1}^n \ell\Big(y_i,\, \sum_{m=1}^M f_m(x_i)\Big) + C \sum_{m=1}^M \|f_m\|_{H_m}$$
($H_m$: the RKHS of kernel $k_m$)

Group Lasso (Sonnenburg et al., 2006; Rakotomamonjy et al., 2008; Suzuki & Tomioka, 2009)
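As a concrete illustration of the objective above, one can evaluate it for candidate components $f_m(\cdot)=\sum_i \alpha_{m,i} k_m(x_i,\cdot)$, using $\|f_m\|_{H_m}^2 = \alpha_m^\top K_m \alpha_m$. The sketch below is hypothetical (Gaussian kernels, synthetic data, squared loss as $\ell$), not code from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 20, 3

# Toy 1-D data and M candidate Gram matrices (hypothetical Gaussian widths).
X = rng.normal(size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
sq_dists = (X - X.T) ** 2
kernels = [np.exp(-sq_dists / (2.0 * w ** 2)) for w in (0.5, 1.0, 2.0)]

def mkl_objective(alphas, C=0.1):
    """(1/n) sum_i (y_i - f(x_i))^2 + C sum_m ||f_m||_{H_m},
    with f_m = K_m @ alpha_m and ||f_m||_{H_m}^2 = alpha_m' K_m alpha_m."""
    f = sum(K @ a for K, a in zip(kernels, alphas))
    loss = np.mean((y - f) ** 2)
    penalty = sum(np.sqrt(a @ K @ a) for K, a in zip(kernels, alphas))
    return loss + C * penalty

alphas = [rng.normal(scale=0.1, size=n) for _ in range(M)]
print(mkl_objective(alphas))
```

Minimizing over the $\alpha_m$ (not shown) is what MKL solvers such as SimpleMKL and SpicyMKL do; here only the objective evaluation is sketched.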
MKL

L1-MKL (Lanckriet et al., 2004; Bach et al., 2004):
$$\min_{f_m \in H_m} L\Big(\sum_{m=1}^M f_m\Big) + C \sum_{m=1}^M \|f_m\|_{H_m}$$

L2-MKL:
$$\min_{f_m \in H_m} L\Big(\sum_{m=1}^M f_m\Big) + C \sum_{m=1}^M \|f_m\|_{H_m}^2$$

Elasticnet-MKL (Tomioka & Suzuki, 2009):
$$\min_{f_m \in H_m} L\Big(\sum_{m=1}^M f_m\Big) + C_1 \sum_{m=1}^M \|f_m\|_{H_m} + C_2 \sum_{m=1}^M \|f_m\|_{H_m}^2$$

Mixed-Norm-Elasticnet-MKL (Meier et al., 2009):
$$\min_{f_m \in H_m} L\Big(\sum_{m=1}^M f_m\Big) + C_1 \sum_{m=1}^M \sqrt{\|f_m\|_n^2 + C_2\,\|f_m\|_{H_m}^2} + C_3 \sum_{m=1}^M \|f_m\|_{H_m}^2$$

where $\|f\|_n^2 := \frac{1}{n}\sum_{i=1}^n f(x_i)^2$.
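To make the regularizers concrete, the snippet below evaluates each penalty on the same set of block norms (the norm values and constants are hypothetical, not from the slides). Components with zero norm drop out of every penalty, but only the L1 and mixed-norm terms are non-differentiable at zero, which is what forces exact kernel sparsity:

```python
import numpy as np

# Hypothetical per-kernel norms: h[m] = ||f_m||_{H_m}, e[m] = ||f_m||_n.
h = np.array([0.0, 0.5, 2.0])
e = np.array([0.0, 0.4, 1.5])
C1, C2, C3 = 1.0, 0.5, 0.5

l1_pen   = C1 * h.sum()                          # L1-MKL
l2_pen   = C1 * (h ** 2).sum()                   # L2-MKL
enet_pen = C1 * h.sum() + C2 * (h ** 2).sum()    # Elasticnet-MKL
mixed_pen = (C1 * np.sqrt(e ** 2 + C2 * h ** 2).sum()  # Mixed-Norm-
             + C3 * (h ** 2).sum())                    # Elasticnet-MKL

print(l1_pen, l2_pen, enet_pen, mixed_pen)
```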
Mixed-Norm-Elasticnet-MKL

Regression setting:
$$L(f) = \frac{1}{n}\sum_{i=1}^n (f(x_i) - y_i)^2$$

True function: $f^*(x) = \sum_{m=1}^M f_m^*(x) \; (= E[Y \mid x])$
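The empirical loss $L(f)$ and the empirical norm $\|\cdot\|_n$ can be written down directly. A minimal sketch with synthetic data, where $f^*(x) = \sin(\pi x)$ is a hypothetical stand-in for $E[Y \mid x]$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
f_star = lambda t: np.sin(np.pi * t)       # plays the role of E[Y|x]
y = f_star(x) + 0.1 * rng.normal(size=n)   # noisy regression targets

def L(f):
    """Empirical squared loss L(f) = (1/n) sum_i (f(x_i) - y_i)^2."""
    return np.mean((f(x) - y) ** 2)

def sq_norm_n(f):
    """Squared empirical norm ||f||_n^2 = (1/n) sum_i f(x_i)^2."""
    return np.mean(f(x) ** 2)

print(L(f_star), sq_norm_n(f_star))
```

At the true function, $L(f^*)$ concentrates around the noise variance ($0.01$ here), while $\|f^*\|_n^2$ estimates $E[f^*(X)^2]$.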
Convergence rate of $\|\hat f - f^*\|_{L_2}^2$, where $d = |\{m : \|f_m^*\|_{H_m} \neq 0\}|$.

L1-MKL (Koltchinskii & Yuan, 2008):
$$O_p\Big( d^{\frac{1-s}{1+s}}\, n^{-\frac{1}{1+s}} + \frac{d \log(M)}{n} \Big)$$

Mixed-Norm-Elasticnet-MKL (Meier et al., 2009): mini-max
$$O_p\Big( d \Big(\frac{\log(M)}{n}\Big)^{\frac{1}{1+s}} \Big)$$

Mixed-Norm-L1-MKL (Koltchinskii & Yuan, 2010), with penalty $\sum_m (C_1 \|f_m\|_n + C_2 \|f_m\|_{H_m})$: mini-max
$$O_p\Big( d\, n^{-\frac{1}{1+s}} + \frac{d \log(M)}{n} \Big)$$

Mini-max (Raskutti et al., 2009):
$$O_p\Big( d\, n^{-\frac{1}{1+s}} + \frac{d \log(M/d)}{n} \Big)$$
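A quick numerical comparison of two of these rates, plugging in hypothetical problem sizes ($d$ active kernels out of $M$, spectral-decay exponent $s$; the expressions follow Meier et al. (2009) and Raskutti et al. (2009) as cited above):

```python
import numpy as np

# Hypothetical sizes: d active kernels out of M, decay exponent s in (0,1).
d, M, s = 10, 1000, 0.5

def meier_rate(n):
    # Mixed-Norm-Elasticnet-MKL: d * (log(M) / n)^{1/(1+s)}
    return d * (np.log(M) / n) ** (1.0 / (1.0 + s))

def minimax_rate(n):
    # Raskutti et al. lower bound: d * n^{-1/(1+s)} + d * log(M/d) / n
    return d * n ** (-1.0 / (1.0 + s)) + d * np.log(M / d) / n

for n in (10 ** 3, 10 ** 5, 10 ** 7):
    print(n, meier_rate(n), minimax_rate(n))
```

The gap between the two is the extra $(\log M)^{\frac{1}{1+s}}$ factor in the first expression relative to the $n^{-\frac{1}{1+s}}$ term of the lower bound.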
Bach, F., Lanckriet, G., & Jordan, M. (2004). Multiple kernel learning,
conic duality, and the SMO algorithm. Proceedings of the 21st
International Conference on Machine Learning (pp. 41–48).
Caponnetto, A., & de Vito, E. (2007). Optimal rates for regularized
least-squares algorithm. Foundations of Computational Mathematics,
7, 331–368.
Kloft, M., Brefeld, U., Sonnenburg, S., Laskov, P., Müller, K.-R., & Zien,
A. (2009). Efficient and accurate ℓp-norm multiple kernel learning.
Advances in Neural Information Processing Systems 22 (pp.
997–1005). Cambridge, MA: MIT Press.
Koltchinskii, V., & Yuan, M. (2008). Sparse recovery in large ensembles
of kernel machines. Proceedings of the Annual Conference on Learning
Theory (pp. 229–238).
Koltchinskii, V., & Yuan, M. (2010). Sparsity in multiple kernel learning.
The Annals of Statistics, 38, 3660–3695.
Lanckriet, G., Cristianini, N., Ghaoui, L. E., Bartlett, P., & Jordan, M.
(2004). Learning the kernel matrix with semi-definite programming.
Journal of Machine Learning Research, 5, 27–72.
Meier, L., van de Geer, S., & Bühlmann, P. (2009). High-dimensional
additive modeling. The Annals of Statistics, 37, 3779–3821.
Mendelson, S., & Neeman, J. (2010). Regularization in kernel learning.
The Annals of Statistics, 38, 526–565.
Rakotomamonjy, A., Bach, F., Canu, S., & Grandvalet, Y. (2008). SimpleMKL.
Journal of Machine Learning Research, 9, 2491–2521.
Raskutti, G., Wainwright, M., & Yu, B. (2009). Lower bounds on
minimax rates for nonparametric regression with additive sparsity and
smoothness. Advances in Neural Information Processing Systems 22
(pp. 1563–1570). Cambridge, MA: MIT Press.
Sonnenburg, S., Rätsch, G., Schäfer, C., & Schölkopf, B. (2006). Large
scale multiple kernel learning. Journal of Machine Learning Research,
7, 1531–1565.
Steinwart, I., Hush, D., & Scovel, C. (2009). Optimal rates for
regularized least squares regression. Proceedings of the Annual
Conference on Learning Theory (pp. 79–93).
Suzuki, T., & Tomioka, R. (2009). SpicyMKL. arXiv:0909.5026.
Suzuki, T., Tomioka, R., & Sugiyama, M. (2011). Fast convergence rate
of multiple kernel learning with elastic-net regularization.
arXiv:1103.0431.