7. Setup: data points (x_i)_{i=1,…,n}, relaxed code matrix Y = (Y_{ik})_{i=1,…,n, k=1,…,r}, and affinity

\[
A_{ij} = \exp\left(-\|x_i - x_j\|^2 / t\right).
\]

The relaxed spectral-hashing problem:

\[
\min_Y \ \frac{1}{2} \sum_{i,j=1}^{n} \|Y_{i\cdot} - Y_{j\cdot}\|^2 A_{ij} \tag{1}
\]
\[
\text{s.t.} \quad Y \in \mathbb{R}^{n \times r}, \quad \mathbf{1}^\top Y = 0 \ \ (2), \quad Y^\top Y = n I_{r \times r} \ \ (3).
\]

(Y_{i·} denotes the i-th row of Y, i.e. the relaxed code of x_i.)
8. For a graph G with adjacency matrix A, let D = diag(A1); then L = D − A is the graph Laplacian of G. For symmetric A, the objective of (1) can be rewritten as

\[
\frac{1}{2} \sum_{i,j=1}^{n} \|Y_{i\cdot} - Y_{j\cdot}\|^2 A_{ij} = \mathrm{tr}(Y^\top L Y).
\]
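This identity is easy to verify numerically. A minimal NumPy sketch (the random test data and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 3

# Symmetric affinity A_ij = exp(-||x_i - x_j||^2 / t), zero diagonal
X = rng.normal(size=(n, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-sq / 2.0)
np.fill_diagonal(A, 0.0)

# D = diag(A 1), graph Laplacian L = D - A
D = np.diag(A.sum(axis=1))
L = D - A

Y = rng.normal(size=(n, r))

# LHS: (1/2) sum_{i,j} ||Y_i - Y_j||^2 A_ij
lhs = 0.5 * sum(A[i, j] * np.sum((Y[i] - Y[j]) ** 2)
                for i in range(n) for j in range(n))
# RHS: tr(Y^T L Y)
rhs = np.trace(Y.T @ L @ Y)

print(abs(lhs - rhs))  # ~0 up to floating-point error
```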
9. Rewriting the problem with L:

\[
\min_Y \ \mathrm{tr}(Y^\top L Y)
\]
\[
\text{s.t.} \quad Y \in \mathbb{R}^{n \times r}, \quad \mathbf{1}^\top Y = 0 \ \ (2), \quad Y^\top Y = n I_{r \times r} \ \ (3).
\]

‣ Under constraint (3), the minimizer is given by r eigenvectors of L with the smallest eigenvalues (cf. spectral clustering / Laplacian eigenmaps).
‣ The all-ones vector 1 is an eigenvector of L with eigenvalue 0, but it is excluded by constraint (2).
‣ So the eigenvectors from the 2nd smallest eigenvalue onward are taken; these automatically satisfy (2).
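The eigenvector solution can be sketched as follows; a random dense graph stands in for the data affinity, and all names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 2

# Random dense symmetric affinity (connected, so eigenvalue 0 of L is simple)
A = rng.random((n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues ascending; the first eigenvector is proportional to 1
lam, Q = np.linalg.eigh(L)

# Skip the trivial eigenvector (excluded by constraint (2)), scale so Y^T Y = n I
Y = np.sqrt(n) * Q[:, 1:r + 1]

print(np.allclose(Y.T @ Y, n * np.eye(r)))   # constraint (3)
print(np.allclose(Y.sum(axis=0), 0.0))       # constraint (2)
print(np.trace(Y.T @ L @ Y))                 # = n * (lam[1] + ... + lam[r]), the minimum
```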
10. ‣ Both A and L are n×n: too large to store or eigendecompose when n is large.
‣ Anchor graphs: approximate A through a small set of representative points.
‣ Pick m (≪ n) anchor points u_1, …, u_m ∈ R^d (e.g., k-means centers of the data).
‣ Scale example:
 - n = 69,000 data points
 - m = 300 anchors
‣ Each data point is connected only to its s nearest anchors (illustration: s = 2).
11. ‣ Connect each point to its s nearest anchors.
‣ Encode these connections in a matrix Z ∈ R^{n×m}:
 - each row of Z has only s nonzeros (sparsity s/m)
 - the nonzero weights are kernel values, normalized per row:

\[
Z_{ij} =
\begin{cases}
\dfrac{h(x_i, u_j)}{\sum_{j' \in \langle i \rangle} h(x_i, u_{j'})}, & j \in \langle i \rangle \\
0, & \text{otherwise.}
\end{cases}
\]

(⟨i⟩: the index set of the s anchors nearest to x_i; h: a kernel function.)
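A sketch of the Z construction under the definition above, assuming a Gaussian kernel for h (the random data and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d, s, t = 20, 5, 3, 2, 1.0

X = rng.normal(size=(n, d))   # data points x_i
U = rng.normal(size=(m, d))   # anchors u_j (in AGH these come from k-means)

Z = np.zeros((n, m))
for i in range(n):
    dist = np.sum((U - X[i]) ** 2, axis=1)
    nearest = np.argsort(dist)[:s]        # <i>: indices of the s nearest anchors
    h = np.exp(-dist[nearest] / t)        # h(x_i, u_j), assuming a Gaussian kernel
    Z[i, nearest] = h / h.sum()           # normalize within <i>

# Each row has exactly s nonzeros (sparsity s/m) and sums to 1
print((Z > 0).sum(axis=1))
print(Z.sum(axis=1))
```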
12. ‣ With Λ = diag(1^⊤ Z) ∈ R^{m×m}, define the approximate affinity matrix

\[
\hat{A} = Z \Lambda^{-1} Z^\top.
\]
13. ‣ Properties of Â:
 - nonnegative (all entries ≥ 0)
 - rank at most m
 - doubly stochastic: Â1 = 1, so D = I and L = I − Â; the smallest eigenvalues of L correspond to the largest eigenvalues of Â
 - represented implicitly by the sparse factor Z (memory-efficient)
‣ Â itself never has to be formed explicitly: all computations go through Z.
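A quick numerical check of these properties, using a small hand-built row-normalized Z as a stand-in for the anchor graph (the fixed assignment pattern is artificial):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 12, 4, 2

# A small row-normalized sparse Z (each row: s nonzeros summing to 1),
# with a fixed assignment pattern standing in for nearest-anchor links
Z = np.zeros((n, m))
for i in range(n):
    cols = [i % m, (i + 1) % m]
    w = rng.random(s)
    Z[i, cols] = w / w.sum()

Lam = np.diag(Z.sum(axis=0))              # Lambda = diag(1^T Z)
A_hat = Z @ np.linalg.inv(Lam) @ Z.T      # A_hat = Z Lambda^{-1} Z^T

print(np.allclose(A_hat, A_hat.T))          # symmetric
print((A_hat >= 0).all())                   # nonnegative
print(np.allclose(A_hat.sum(axis=1), 1.0))  # doubly stochastic: A_hat 1 = 1, so L = I - A_hat
print(np.linalg.matrix_rank(A_hat) <= m)    # rank at most m
```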
14. ‣ Write Â = Z Λ^{-1/2} Λ^{-1/2} Z^⊤.
‣ Define the small matrix

\[
M = \Lambda^{-1/2} Z^\top Z \Lambda^{-1/2} \in \mathbb{R}^{m \times m}.
\]

‣ Take the SVD Z Λ^{-1/2} = U Σ^{1/2} V^⊤ (with U ∈ R^{n×m}, Σ ∈ R^{m×m}, V ∈ R^{m×m}). Then

\[
\hat{A} = U \Sigma^{1/2} V^\top V \Sigma^{1/2} U^\top = U \Sigma U^\top,
\qquad
M = V \Sigma^{1/2} U^\top U \Sigma^{1/2} V^\top = V \Sigma V^\top.
\]

‣ Hence U = Z Λ^{-1/2} V Σ^{-1/2}: the eigenvectors of Â can be recovered from those of the m×m matrix M.
‣ The appropriate r columns of U yield Y.
15. ‣ The eigenvalues of Σ, in decreasing order, are 1, σ_1, …, σ_r, …; the trivial leading eigenvalue 1 (whose eigenvector corresponds to the direction excluded by constraint (2)) is discarded, and σ_1, …, σ_r are kept together with the corresponding eigenvectors v_1, …, v_r ∈ R^m from V.
‣ Let Σ_r = diag(σ_1, …, σ_r) and V_r = [v_1, …, v_r].
‣ Define the projection matrix W:

\[
W = \sqrt{n}\, \Lambda^{-1/2} V_r \Sigma_r^{-1/2} \in \mathbb{R}^{m \times r}.
\]

‣ The embedding Y is then simply

\[
Y = Z W.
\]
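Putting slides 14–15 together, a NumPy sketch of computing W and Y from the small matrix M; the cyclic anchor-assignment pattern is an artificial stand-in for real nearest-anchor links (it keeps the anchor graph connected, so the eigenvalue 1 is simple):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s, r = 200, 8, 3, 2

# Row-normalized sparse Z (cyclic assignment keeps the anchor graph connected)
Z = np.zeros((n, m))
for i in range(n):
    cols = (i + np.arange(s)) % m
    w = rng.random(s)
    Z[i, cols] = w / w.sum()

lam = Z.sum(axis=0)                                  # diagonal of Lambda
M = (Z / np.sqrt(lam)).T @ (Z / np.sqrt(lam))        # M = Lam^{-1/2} Z^T Z Lam^{-1/2}

sig, V = np.linalg.eigh(M)                           # ascending order
sig, V = sig[::-1], V[:, ::-1]                       # now 1 = sig[0] > sig[1] >= ...

# Discard the trivial eigenvalue 1, keep sigma_1..sigma_r and v_1..v_r
Sig_r = sig[1:r + 1]
V_r = V[:, 1:r + 1]

# W = sqrt(n) Lam^{-1/2} V_r Sig_r^{-1/2},  Y = Z W
W = np.sqrt(n) * (V_r / np.sqrt(lam)[:, None]) / np.sqrt(Sig_r)
Y = Z @ W

print(np.allclose(sig[0], 1.0))               # largest eigenvalue of M is 1
print(np.allclose(Y.T @ Y, n * np.eye(r)))    # constraint (3) holds
print(np.allclose(Y.sum(axis=0), 0.0))        # constraint (2) holds as well
```

Constraint (2) holds because the discarded eigenvector of M (eigenvalue 1) is exactly the direction corresponding to the all-ones vector.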
16. Nyström method
‣ Question: how to embed a point that was not in the training set (out-of-sample extension).
‣ As n → ∞,
 - the eigenvectors of the affinity matrix converge to
 - eigenfunctions of the corresponding operator.
‣ The k-th eigenfunction estimated from n samples:

\[
\varphi_{n,k}(x) = \frac{1}{\sigma_k} \sum_{i=1}^{n} \hat{A}(x, x_i)\, Y_{ik}.
\]

(Â(·,·): the affinity evaluated between a new point x and a training point x_i.)
17. In AGH, the Nyström extension

\[
\varphi_{n,k}(x) = \frac{1}{\sigma_k} \sum_{i=1}^{n} \hat{A}(x, x_i)\, Y_{ik}
\]

‣ simplifies considerably:
‣ substituting Â = ZΛ^{-1}Z^⊤ and Y = ZW, it reduces to

\[
\varphi_{n,k}(x) = w_k^\top z(x),
\]

where z maps x to its (sparse) anchor-coefficient vector, i.e. the row of Z that x would receive.
 - no sum over the n training points is needed;
 - the cost per new point is O(dm).
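A sketch of the resulting out-of-sample hash function, with a random W standing in for the learned projection (`z` and `hash_code` are my names):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, s, r, t = 8, 4, 3, 2, 1.0

U = rng.normal(size=(m, d))   # anchors, fixed after training
W = rng.normal(size=(m, r))   # stand-in for the learned projection matrix

def z(x):
    """Sparse anchor-coefficient vector: the row of Z that x would receive."""
    dist = np.sum((U - x) ** 2, axis=1)      # O(dm): distances to all m anchors
    nearest = np.argsort(dist)[:s]
    h = np.exp(-dist[nearest] / t)           # Gaussian kernel, as in the Z construction
    out = np.zeros(m)
    out[nearest] = h / h.sum()
    return out

def hash_code(x):
    # phi_k(x) = w_k^T z(x): no sum over the n training points,
    # just an s-sparse projection after the O(dm) distance computation
    return np.sign(W.T @ z(x))

x_new = rng.normal(size=d)
print(hash_code(x_new))   # an r-dimensional sign code
```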
21. A. Andoni and P. Indyk. Near-optimal hashing algorithms for
approximate nearest neighbor in high dimensions. Proceedings of
FOCS, 2006.
Y. Bengio, O. Delalleau, N. Le Roux, and J.-F. Paiement. Learning
eigenfunctions links spectral embedding and kernel PCA. Neural
Computation, 2004.
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high
dimensions via hashing. Proceedings of VLDB, 1999.
P. Indyk and R. Motwani. Approximate nearest neighbors: Towards
removing the curse of dimensionality. Proceedings of STOC, 1998.
B. Kulis and T. Darrell. Learning to hash with binary reconstructive
embeddings. NIPS 22, 2010.
B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for
scalable image search. Proceedings of ICCV, 2009.
22. W. Liu, J. He, and S.-F. Chang. Large graph construction for scalable
semi-supervised learning. Proceedings of ICML, 2010.
W. Liu, J. Wang, S. Kumar, and S.-F. Chang. Hashing with graphs.
Proceedings of ICML, 2011.
M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from
shift-invariant kernels. NIPS 22, 2010.
J. Wang, S. Kumar, and S.-F. Chang. Sequential projection learning for
hashing with compact codes. Proceedings of ICML, 2010.
Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. NIPS 21, 2009.
C. Williams and M. Seeger. The effect of the input density distribution
on kernel-based classifiers. Proceedings of ICML, 2000.
Speaker notes:
- SH (Spectral Hashing) suffers from the same problem.
- L acts as a kind of discrete Laplace operator on functions defined over the nodes of G.
- The d in O(dm) reduces to the number of nonzero components when the vectors are sparse.
- MAP: Mean Average Precision. Scanning the ranking from the top, if the k-th correctly-labeled item appears at rank n, that item scores k/n; MAP is the average of these scores.