The Emergence of Hubs in High-Dimensional Space (11th STAIR Lab AI Seminar)
1. 2017/07/21@STAIR Lab AI seminar
Improving Nearest Neighbor Methods
from the Perspective of Hubness Phenomenon
Yutaro Shigeto
STAIR Lab, Chiba Institute of Technology
2. A complete reference list is available at
https://yutaro-s.github.io/download/ref-20170721.html
11-14. Hubness Phenomenon
The nearest neighbors of many queries are the same objects ("hubs")
[Figure: successive builds show many queries whose nearest neighbor is the same object ("cat"); such an object is a hub]
[Radovanović+, 2010]
15-16. Why do hubs emerge?
$X$: normal distribution (zero mean)
Fixed objects $y_1$, $y_2$, with $\|y_1\| < \|y_2\|$
Then it can be shown that
$\mathbb{E}_X[\|x - y_2\|] - \mathbb{E}_X[\|x - y_1\|] > 0$
i.e., $y_1$ is more likely to be closer to the queries, and hence more likely to be a hub
Because this holds for any pair $y_1$ and $y_2$, objects closest to the origin tend to be hubs
This bias is called "spatial centrality"
[Radovanović+, 2010]
17. Variants
• Squared Euclidean distance [Shigeto+, 2015]:
$\mathbb{E}_X[\|x - y_2\|^2] - \mathbb{E}_X[\|x - y_1\|^2] > 0$
• Inner product [Suzuki+, 2013]:
$\frac{1}{|D|} \sum_{x \in D} \langle x, y_2 \rangle - \frac{1}{|D|} \sum_{x \in D} \langle x, y_1 \rangle < 0$
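To make the spatial-centrality claim concrete, here is a minimal simulation (not from the slides; the dimensionality and the two fixed points are illustrative assumptions) that samples queries from N(0, I) and checks the distance inequality empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
X = rng.standard_normal((100_000, d))    # queries x ~ N(0, I)
y1 = 0.5 * rng.standard_normal(d)        # ||y1|| smaller (with high probability)
y2 = 2.0 * rng.standard_normal(d)        # ||y2|| larger

d1 = np.linalg.norm(X - y1, axis=1)
d2 = np.linalg.norm(X - y2, axis=1)

print(d2.mean() - d1.mean())   # > 0: E[||x - y2||] - E[||x - y1||] > 0
print(np.mean(d1 < d2))        # well above 0.5: y1 is the closer of the two
                               # for most queries, i.e., y1 behaves like a hub
```

The second print shows that the point nearer the origin wins the "nearest of the two" contest for most queries, which is exactly the bias that turns centrally located objects into hubs.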
18. Problem:
The emergence of hubs degrades the performance of nearest neighbor methods
Research Objective:
Improve the performance of nearest neighbor methods by reducing the emergence of hubs
20-21. Centering: Reducing spatial centrality
Spatial centrality implies that an object similar to the centroid tends to be a hub
After centering, the similarities to the centroid are all identical (zero): the centroid coincides with the origin
[Suzuki+, 2013]
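A minimal sketch of what centering buys, assuming inner-product similarity (synthetic data; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((1000, 20)) + 3.0   # data with a nonzero centroid

centroid = D.mean(axis=0)
print((D @ centroid)[:5])          # similarities to the centroid differ per object

Dc = D - centroid                  # centering: the centroid moves to the origin
print((Dc @ Dc.mean(axis=0))[:5])  # ~0 for every object: the bias source is gone
```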
22-24. Mutual proximity: Breaking asymmetric relations
Although a hub becomes the nearest neighbor of many objects, those objects cannot all become the nearest neighbor of the hub
Mutual proximity makes neighbor relations symmetric
[Schnitzer+, 2012]
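An empirical mutual-proximity sketch in the spirit of Schnitzer+ (2012): the MP of a pair is estimated as the fraction of other objects that are farther from both members of the pair than the pair is from each other. The function name and sampling setup are ours:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mutual_proximity(D):
    """D: (n, n) distance matrix -> (n, n) mutual-proximity matrix."""
    n = D.shape[0]
    MP = np.zeros_like(D, dtype=float)
    for i in range(n):
        for j in range(n):
            # fraction of objects farther from BOTH i and j than d(i, j)
            MP[i, j] = np.mean((D[i] > D[i, j]) & (D[j] > D[i, j]))
    return MP  # symmetric by construction; use 1 - MP as a distance

X = np.random.default_rng(2).standard_normal((200, 50))
MP = mutual_proximity(squareform(pdist(X)))
```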
26. Zero-shot learning
Active research topic in NLP, CV, ML
Many applications:
•Image labeling
•Bilingual lexicon extraction
+ Many other cross-domain matching tasks
!26
[Larochelle+, 2008]
27. ZSL is a type of multi-class classification…
…but the classifier has to predict labels not appearing in the training set
[Figure: standard classification task vs. ZSL task]
29. Training: find a projection function
Find a matrix M that projects examples into label space
[Figure: M maps the example space into the label space containing the labels chimpanzee, lion, and tiger]
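The slides do not spell out the training objective here, but slide 43 identifies it as ridge/least squares regression. A minimal sketch under that assumption (rows of X are examples, rows of Y are the corresponding label vectors; lam is an illustrative regularization weight):

```python
import numpy as np

def train_projection(X, Y, lam=1.0):
    """Ridge regression: M minimizes sum_i ||M x_i - y_i||^2 + lam ||M||_F^2."""
    d = X.shape[1]
    return Y.T @ X @ np.linalg.inv(X.T @ X + lam * np.eye(d))
```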
30-31. Prediction: Nearest neighbor search
Given a test object and the test labels, to predict the label of the test object:
1. project the example into label space, using matrix M
2. find the nearest label
[Figure: M maps a test example into the label space and the nearest of the candidate labels (chimpanzee, gorilla, leopard, lion, tiger) is returned]
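The two prediction steps as code (a sketch continuing the training function above; `labels` is a matrix whose rows are the test-label vectors):

```python
import numpy as np

def predict(M, x, labels):
    z = M @ x                                                 # 1. project into label space
    return int(np.linalg.norm(labels - z, axis=1).argmin())  # 2. nearest label
```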
32-34. Hubness: Problem in ZSL
The classifier frequently predicts the same labels ("hubs")
[Figure: many projected examples fall nearest to the same few labels (sheep, zebra, hippo, rat)]
[Dinu and Baroni, 2015; see also Radovanović+, 2010]
37. Problem with the current regression approach:
The learned classifier frequently predicts the same labels (emergence of "hub" labels)
Research objective:
Investigate why hubs emerge in regression-based ZSL, and how to reduce their emergence
41. Synthetic data result
                        Current   Proposed
Hubness (N1 skewness)      24.2        0.5
Accuracy [%]               13.8       87.6
The proposed approach reduces hubness and improves accuracy
42. Why the proposed approach reduces hubness
The argument for our proposal relies on two concepts:
• Shrinkage in regression
• Spatial centrality of data distributions
43-44. "Shrinkage" in ridge/least squares regression
If we optimize the ridge/least squares objective, then the projected objects have smaller variance than the regression targets (shrinkage)
For simplicity, the projected objects are assumed to also follow a normal distribution
[See also Lazaridou+, 2015]
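A quick numerical illustration of shrinkage (our synthetic setup, reusing the ridge solution from the training sketch above):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 20))
Y = X @ rng.standard_normal((20, 20)) + 0.5 * rng.standard_normal((500, 20))

lam = 10.0
M = Y.T @ X @ np.linalg.inv(X.T @ X + lam * np.eye(20))

print(Y.var())           # variance of the regression targets
print((X @ M.T).var())   # variance of the projected objects: smaller (shrinkage)
```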
45. Why the proposed approach reduces hubness
The argument for our proposal relies on two concepts:
• Shrinkage in regression ✔
• Spatial centrality of data distributions
47-48. "Spatial centrality"
$X$: query distribution (zero mean)
Fixed objects $y_1$, $y_2$, with $\|y_1\| < \|y_2\|$
Then it can be shown that
$\mathbb{E}_X[\|x - y_2\|^2] - \mathbb{E}_X[\|x - y_1\|^2] > 0$
i.e., $y_1$ is more likely to be closer to the queries, and more likely to be a hub
Because this holds for any pair $y_1$ and $y_2$, objects closest to the origin tend to be hubs
This bias is called "spatial centrality."
[See also Radovanović+, 2010]
50. Degree of spatial centrality
Further assuming distributions for the fixed objects as well, we obtain a formula that quantifies the degree of spatial centrality
51. Spatial centrality depends on the variance of label distributions
The smaller the variance of the label distribution, the smaller the spatial centrality (= the bias causing hubness)
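A simulation of this dependence (a sketch; we measure hubness, as on slide 41, by the skewness of the N_1 occurrence counts, and the variances below are illustrative assumptions):

```python
import numpy as np
from scipy.stats import skew
from scipy.spatial.distance import cdist

def n1_skewness(queries, data):
    nn = cdist(queries, data).argmin(axis=1)       # nearest datum for each query
    counts = np.bincount(nn, minlength=len(data))  # N_1 occurrence counts
    return skew(counts)

rng = np.random.default_rng(4)
Q = rng.standard_normal((5000, 50))                # fixed query distribution
wide = 1.0 * rng.standard_normal((200, 50))        # high-variance data
narrow = 0.1 * rng.standard_normal((200, 50))      # low-variance data

print(n1_skewness(Q, wide))     # strongly skewed: a few points attract most queries
print(n1_skewness(Q, narrow))   # noticeably smaller skewness
```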
52. Why the proposed approach reduces hubness
The argument for our proposal relies on two concepts:
• Shrinkage in regression ✔
• Spatial centrality of data distributions ✔
58. Q. Which configuration is better for reducing hubs?
[Figure: reverse direction (Proposed) vs. forward direction (Current)]
Spatial centrality:
For a fixed query distribution, a data distribution with smaller variance is preferable to reduce hubs
60. Q. Which configuration is better for reducing hubs?
Since the query distribution is not fixed across the two configurations, directly comparing the label distributions is not meaningful
61. Q. Which configuration is better for reducing hubs?
[Figure: Proposed (scaled) vs. Current]
Scaling does not change the nearest neighbor relation
62. Q. Which configuration is better for reducing hubs?
A. The reverse direction is preferable:
for a fixed query distribution, the variance of the data distribution in the proposed configuration is smaller
63. Summary of our proposal
Proposal: project labels into the example space
[Figure: the labels (chimpanzee, gorilla) are mapped from label space into example space]
Shrinkage: regression shrinks the variance of the projected objects
Spatial centrality: a label distribution with smaller variance is desirable to reduce hubness
➥ Projecting labels reduces the variance of the labels, hence suppresses hubness
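The proposed direction as code (a sketch mirroring the earlier training function, with the roles of examples and labels swapped; the names are ours):

```python
import numpy as np

def train_reverse(X, Y, lam=1.0):
    """Ridge regression mapping label vectors into the example space:
    W minimizes sum_i ||W y_i - x_i||^2 + lam ||W||_F^2."""
    m = Y.shape[1]
    return X.T @ Y @ np.linalg.inv(Y.T @ Y + lam * np.eye(m))

def predict_reverse(W, x, labels):
    proj = labels @ W.T   # candidate labels, moved into the example space
    return int(np.linalg.norm(proj - x, axis=1).argmin())
```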
70. Summary
• Analyzed why hubs emerge in the current ZSL approach
- the variance of the labels is greater than that of the examples
• Proposed a simple method for reducing hubness
- reverse the mapping direction
• The proposed method reduced hubness and outperformed the current approach and CCA in image labeling and bilingual lexicon extraction tasks
75. k-nearest neighbor classification
Given a dataset $D = \{(x_i, y_i)\}_{i=1}^n$, the label of $x$ is decided by its k-nearest neighbors:
$\hat{y} = \arg\min_{y_i : (x_i, y_i) \in D} f(x, x_i)$
Distance metric learning learns a matrix $L$:
$f(x, x_i) = \|Lx - Lx_i\|$
Training is computationally expensive
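For reference, the 1-NN decision rule with a learned linear metric as code (a sketch; L would come from a metric-learning method such as LMNN, which we do not implement here):

```python
import numpy as np

def knn_predict(x, X, y, L):
    d = np.linalg.norm(X @ L.T - L @ x, axis=1)   # f(x, x_i) = ||Lx - Lx_i||
    return y[d.argmin()]                          # label of the nearest neighbor
```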
76-78. Proposal: Dissimilarity
Given a dataset $D = \{(x_i, y_i)\}_{i=1}^n$, the label of $x$ is decided by its k-nearest neighbors:
$\hat{y} = \arg\min_{y_i : (x_i, y_i) \in D} f(x, x_i)$
Spatial centrality:
For a fixed query distribution, a data distribution with smaller variance is preferable to reduce hubs
The function f needs to be computed only between labeled objects and the unlabeled object
➡ labeled objects are always the target of retrieval, and the unlabeled object is always the query
$f(x, x_i) = \|x - Wx_i\|^2$
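The proposed dissimilarity as code (a one-line sketch: the labeled objects are moved by W; the query stays put):

```python
import numpy as np

def dissimilarity(x, X, W):
    return np.linalg.norm(x - X @ W.T, axis=1) ** 2   # ||x - W x_i||^2 for all i
```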
79. This method is not metric learning
• The goal of classification is to classify the query correctly
- i.e., finding a suitable decision boundary (not a metric)
80-83. Proposal: Training
Find a matrix W which minimizes the distance:
$\min_W \sum_{i=1}^n \sum_{z \in T_i} \|x_i - Wz\|^2 + \lambda \|W\|_F^2$
This function has the closed-form solution:
$W = XJX^\top (XX^\top + \lambda I)^{-1}$
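The closed form as code (a sketch; the slides do not define J, so we assume it is the (n, n) binary matrix pairing each x_i with the objects z in T_i, and we write the regularization weight as lam):

```python
import numpy as np

def train_W(X, J, lam=1.0):
    """X: (d, n) matrix with objects as columns; J: (n, n) pairing matrix (assumed).
    Returns W = X J X^T (X X^T + lam I)^{-1}."""
    d = X.shape[0]
    return X @ J @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))
```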
84-85. Proposal: Test
Given a query object $x$,
$\hat{y} = \arg\min_{y_i : (x_i, y_i) \in D} \|x - Wx_i\|^2$
86. Move labeled objects vs. move the query
• Move labeled objects (proposal): $f(x, x_i) = \|x - Wx_i\|^2$
This reduces the variance = reducing the emergence of hubs
• Move the query: $f(x, x_i) = \|Mx - x_i\|^2$
This increases the variance = promoting the emergence of hubs
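A quick check of the variance argument (a sketch on synthetic, unpaired data, with ridge solutions computed in both directions as above; the absolute numbers are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
Q = rng.standard_normal((1000, 30))          # queries
X = 2.0 * rng.standard_normal((1000, 30))    # labeled objects

def ridge(A, B, lam=1.0):
    """Map rows of A onto rows of B by ridge regression."""
    return B.T @ A @ np.linalg.inv(A.T @ A + lam * np.eye(A.shape[1]))

W = ridge(X, Q)   # move labeled objects (proposal)
M = ridge(Q, X)   # move the query

print((X @ W.T).var(), Q.var())   # moved objects: variance below the queries'
print((Q @ M.T).var(), X.var())   # moved queries shrink instead, so the data's
                                  # variance is relatively larger -> more hubs
```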
88. Experimental objective
We evaluate the proposed method on various datasets
Our main questions are:
- Does it suppress hubs?
- Does it improve the classification accuracy?
- Is it faster than distance metric learning?
91. Results: Training time [s]
The proposed method
- reduces the emergence of hubs
- is better than metric learning methods on most datasets
- is faster than … on all datasets
Training time [s]; bold figures indicate the best performer for each dataset.
(b) Image datasets.
method      AwA      CUB      SUN     aPY
LMNN     1525.5   1098.2  15704.3   317.3
ITML     1536.3    577.6   1126.4  9211.2
DML-eig  2048.0   2084.7   2006.1  1787.1
proposed    9.5      1.5      4.1     6.4
92. Results: UCI datasets
The proposed method
- reduces the emergence of hubs
- is better than metric learning methods on most datasets
- is faster than … on all datasets
- does not work well on UCI datasets
Table 3: Classification accuracy [%]. Bold figures indicate the best performer for each dataset.
(a) UCI datasets.
method                   ionosphere  balance-scale  iris  wine  glass
original metric                86.8           89.5  97.2  98.1   68.1
LMNN                           90.3           90.0  96.7  98.1   67.7
ITML                           87.7           89.5  97.8  99.1   65.0
DML-eig                        87.7           91.2  96.7  98.6   66.5
Move-labeled (proposed)        89.6           89.5  97.2  98.6   70.8
Move-query                     79.7           89.4  97.2  96.3   62.3
93. Summary
Prediction:
$\hat{y} = \arg\min_{y_i : (x_i, y_i) \in D} \|x - Wx_i\|^2$
The proposed method
- reduces the emergence of hubs
- is better than metric learning methods on most datasets
- is faster than … on all datasets
- does not work well on UCI datasets