This document proposes a semi-supervised concept detection approach based on graph structure features (GSF) extracted from image similarity graphs. GSF represents each image as a vector derived from the eigenvectors of the graph Laplacian. Two incremental learning schemes are developed to address computational issues. Experiments on synthetic datasets and MIR-Flickr show that the approach performs comparably to or better than state-of-the-art methods and benefits from adding unlabeled data, providing an efficient and scalable solution for concept detection in large multimedia collections.
1. Semi-supervised concept detection by learning the structure of similarity graphs
Symeon Papadopoulos1, Christos Sagonas1, Ioannis Kompatsiaris1, Athena Vakali2
1 Centre for Research and Technology Hellas, Information Technologies Institute
2 Aristotle University of Thessaloniki, Informatics Department
19th International Conference on Multimedia Modeling
Huangshan, China, Jan 7-9, 2013
2. IMAGE / TAGS / CONCEPTS
[Figure: example MIR-Flickr images shown with their user tags (chocolate, cake, food, chocolateganachebuttercream; female, people, portrait; nature, landscape, clouds; water, lake, reflection, sky, mirror; flickrelite, abigfave) and the corresponding concepts (e.g. indoor); for some images no concept applies (N/A).]
SOURCE: MIR-Flickr
mklab.iti.gr #2
3. Overview
• Problem formulation
• Related work
• Graph Structure Features Approach
• Evaluation
– Synthetic datasets
– MIR-Flickr
• Conclusions
5. Concept detection
ML perspective
• Given an image, produce a set of relevant concepts.
IR perspective
• Given an image collection and a concept of interest, rank all images in order of relevance.
6. Semi-supervised learning
• Transductive learning setting
– C = {c_1, ..., c_K}: target concepts
– A = {(x_i, y_i)}: annotated set
– x_i ∈ R^D: D-dimensional feature vector of image i
– y_i ∈ {0, 1}^K: concept indicator vector (labels) of image i
– U: set of unknown items
• Predict the concepts associated with the items of U by processing A and U together.
8. Related work
• Neighborhood similarity (Wang et al., 2009)
– Uses image similarity graphs in combination with graph-based SSL (Zhu, 2005; Zhou et al., 2004) – not incremental
• Sparse similarity graph by convex optimization (Tang et al., 2009)
– Applicable to online settings – computationally intensive training step
• Hashing-based graph construction (Chen et al., 2010)
– Uses KL-divergence multi-label propagation, but relies on an iterative computation scheme – difficult to apply in incremental settings
• Social dimensions (Tang & Liu, 2011)
– Uses LEs for networked classification problems (i.e. when the network between nodes is explicit) – not incremental, not applied to multimedia
11. Graph construction
• G = (V, E): image similarity graph
• V: set of nodes (images); n = |V|: cardinality of the node set
Construction options:
• full weighted graph
• kNN graph (connect each image to its k most similar images)
• εNN graph (connect pairs of images whose similarity exceeds a threshold ε)
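The kNN option above can be sketched in a few lines; the feature matrix, the cosine similarity measure, and the value of k below are illustrative placeholders, not values from the paper:

```python
import numpy as np

def knn_graph(features, k):
    """Symmetric kNN similarity graph from cosine similarities."""
    X = features / np.linalg.norm(features, axis=1, keepdims=True)
    S = X @ X.T                        # pairwise cosine similarity
    np.fill_diagonal(S, 0.0)           # no self-loops
    W = np.zeros_like(S)
    for i in range(len(S)):
        nn = np.argsort(S[i])[-k:]     # k most similar images
        W[i, nn] = S[i, nn]
    return np.maximum(W, W.T)          # symmetrize

rng = np.random.default_rng(0)
W = knn_graph(rng.random((50, 32)), k=5)
```

The symmetrization step matters: picking each node's top-k neighbors produces a directed graph, and the Laplacian machinery on the next slide assumes an undirected (symmetric) adjacency matrix.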
12. Eigenvector/value computation
Normalized graph Laplacian: L = I − D^(−1/2) W D^(−1/2)
• D: degree matrix (diagonal)
• W: adjacency matrix
(typical form of the graph Laplacian: L = D − W)
By solving L v = λ v, keep the d eigenvectors corresponding to the smallest non-zero eigenvalues; row i of the resulting eigenvector matrix is the vector of graph structure features* of image i.
*aka Laplacian Eigenmaps
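As a minimal sketch of this step, SciPy can build the normalized Laplacian and extract the low eigenvectors; the two-clique toy graph below is illustrative only:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse import csgraph

def graph_structure_features(W, d):
    """d eigenvectors of the normalized Laplacian with the smallest
    non-zero eigenvalues; row i is the GSF vector of image i."""
    L = csgraph.laplacian(W, normed=True)   # I - D^{-1/2} W D^{-1/2}
    vals, vecs = eigh(L)                    # eigenvalues in ascending order
    keep = vals > 1e-9                      # skip the (near-)zero eigenvalue
    return vecs[:, keep][:, :d]

# Toy graph: two 5-node cliques joined by one weak edge.
A = np.kron(np.eye(2), np.ones((5, 5))) - np.eye(10)
A[4, 5] = A[5, 4] = 0.1
F = graph_structure_features(A, d=1)
# The sign of the single feature separates the two cliques.
```

Discarding the zero eigenvalue is deliberate: its eigenvector is (degree-weighted) constant across a connected graph and carries no discriminative information.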
13. Graph structure feature learning
• Each media item is represented by a d-dimensional vector of graph structure features.
• At this point, any supervised learning method could be used.
[Note that the whole framework is still SSL, since unlabeled items are used during graph construction.]
• SVM is selected:
– good performance in several problems
– good implementations available (LibSVM, LIBLINEAR)
– real-valued output (IR perspective: rank images by concept)
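The supervised step could look like the following sketch with scikit-learn's LinearSVC; the GSF matrix, labels, and index split are random stand-ins, not data from the paper:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
F = rng.normal(size=(100, 8))            # stand-in GSF vectors (one row per image)
y = (F[:, 0] > 0).astype(int)            # stand-in concept labels
labeled = np.arange(30)                  # annotated items
unlabeled = np.arange(30, 100)           # items to predict

clf = LinearSVC(C=1.0).fit(F[labeled], y[labeled])
scores = clf.decision_function(F[unlabeled])  # real-valued output
ranking = unlabeled[np.argsort(-scores)]      # IR view: rank images by concept
```

Using decision_function rather than predict is what enables the IR-style ranking the slide mentions: the signed margin gives a relevance ordering instead of a hard label.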
14. Intuition
[Figure: toy image similarity graph; each image (concepts: coast, or coast and person) is annotated with its entry in the 2nd eigenvector of the graph Laplacian (values such as 0.2415, 0.3077, 0.3144 vs. −0.4552, −0.4663, −0.0893). Images lying in the same region of the graph take similar eigenvector values.]
15. Incremental learning setting (1)
• The transductive learning setting is often impractical: for each new set of unlabeled items one must
1. recompute the image similarity matrix,
2. recompute the graph structure features (LEs),
3. use the SVM to obtain prediction scores.
• Step 2 is computationally expensive.
• Two incremental schemes are devised:
– Linear Projection (LP): based on the set of k most similar images
– Submanifold Analysis (SA) [cf. next slide]
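The LP formula itself is elided on the slide. One plausible sketch, under the assumption that the new item's GSF vector is approximated by a similarity-weighted average of the GSF vectors of its k most similar existing images (this assumed form and all names are hypothetical):

```python
import numpy as np

def lp_features(sim_to_existing, F_existing, k):
    """Linear Projection (assumed form): similarity-weighted average of
    the GSF vectors of the k most similar existing images."""
    nn = np.argsort(sim_to_existing)[-k:]   # k most similar images
    w = sim_to_existing[nn]
    return w @ F_existing[nn] / w.sum()

# Toy case: the new item is most similar to existing images 4 and 5.
F_existing = np.eye(6)
sim = np.array([0.0, 0.0, 0.0, 0.0, 0.1, 0.9])
f_new = lp_features(sim, F_existing, k=2)
```

Whatever its exact form, the point of LP is that it touches only k rows of the existing feature matrix, avoiding the expensive eigendecomposition of step 2.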
16. Incremental learning setting (2)
• Submanifold Analysis [Jia et al., 2009]
– Construct the (k+1)×(k+1) similarity matrix W_S between the new item and the k most similar images from the annotated set.
– Construct the corresponding sub-diagonal (degree) and sub-Laplacian matrices.
– Compute the eigenvalues and the d eigenvectors corresponding to non-zero eigenvalues [computation is lightweight since k << n].
– Minimize the reconstruction error and reconstruct approximate eigenvectors for the new item.
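The error-minimization and reconstruction formulas are elided on the slide. The sketch below is one plausible reading, under stated assumptions: the sub-Laplacian eigenvectors of the neighbors are aligned to their existing GSF vectors by least squares, and the new item's row is mapped through that alignment. This interpretation of [Jia et al., 2009] is an assumption, not the paper's exact scheme:

```python
import numpy as np
from scipy.linalg import eigh, lstsq
from scipy.sparse import csgraph

def sa_features(W_sub, F_neighbors, d):
    """Submanifold Analysis sketch: W_sub is the (k+1)x(k+1) similarity
    matrix over the k most similar annotated images plus the new item
    (last row/column); F_neighbors holds the k existing GSF vectors."""
    L_sub = csgraph.laplacian(W_sub, normed=True)
    vals, vecs = eigh(L_sub)
    V = vecs[:, vals > 1e-9][:, :d]          # d non-zero sub-eigenvectors
    # Assumed reconstruction: least-squares map R from the neighbors'
    # sub-eigenvector rows onto their existing GSF vectors.
    R, *_ = lstsq(V[:-1], F_neighbors)
    return V[-1] @ R                          # approximate GSF of the new item

# Illustrative call with k = 5 neighbors and d = 2 features.
rng = np.random.default_rng(2)
M = rng.random((6, 6))
W_sub = (M + M.T) / 2
np.fill_diagonal(W_sub, 0.0)
f_new = sa_features(W_sub, rng.normal(size=(5, 2)), d=2)
```

The cost is dominated by the eigendecomposition of a (k+1)×(k+1) matrix, which matches the slide's remark that the computation is lightweight since k << n.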
17. Fusion of multiple features
• Graph structure feature fusion (F-GSF)
• Feature fusion (F-FEAT)
• Similarity graph fusion (F-SIM)
• Result fusion (F-RES)
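Two of the four variants reduce to one-liners; F1/F2 and s1/s2 below are illustrative per-feature GSF matrices and SVM scores, not values from the experiments:

```python
import numpy as np

rng = np.random.default_rng(3)
F1, F2 = rng.normal(size=(100, 8)), rng.normal(size=(100, 8))  # per-feature GSF
s1, s2 = rng.normal(size=100), rng.normal(size=100)            # per-feature SVM scores

F_gsf = np.hstack([F1, F2])   # F-GSF: concatenate graph structure features
s_res = (s1 + s2) / 2         # F-RES: average the per-feature prediction scores
```

F-FEAT would instead concatenate the raw feature vectors before graph construction, and F-SIM would combine the per-feature similarity graphs into one graph before computing GSF.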
19. Synthetic data – experiments
• Four 2D distributions with a limited number of samples (thousands) are used to test many settings: TWO MOONS, LINES, CIRCLES, GAUSSIANS.
• Performance aspects
– Parameters of the approach: number of features (CD), graph construction technique (kNN, εNN) and its parameters (k, ε)
– Learning setting (training size, data noise, number of classes)
– Inductive learning (LP vs. SA)
– Fusion method
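For instance, the TWO MOONS distribution with controllable noise and a small labeled fraction can be generated with scikit-learn; the sample count, noise level, and labeled fraction below are illustrative, not the experiment's settings:

```python
import numpy as np
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=2000, noise=0.1, random_state=0)  # 2D, two classes
alpha = 0.02                                  # e.g. 2% training samples
rng = np.random.default_rng(0)
labeled = rng.choice(len(X), size=int(alpha * len(X)), replace=False)
```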
20. Role of the number of GSF (CD)
[Figure: mAP vs. CD for TWO MOONS, LINES, CIRCLES and GAUSSIANS at several noise levels.]
Higher CD yields better mAP; higher noise calls for higher CD.
21. Role of the graph construction technique
[Figure: mAP for kNN vs. εNN graphs over a range of k and ε values.]
kNN is better and less sensitive to its parameter than εNN.
22. Role of noise (σ)
[Figure: mAP vs. noise level σ for TWO MOONS, LINES, CIRCLES and GAUSSIANS, compared against competing methods.]
In most cases GSF is equal to or better than the expensive SVM-RBF.
23. Role of training samples (α%)
[Figure: mAP vs. percentage of training samples α for TWO MOONS, LINES, CIRCLES and GAUSSIANS.]
In most cases few training samples (2-5%) are sufficient for high accuracy.
24. Number of classes (K)
[Figure: mAP vs. number of classes K for LINES and CIRCLES.]
Sufficiently good accuracy with respect to the number of classes (much better than linear SVM, slightly worse than SVM-RBF).
25. Scalability wrt. number of features
[Figure: training cost vs. feature dimensionality. Competing methods incur a linearly increasing cost wrt. dimensionality, while GSF has a constant cost wrt. dimensionality.]
26. Comparison between fusion methods
[Figure: mAP of the fusion methods on LINES and CIRCLES.]
Even when one feature goes bad, result fusion and GSF fusion still do better than the best single feature.
27. Incremental schemes
[Figure: mAP of LP vs. SA on TWO MOONS, LINES, CIRCLES and GAUSSIANS.]
SA is much better and less sensitive than LP.
30. GSF vs SESPA
[Figure: per-concept performance of the GSF variants vs. SESPA on MIR-Flickr.]
• GSF-F1, F2, F3: single-feature GSF
• GSF-C: graph structure feature fusion
• GSF-D1, D2: result fusion using LIBLINEAR (D1) and RBF (D2)
31. GSF vs MKL
[Figure: per-concept performance for the VISUAL and TAG features.]
• VISUAL: MKL better in baby, bird, river, sea.
• TAG: GSF better in baby, bird, car, dog, river, sea (possible thanks to GSF's scalable behavior wrt. the number of features).
37. Conclusions
• Concept detection approach based on the structure of image similarity graphs
– Transductive learning setting
– Two variants for online (incremental) learning
• Thorough experimental analysis
– Behavior under a variety of settings/parameters
– Equivalent or better behavior compared to state-of-the-art approaches
• Fast: SA with k=5 takes 38.4 msec per image (not incl. feature extraction)
• Future work: further analysis of computational characteristics and application to larger-scale datasets (NUS-WIDE, ImageNet)
39. References (1)
• Graph-based semi-supervised learning
Zhu, X.: Semi-supervised learning with graphs. PhD Thesis, Carnegie Mellon University, 0-542-19059-1 (2005)
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with Local and Global Consistency. Advances in NIPS 16, MIT Press (2004), 321-328
• Related approaches
Wang, M., Hua, X.-S., Tang, J., Hong, R.: Beyond distance measurement: constructing neighborhood similarity for video annotation. TMM 11 (3) (2009), 465-476
Tang, J. et al.: Inferring semantic concepts from community contributed images and noisy tags. ACM Multimedia (2009), 223-232
Chen, X. et al.: Efficient large scale image annotation by probabilistic collaborative multi-label propagation. ACM Multimedia (2010), 35-44
Tang, L., Liu, H.: Leveraging social media networks for classification. Data Mining and Knowledge Discovery 23 (3) (2011), 447-478
40. References (2)
• Relational classification
Macskassy, S.A., Provost, F.: Classification in Networked Data: A Toolkit and a Univariate Case Study. Journal of Machine Learning Research 8 (2007), 935-983
• Laplacian Eigenmaps
Belkin, M., Niyogi, P.: Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Computation 15 (6), MIT Press (2003), 1373-1396
Jia, P., Yin, J., Huang, X., Hu, D.: Incremental Laplacian Eigenmaps by preserving adjacent information between data points. Pattern Recognition Letters 30 (16) (2009), 1457-1463
41. References (3)
• Tools
Leyffer, S., Mahajan, A.: Nonlinear Constrained Optimization: Methods and Software. Preprint ANL/MCS-P1729-0310 (2010)
Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research 9 (2008), 1871-1874
Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2 (3) (2011), 27:1-27:27
• Dataset
Huiskes, M.J., Lew, M.S.: The MIR Flickr Retrieval Evaluation. Proceedings of the ACM International Conference on Multimedia Information Retrieval (2008)
• Competing methods
Hare, J.S., Lewis, P.H.: Automatically annotating the MIR Flickr dataset. ACM ICMR (2010), 547-556
Guillaumin, M., Verbeek, J., Schmid, C.: Multimodal semi-supervised learning for image classification. Proceedings of the IEEE CVPR Conference (2010), 902-909