Our presentation in the High Risk / High Reward track at the ACM MM 2014 conference. It introduces a novel way to tackle large-scale image classification or retrieval.
How ‘How’ Reflects What’s What: Content-based Exploitation of How Users Frame Social Images
1. How ‘How’ Reflects What’s What: Content-based Exploitation of How Users Frame Social Images
Michael Riegler, Simula Research Laboratory, Norway
Martha Larson, Delft University of Technology, Netherlands
Mathias Lux, University of Klagenfurt, Austria
Christoph Kofler, Delft University of Technology, Netherlands
2. We will introduce a signal that exists in every image collection and gives you an enormous speedup!
3. Take Home Message
❖ Photographers use intentional frames.
❖ The frames reflect the semantic categories of images.
❖ In turn, global image features reflect the frames.
❖ This motivates a fast and simple approach to image semantics.
❖ Take home a strong inner feeling that you want to try it out yourself!
5. ❖ You may now think that you already know it, and that it's called:
❖ Concepts or…
❖ Scenes
❖ But Wrong!
6. ❖ And let me tell you, it is also not
❖ Composition
❖ Also Wrong!
7. “Intentional framing is the sum of choices made by photographers on exactly how to portray the subject matter that they have decided to photograph.”
–The Definition
Picture source: https://www.flickr.com/photos/ausnap/5712791522/in/photostream/
8. Mechanics of Intentional Framing
[Diagram: three nodes, "semantic category of an image", "the photographers' intent", and "global image features", linked by arrows labelled "reflects".]
13. Hypothesis
❖ Photographers’ choices.
❖ Even if framing is not a conscious decision, it still is an unconscious one.
❖ Similar intents for taking images lead to similar framings.
❖ Global features can capture these intentional semantics.
15. Global Features and Intent
❖ Global features connect semantics and intent.
❖ Show that there is solid evidence for intentional framing.
❖ Clustering experiment on two different data sets:
❖ Intent data set
❖ Fashion 10000 data set
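
To make "global feature" concrete: a global feature is computed over the whole image rather than over local keypoints or regions. The following is a minimal sketch of one such feature, a joint RGB colour histogram. It is an illustration only and not one of the descriptors used in the talk; the function names, the bin count, and the L1 distance are assumptions chosen for brevity.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def global_color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalised joint RGB histogram over ALL pixels of an image.

    Hypothetical helper for illustration; real descriptors are more elaborate.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Two tiny "images" given as flat pixel lists (assumed input format).
mostly_red = [(255, 0, 0)] * 3 + [(0, 0, 255)]
all_blue = [(0, 0, 255)] * 4
red_hist = global_color_histogram(mostly_red, bins_per_channel=2)
blue_hist = global_color_histogram(all_blue, bins_per_channel=2)
distance = l1_distance(red_hist, blue_hist)
```

Because the whole image collapses into one short vector, comparing two images costs a single vector distance, which is where the speed of a global-feature approach comes from.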
16. Correlation of People’s Perception and Global Features
❖ X-means clustering
❖ Based on different global features.
❖ Features can catch different aspects (edges, colour, etc.).
❖ The density of the clusters based on global features correlates with the users’ perception of the intentional framing in them.
[Figure panels: Original, Edge, Color]
17. Evidence of Human Perception of Intent
[Figure: correlations between global features and intent categories 1 to 6; black marks a positive correlation, red a negative correlation.]
Global features: CEDD, FCTH, Gabor, Tamura, Luminance Layout, Scalable Color, Opponent Histogram, Auto Color Correlogram, JPEG Coefficient, Edge Histogram, PHOG, JCD, Joint Histogram
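
The slide reports only the sign of each correlation and does not say which coefficient was used. As a hedged illustration, the sketch below assumes Pearson's correlation coefficient; the input numbers are made up for the example and are not data from the study.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical values: a per-cluster density score and a per-cluster user rating.
cluster_density = [1.0, 2.0, 3.0, 4.0]
rating_rising = [2.0, 4.0, 6.0, 8.0]
rating_falling = [8.0, 6.0, 4.0, 2.0]

positive = pearson(cluster_density, rating_rising)   # would be drawn black
negative = pearson(cluster_density, rating_falling)  # would be drawn red
```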
20. Content Based Classification
❖ Using intentional framing to tackle a classification problem.
❖ Simple search-based classifier (SimSea).
❖ Our submission to the ACM MM ’13 Yahoo! Large-scale Flickr-tag Image Classification Grand Challenge.
❖ Reviewers told us: It is too simple…
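
The slides do not spell out how SimSea works internally. As a rough idea of what a search-based classifier can look like, here is a generic nearest-neighbour sketch: search an index of labelled global feature vectors for the entries closest to the query, then let them vote. The function names, the L1 distance, and the value of k are all assumptions, not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

IndexEntry = Tuple[Sequence[float], str]  # (global feature vector, category label)


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def search_based_classify(query: Sequence[float], index: List[IndexEntry], k: int = 3) -> str:
    """Label a query by majority vote among its k nearest indexed images."""
    if not index:
        raise ValueError("index is empty")
    # Step 1: search, i.e. rank all indexed images by distance to the query.
    ranked = sorted(index, key=lambda entry: l1_distance(query, entry[0]))
    # Step 2: classify by counting the labels of the top-k results.
    # On a tie, the label that appears first (closest) in the ranking wins.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy index with made-up 3-dimensional feature vectors.
toy_index: List[IndexEntry] = [
    ([0.9, 0.1, 0.0], "sky"),
    ([0.8, 0.2, 0.0], "sky"),
    ([0.1, 0.1, 0.8], "food"),
    ([0.0, 0.2, 0.8], "food"),
]
predicted = search_based_classify([0.85, 0.15, 0.0], toy_index, k=3)
```

A linear scan like this is fine for a toy index; for millions of images one would presumably put an approximate or indexed search behind the same interface.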
21. Remember the challenge?
❖ 2 million images.
❖ 10 different semantic categories.
❖ nature, people, music, london, 2012, food, wedding, sky, beach, travel.
❖ Extremely diverse categories.
22. iAP results per category on the development set
Category  JCD    CL     OH     PHOG
2012      0.198  0.128  0.130  0.104
beach     0.448  0.487  0.342  0.534
food      0.531  0.492  0.389  0.352
london    0.244  0.201  0.146  0.347
music     0.526  0.457  0.495  0.164
nature    0.502  0.410  0.435  0.503
people    0.264  0.227  0.244  0.105
sky       0.628  0.601  0.544  0.473
travel    0.139  0.101  0.128  0.112
wedding   0.463  0.272  0.262  0.235
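
The slides do not define iAP. Assuming it stands for interpolated average precision, one common formulation takes, at each relevant result, the best precision achieved at that rank or any later rank, and averages those values. The sketch below implements that variant; the challenge's official evaluation may differ in detail, and the helper names are hypothetical.

```python
from typing import List, Sequence


def interpolated_average_precision(relevance: Sequence[bool]) -> float:
    """iAP of one ranked result list; relevance[i] is True if result i is relevant."""
    precisions: List[float] = []
    hits = 0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
        precisions.append(hits / rank)
    # Interpolate: replace each precision by the maximum at this or any later rank.
    best = 0.0
    interpolated = [0.0] * len(precisions)
    for i in range(len(precisions) - 1, -1, -1):
        best = max(best, precisions[i])
        interpolated[i] = best
    at_relevant = [interpolated[i] for i, rel in enumerate(relevance) if rel]
    return sum(at_relevant) / len(at_relevant) if at_relevant else 0.0


def mean_iap(per_category_iap: Sequence[float]) -> float:
    """MiAP: the plain mean of per-category iAP values."""
    if not per_category_iap:
        raise ValueError("no categories given")
    return sum(per_category_iap) / len(per_category_iap)


# Ranked list with relevant results at ranks 1 and 3.
example_iap = interpolated_average_precision([True, False, True])
```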
23. Compared to the Official Results
Method               MiAP
Our method (SimSea)  0.391
Local 1 (SMaL [1])   0.422
Local 2 (SVM [1])    0.413
Concept 1 (HA [2])   0.37
❖ Very good results with a very simple method.
❖ Very time efficient.
❖ Processed on a single desktop PC.
[1] E. Mantziou, S. Papadopoulos, and Y. Kompatsiaris. Scalable Training with Approximate Incremental Laplacian Eigenmaps and PCA. In Proceedings of ACM MM ’13, pages 381–384, 2013.
[2] W. Hsu. Flickr-tag Prediction Using Multi-modal Fusion and Meta Information. In Proceedings of ACM MM ’13, pages 353–356, 2013.
24. Conclusion
❖ Intentional framing exists.
❖ Different framings correspond to different global features.
❖ An interesting framework for leveraging global features in classification.
❖ Fast and simple!
❖ New vista for multimedia research.