QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
1. QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
Markus Brenner, Prof. Ebroul Izquierdo
Multimedia and Vision Research Group
Queen Mary University of London, UK
2. OBJECTIVE
In Collaborative Photo Collections …
1. Find and detect social events
2. Retrieve photos associated with the events
… with the help of additional, external information
3. INTRODUCTION AND BACKGROUND
Internet enables people to host, access and share their
photos online; for example, through websites like Flickr
and Facebook
Collaborative annotations and tags as well as public
comments are commonplace
Information people assign varies greatly but often includes some sort of reference to what happened, where, and who was involved
observed experiences or occurrences
simply referred to as events
4. INTRODUCTION AND BACKGROUND
Easier to search through photo collections if photos are
grouped into events
Link events in photo collections to public social media
like online news feeds
Automatically link news with corresponding photos
Provide additional information that might be relevant to
users to facilitate their search, like the date and location
of an event
5. OVERVIEW OF FRAMEWORK
[Framework diagram: Query → Preprocessing → Gathering External Data (Expanding the Topic via WordNet and DBpedia (via SPARQL); Handling Geographic Locations via the Google Geocoding API; Compiling Names of Geographic Locations via GeoNames; Translating Terms via the Google Translate API; Looking up Soccer Matches* as topic-specific external data) → Composing Textual Features / Extracting Visual Features → Matching Geographic Locations → Limiting Search Space (by date/time, by date and topic, by location) → Detecting Events → Retrieving Photos (Classification with Expanding Feature Space and Visual Pruning) → Detected Events and Retrieved Photos.]
* Example. Framework extendable to other topics.
6. GATHERING EXTERNAL DATA
Expanding the topic
Handling geographic locations
(e.g. compiling names of locations)
7. Expanding the Topic
Social events often revolve around a topic
Examples: Festivals, sport events, …
Problem: Users do not adhere to a controlled vocabulary
Idea: Expand textual representation of a given topic
Example: Expand the term concert by relating terms like
festival, gig, band, sound, etc.
Accomplish through combination of WordNet, DBpedia
and some initial evidence
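A minimal sketch of the WordNet side of this expansion using NLTK (the DBpedia/SPARQL part and the authors' exact expansion rules are not specified on the slide; the relations traversed here are an assumption for illustration):

```python
# Requires: pip install nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def expand_topic(seed):
    """Gather related terms from a seed term's synsets and their hypo-/hypernyms."""
    terms = set()
    for synset in wn.synsets(seed, pos=wn.NOUN):
        for s in [synset] + synset.hyponyms() + synset.hypernyms():
            terms.update(lemma.name().replace('_', ' ') for lemma in s.lemmas())
    return terms

print(sorted(expand_topic('concert')))  # e.g. 'rock concert', 'performance', ...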
8. Handling Geographic Locations
Venue location of a social event is an important cue
Interested in gaining a more complete understanding, such as of the city and country in which an event takes place, to expand the query
Beneficial as users often refer to a different geographical
hierarchy, e.g. foreigner to a country but local to a city
Also consider geographic coordinates to later match
geo-tagged photos
Use Google Geocoding API
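A hedged sketch of such a lookup against the public Google Geocoding REST endpoint; the API key, the example fields extracted, and the returned hierarchy handling are illustrative assumptions, not the authors' exact usage:

```python
import requests

def geocode_venue(venue_name, api_key):
    """Resolve a venue name to coordinates plus its city/country hierarchy."""
    resp = requests.get(
        'https://maps.googleapis.com/maps/api/geocode/json',
        params={'address': venue_name, 'key': api_key},
    ).json()
    result = resp['results'][0]                     # best match
    coords = result['geometry']['location']         # {'lat': ..., 'lng': ...}
    hierarchy = {c['types'][0]: c['long_name']      # e.g. 'locality', 'country'
                 for c in result['address_components']}
    return coords, hierarchy
```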
9. Compiling Names of Locations
Identify and understand any textual annotations in
photos that refer to geographic locations
Used in the retrieval process to isolate photos that likely do not correspond to the venue of a queried event
Extract all countries and larger cities from the
GeoNames dataset
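A sketch of compiling those names from the public GeoNames export; the tab-separated cities1000.txt layout follows the documented GeoNames format, though whether the authors used this exact file is an assumption:

```python
def load_location_names(path='cities1000.txt'):
    """Collect primary and alternate names of cities from a GeoNames dump."""
    names = set()
    with open(path, encoding='utf-8') as f:
        for line in f:
            cols = line.rstrip('\n').split('\t')
            names.add(cols[1].lower())                                 # name
            names.update(a.lower() for a in cols[3].split(',') if a)  # alternates
    return names
```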
12. Matching Geographic Locations
Geo-tagged photos are becoming more and more
popular
Identify photos as belonging or not belonging to a venue (and to an event when also considering the time)
For each venue compile two sets of photos
(within/outside its bounds)
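A minimal sketch of this within/outside split using a haversine distance test; the 500 m radius anticipates the bounding threshold given later in the implementation details, and the photo/venue dictionaries are placeholder structures:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def split_by_venue(geotagged_photos, venue, radius_m=500):
    """Partition geo-tagged photos into within-bounds and outside-bounds sets."""
    inside, outside = [], []
    for p in geotagged_photos:
        d = haversine_m(p['lat'], p['lon'], venue['lat'], venue['lon'])
        (inside if d <= radius_m else outside).append(p)
    return inside, outside
```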
13. Translating Terms and Stop-words
Photos get annotated and tagged in many different
languages
Translate topic-related terms and stop-words into other
languages
Limit to languages prevailing in the countries in which
the query venues are located
Use Google Translate API
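A hedged sketch against the public Google Translate v2 REST endpoint; the API has changed since 2012, so this is illustrative only, and the key and language codes are placeholders:

```python
import requests

def translate_terms(terms, target_lang, api_key):
    """Translate a list of topic terms/stop-words into one target language."""
    resp = requests.post(
        'https://translation.googleapis.com/language/translate/v2',
        params={'key': api_key},
        json={'q': list(terms), 'target': target_lang},
    ).json()
    return [t['translatedText'] for t in resp['data']['translations']]

# e.g. translate_terms(['concert', 'festival'], 'de', API_KEY)
```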
14. Composing Textual Features
Concatenate all information into a combined textual
representation (title, description, keywords, username, …)
Also include information obtained from external sources
Use a romanizing preprocessor that converts text into lower case, strips punctuation and extraneous whitespace, and removes accents from Unicode characters
Eliminate common stop-words, numbers and terms commonly associated with photography
Apply language-agnostic character-based tokenizer
Convert tokens into a matrix of occurrences (TF/IDF)
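A minimal sketch of this feature pipeline with scikit-learn; the character n-gram range, and treating custom stop-word removal as a prior step, are assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [  # concatenated title, description, keywords, username, ...
    'Rock am Ring 2012 festival crowd mainstage',
    'concert gig Koeln encore',
]
vectorizer = TfidfVectorizer(
    lowercase=True,            # convert text into lower case
    strip_accents='unicode',   # remove accents from Unicode characters
    analyzer='char_wb',        # language-agnostic character-based tokens
    ngram_range=(3, 5),        # assumed n-gram window
)
X = vectorizer.fit_transform(docs)   # sparse matrix of TF/IDF occurrences
print(X.shape)
```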
15. RETRIEVING PHOTOS OF AN EVENT
In the most basic case, we already know about a specific event and simply wish to retrieve all photos associated with it
Classification-based approach
Limiting search space
Expanding feature space
Visual pruning
16. Classification-based Approach: I
Treat each event independently (for a series of events, we instantiate a separate classifier for each event)
Train classifier on the textual features we compose
beforehand according to each event
No separate training dataset required
17. Classification-based Approach: II
Binary classification, but also introduce a third class that
reflects events of the same topic to improve results
Possible to include features of another query
Two different fusing strategies implemented
Experiment with multiple classifiers (Linear SVC, SGD, …)
Use sparse data representation and a sparse-adjusted classifier
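A sketch of one such per-event classifier with scikit-learn; the label encoding (event / same-topic / other) follows the slide, while the variable names and helper functions are illustrative:

```python
from sklearn.svm import LinearSVC

def train_event_classifier(X_train, y_train):
    """One classifier per event; labels: 1 = event, 2 = same topic, 0 = other."""
    clf = LinearSVC()            # operates directly on sparse TF/IDF input
    clf.fit(X_train, y_train)
    return clf

def retrieve_event_photos(clf, X_candidates):
    """Keep only candidates the classifier assigns to the event class."""
    return clf.predict(X_candidates) == 1
```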
18. Limiting Search Space
Generally, the date and time a photo was captured are
effective cues to bound the search space
For each event’s prediction step, we consider only those
photos that lie within the event’s temporal search
window
Specified by the query (e.g. New Year’s Eve)
Retrieved by the framework through external topic-specific
sources (e.g. the specific days of a concert tour)
Roughly estimated (based on a clustering scheme) in the
forthcoming event detection method
Exclude photos not matching geographic location
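A minimal sketch of the temporal bounding step; the record fields and the example window are placeholders:

```python
from datetime import datetime

photos = [  # placeholder records; capture timestamps come from photo metadata
    {'id': 1, 'capture_time': datetime(2011, 12, 31, 23, 50)},
    {'id': 2, 'capture_time': datetime(2012, 6, 1, 12, 0)},
]

def limit_search_space(photos, window_start, window_end):
    """Keep only photos captured inside the event's temporal search window."""
    return [p for p in photos
            if window_start <= p['capture_time'] <= window_end]

# e.g. a New Year's Eve window spanning the night into the next morning
candidates = limit_search_space(
    photos, datetime(2011, 12, 31), datetime(2012, 1, 1, 12))
print([p['id'] for p in candidates])   # -> [1]
```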
19. Expanding Feature Space
Expand feature space based on query information and
photo collection itself
Helpful when “training” information is sparse
(the case when there are few geo-tagged photos)
Iterative two-step process with a final refinement:
1. Train an initial classifier on the few query terms available
2. Then compile a new list of textual terms based on the predicted outcome over all applicable photos
3. Finally, use the gained terms to refine the initial query terms
Example: Photos related to a specific music venue
contain terms of the playing band or artist
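A sketch of this loop; the train/predict wrappers around the per-event classifier, the pre-tokenized 'tokens' field, and the round/term limits are all placeholder interfaces:

```python
from collections import Counter

def expand_feature_space(query_terms, photos, train, predict,
                         rounds=2, top_k=20):
    """Iteratively grow the query-term set from predicted positives."""
    terms = set(query_terms)
    for _ in range(rounds):
        clf = train(terms)                                  # step 1
        positives = [p for p in photos if predict(clf, p)]
        counts = Counter(t for p in positives for t in p['tokens'])
        terms |= {t for t, _ in counts.most_common(top_k)}  # steps 2-3
    return terms
```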
20. Visual Pruning
Mixing textual and visual features is not straightforward
Employ a cascade of two separate classifiers, each
separately adjusted to its feature space and data
representation
First fast textual classification, then binary visual pruning on the few remaining photos
Utilize MPEG-7 color and texture features
Experiment with several classifiers (Random Forest, SVC with RBF kernel, Linear SVC)
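A sketch of the cascade; the two classifiers are assumed to be fitted beforehand (e.g. a sparse-friendly LinearSVC for text and an SVC with RBF kernel on dense MPEG-7-style descriptors), and the candidate matrices are placeholders:

```python
import numpy as np

def cascade_retrieve(text_clf, visual_clf, X_text_cand, X_visual_cand):
    """Stage 1: fast textual classification; stage 2: visual pruning."""
    keep = text_clf.predict(X_text_cand) == 1        # fast textual pass
    idx = np.flatnonzero(keep)                       # the few remaining photos
    visual_ok = visual_clf.predict(X_visual_cand[idx]) == 1
    keep[idx[~visual_ok]] = False                    # prune visual rejects
    return keep
```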
21. DETECTING EVENTS
Two proposals:
If the date but not time of day is known, apply a
clustering method on all candidates of a given day
largest clusters then reflect events
Otherwise: expand the approach by performing a prediction step for any day instead of just the selected days conforming to the events; this will inadvertently grow the search space
In both cases apply a threshold (number of photos relating to a potential event) prior to considering a new event
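A sketch of the per-day clustering variant; the slides do not name the clustering scheme, so DBSCAN here is a stand-in, and the eps/threshold values are assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_events_for_day(capture_hours, min_photos=10, eps_hours=2.0):
    """Cluster one day's candidate photos by capture time; sufficiently
    large clusters are taken as events."""
    X = np.asarray(capture_hours, dtype=float).reshape(-1, 1)
    labels = DBSCAN(eps=eps_hours, min_samples=min_photos).fit_predict(X)
    return [np.flatnonzero(labels == lab) for lab in set(labels) if lab != -1]
```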
23. Dataset
2012 MediaEval SED Dataset – Challenge II
167,332 photos collected from Flickr
Metadata: unique Flickr ID, capture timestamp, username, title, description, keywords and partial geographic coordinates (in about a fifth of the cases)
Ground truth in the form of event clusters (specifying
associated photos) for two topics/challenges
“Training set”: 2011 MediaEval SED Dataset
24. Implementation Details and Setup
Define event as a distinct combination of location and
date (one event per day at the same location)
Use English names of locations only
Bounding threshold of 500 meters
Default: Linear SVC, no feature expansion, no visual
pruning
Evaluation measures: Precision (P), Recall (R),
F-score, Normalized Mutual Information (NMI)
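A sketch of computing these measures with scikit-learn; the label vectors are toy placeholders, and NMI is normally evaluated between full event-cluster assignments rather than a single binary split:

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             normalized_mutual_info_score)

y_true = [1, 1, 0, 1, 0, 0]   # 1 = photo belongs to the event
y_pred = [1, 0, 0, 1, 1, 0]
print('P   =', precision_score(y_true, y_pred))
print('R   =', recall_score(y_true, y_pred))
print('F   =', f1_score(y_true, y_pred))
print('NMI =', normalized_mutual_info_score(y_true, y_pred))
```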
25. Dataset Setup
Focus on Challenge II
Challenge I/III: Current approach has limitations
No event/venue detection through social media websites
like Twitter
Only basic venue/location detection/clustering
an issue when the location covers a large area (e.g. an entire country)
26. Results: Challenge II
Detected: 32 events
Identified several thousand photos not belonging
to any relevant venue
substantial reduction of candidates
large number of training samples
Configuration             P     R     F     NMI
Default configuration    79.0  67.1  72.6  0.65
Basic event detection    56.0  69.6  62.0  0.53  (worse)
With visual pruning      83.2  61.9  71.0  0.63
With feature expansion   79.0  66.9  72.5  0.65
27. CONCLUSION
External information, e.g. about a venue, is helpful for both event detection and retrieval of associated photos
Finding and linking external data in a uniform way
still challenging
Visual information does not improve results much
Future considerations:
Social media websites like Facebook and Twitter
Improved venue/location detection/clustering