QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
1. QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
Markus Brenner, Prof. Ebroul Izquierdo
Multimedia and Vision Research Group
Queen Mary University of London, UK
2. OBJECTIVE
In Collaborative Photo Collections …
1. Find and detect social events
2. Retrieve photos associated with the events
… with the help of additional, external information
3. INTRODUCTION AND BACKGROUND
Internet enables people to host, access and share their
photos online; for example, through websites like Flickr
and Facebook
Collaborative annotations and tags as well as public
comments are commonplace
Information people assign varies greatly but often includes some sort of reference to what happened, where, and who was involved
observed experiences or occurrences
simply referred to as events
4. INTRODUCTION AND BACKGROUND
Easier to search through photo collections if photos are
grouped into events
Link events in photo collections to public social media
like online news feeds
Automatically link news with corresponding photos
Provide additional information that might be relevant to
users to facilitate their search, like the date and location
of an event
5. OVERVIEW OF FRAMEWORK
[Framework diagram: Query → Preprocessing → Gathering External Data (Expanding the Topic via WordNet and DBpedia (via SPARQL); Handling Geographic Locations via the Google Geocoding API; Compiling Names of Geographic Locations via GeoNames; Translating Terms via the Google Translate API; Looking up Soccer Matches* as topic-specific external data) → Composing Textual Features / Extracting Visual Features → Matching Geographic Locations → Limiting Search Space (by date/time, by date and topic, by location) → Detecting Events → Retrieving Photos (Classification with Expanding Feature Space and Visual Pruning) → Detected Events and Retrieved Photos.]
* Example. Framework extendable to other topics.
6. GATHERING EXTERNAL DATA
Expanding the topic
Handling geographic locations
(e.g. compiling names of locations)
7. Expanding the Topic
Social events often revolve around a topic
Examples: Festivals, sport events, …
Problem: Users do not adhere to a controlled vocabulary
Idea: Expand textual representation of a given topic
Example: Expand the term concert by relating terms like
festival, gig, band, sound, etc.
Accomplish through combination of WordNet, DBpedia
and some initial evidence
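A minimal sketch of the WordNet side of this expansion using NLTK (the DBpedia/SPARQL part and the authors' exact expansion rules are not specified on the slide; the relations traversed here are an assumption for illustration):

```python
# Requires: pip install nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def expand_topic(seed):
    """Gather related terms from a seed term's synsets and their hypo-/hypernyms."""
    terms = set()
    for synset in wn.synsets(seed, pos=wn.NOUN):
        for s in [synset] + synset.hyponyms() + synset.hypernyms():
            terms.update(lemma.name().replace('_', ' ') for lemma in s.lemmas())
    return terms

print(sorted(expand_topic('concert')))  # e.g. 'rock concert', 'performance', ...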
8. Handling Geographic Locations
Venue location of a social event is an important cue
Interested in gaining a more complete understanding, such as of the city and country in which an event takes place, to expand the query
Beneficial as users often refer to a different geographical
hierarchy, e.g. foreigner to a country but local to a city
Also consider geographic coordinates to later match
geo-tagged photos
Use Google Geocoding API
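A hedged sketch of such a lookup against the public Google Geocoding REST endpoint; the API key, the example fields extracted, and the returned hierarchy handling are illustrative assumptions, not the authors' exact usage:

```python
import requests

def geocode_venue(venue_name, api_key):
    """Resolve a venue name to coordinates plus its city/country hierarchy."""
    resp = requests.get(
        'https://maps.googleapis.com/maps/api/geocode/json',
        params={'address': venue_name, 'key': api_key},
    ).json()
    result = resp['results'][0]                     # best match
    coords = result['geometry']['location']         # {'lat': ..., 'lng': ...}
    hierarchy = {c['types'][0]: c['long_name']      # e.g. 'locality', 'country'
                 for c in result['address_components']}
    return coords, hierarchy
```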
9. Compiling Names of Locations
Identify and understand any textual annotations in
photos that refer to geographic locations
Used in the retrieval process to isolate photos that likely do not correspond to the venue of a queried event
Extract all countries and larger cities from the
GeoNames dataset
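A sketch of compiling those names from the public GeoNames export; the tab-separated cities1000.txt layout follows the documented GeoNames format, though whether the authors used this exact file is an assumption:

```python
def load_location_names(path='cities1000.txt'):
    """Collect primary and alternate names of cities from a GeoNames dump."""
    names = set()
    with open(path, encoding='utf-8') as f:
        for line in f:
            cols = line.rstrip('\n').split('\t')
            names.add(cols[1].lower())                                 # name
            names.update(a.lower() for a in cols[3].split(',') if a)  # alternates
    return names
```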
12. Matching Geographic Locations
Geo-tagged photos are becoming more and more
popular
Identify photos as belonging or not belonging to a venue (and to an event when also considering the time)
For each venue compile two sets of photos
(within/outside its bounds)
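A minimal sketch of this within/outside split using a haversine distance test; the 500 m radius anticipates the bounding threshold given later in the implementation details, and the photo/venue dictionaries are placeholder structures:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def split_by_venue(geotagged_photos, venue, radius_m=500):
    """Partition geo-tagged photos into within-bounds and outside-bounds sets."""
    inside, outside = [], []
    for p in geotagged_photos:
        d = haversine_m(p['lat'], p['lon'], venue['lat'], venue['lon'])
        (inside if d <= radius_m else outside).append(p)
    return inside, outside
```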
13. Translating Terms and Stop-words
Photos get annotated and tagged in many different
languages
Translate topic-related terms and stop-words into other
languages
Limit to languages prevailing in the countries in which
the query venues are located
Use Google Translate API
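A hedged sketch against the public Google Translate v2 REST endpoint; the API has changed since 2012, so this is illustrative only, and the key and language codes are placeholders:

```python
import requests

def translate_terms(terms, target_lang, api_key):
    """Translate a list of topic terms/stop-words into one target language."""
    resp = requests.post(
        'https://translation.googleapis.com/language/translate/v2',
        params={'key': api_key},
        json={'q': list(terms), 'target': target_lang},
    ).json()
    return [t['translatedText'] for t in resp['data']['translations']]

# e.g. translate_terms(['concert', 'festival'], 'de', API_KEY)
```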
14. Composing Textual Features
Concatenate all information into a combined textual
representation (title, description, keywords, username, …)
Also include information obtained from external sources
Use a romanizing preprocessor that converts text into lower case, strips punctuation and extraneous whitespace, and removes accents from Unicode characters
Eliminate common stop-words, numbers and terms commonly associated with photography
Apply language-agnostic character-based tokenizer
Convert tokens into a matrix of occurrences (TF/IDF)
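A minimal sketch of this feature pipeline with scikit-learn; the character n-gram range, and treating custom stop-word removal as a prior step, are assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [  # concatenated title, description, keywords, username, ...
    'Rock am Ring 2012 festival crowd mainstage',
    'concert gig Koeln encore',
]
vectorizer = TfidfVectorizer(
    lowercase=True,            # convert text into lower case
    strip_accents='unicode',   # remove accents from Unicode characters
    analyzer='char_wb',        # language-agnostic character-based tokens
    ngram_range=(3, 5),        # assumed n-gram window
)
X = vectorizer.fit_transform(docs)   # sparse matrix of TF/IDF occurrences
print(X.shape)
```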
15. RETRIEVING PHOTOS OF AN EVENT
In the most basic case, we already know about a specific event and simply wish to retrieve all photos associated with it
Classification-based approach
Limiting search space
Expanding feature space
Visual pruning
16. Classification-based Approach: I
Treat each event independently (for a series of events, we instantiate a separate classifier for each event)
Train classifier on the textual features we compose
beforehand according to each event
No separate training dataset required
17. Classification-based Approach: II
Binary classification, but also introduce a third class that
reflects events of the same topic to improve results
Possible to include features of another query
Two different fusing strategies implemented
Experiment with multiple classifiers (Linear SVC, SGD, …)
Use sparse data representation and a sparse-adjusted classifier
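A sketch of one such per-event classifier with scikit-learn; the label encoding (event / same-topic / other) follows the slide, while the variable names and helper functions are illustrative:

```python
from sklearn.svm import LinearSVC

def train_event_classifier(X_train, y_train):
    """One classifier per event; labels: 1 = event, 2 = same topic, 0 = other."""
    clf = LinearSVC()            # operates directly on sparse TF/IDF input
    clf.fit(X_train, y_train)
    return clf

def retrieve_event_photos(clf, X_candidates):
    """Keep only candidates the classifier assigns to the event class."""
    return clf.predict(X_candidates) == 1
```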
18. Limiting Search Space
Generally, the date and time a photo was captured are
effective cues to bound the search space
For each event’s prediction step, we consider only those
photos that lie within the event’s temporal search
window
Specified by the query (e.g. New Year’s Eve)
Retrieved by the framework through external topic-specific
sources (e.g. the specific days of a concert tour)
Roughly estimated (based on a clustering scheme) in the
forthcoming event detection method
Exclude photos not matching geographic location
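A minimal sketch of the temporal bounding step; the record fields and the example window are placeholders:

```python
from datetime import datetime

photos = [  # placeholder records; capture timestamps come from photo metadata
    {'id': 1, 'capture_time': datetime(2011, 12, 31, 23, 50)},
    {'id': 2, 'capture_time': datetime(2012, 6, 1, 12, 0)},
]

def limit_search_space(photos, window_start, window_end):
    """Keep only photos captured inside the event's temporal search window."""
    return [p for p in photos
            if window_start <= p['capture_time'] <= window_end]

# e.g. a New Year's Eve window spanning the night into the next morning
candidates = limit_search_space(
    photos, datetime(2011, 12, 31), datetime(2012, 1, 1, 12))
print([p['id'] for p in candidates])   # -> [1]
```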
19. Expanding Feature Space
Expand feature space based on query information and
photo collection itself
Helpful when “training” information is sparse
(the case when there are few geo-tagged photos)
Iterative two-step process with a final refinement:
1. Train an initial classifier on the few query terms available
2. Then compile a new list of textual terms based on the predicted outcome over all applicable photos
3. Finally, use the gained terms to refine the initial query terms
Example: Photos related to a specific music venue
contain terms of the playing band or artist
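A sketch of this loop; the train/predict wrappers around the per-event classifier, the pre-tokenized 'tokens' field, and the round/term limits are all placeholder interfaces:

```python
from collections import Counter

def expand_feature_space(query_terms, photos, train, predict,
                         rounds=2, top_k=20):
    """Iteratively grow the query-term set from predicted positives."""
    terms = set(query_terms)
    for _ in range(rounds):
        clf = train(terms)                                  # step 1
        positives = [p for p in photos if predict(clf, p)]
        counts = Counter(t for p in positives for t in p['tokens'])
        terms |= {t for t, _ in counts.most_common(top_k)}  # steps 2-3
    return terms
```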
20. Visual Pruning
Mixing textual and visual features is not straightforward
Employ a cascade of two separate classifiers, each
separately adjusted to its feature space and data
representation
First fast textual classification, then binary visual pruning on the few remaining photos
Utilize MPEG-7 color and texture features
Experiment with several classifiers (Random Forest, SVC with RBF kernel, Linear SVC)
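A sketch of the cascade; the two classifiers are assumed to be fitted beforehand (e.g. a sparse-friendly LinearSVC for text and an SVC with RBF kernel on dense MPEG-7-style descriptors), and the candidate matrices are placeholders:

```python
import numpy as np

def cascade_retrieve(text_clf, visual_clf, X_text_cand, X_visual_cand):
    """Stage 1: fast textual classification; stage 2: visual pruning."""
    keep = text_clf.predict(X_text_cand) == 1        # fast textual pass
    idx = np.flatnonzero(keep)                       # the few remaining photos
    visual_ok = visual_clf.predict(X_visual_cand[idx]) == 1
    keep[idx[~visual_ok]] = False                    # prune visual rejects
    return keep
```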
21. DETECTING EVENTS
Two proposals:
If the date but not time of day is known, apply a
clustering method on all candidates of a given day
largest clusters then reflect events
Otherwise: expand the approach by performing a prediction step for any day instead of just the selected days conforming to the events; this will inadvertently grow the search space
In both cases apply a threshold (number of photos relating to a potential event) prior to considering a new event
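A sketch of the per-day clustering variant; the slides do not name the clustering scheme, so DBSCAN here is a stand-in, and the eps/threshold values are assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_events_for_day(capture_hours, min_photos=10, eps_hours=2.0):
    """Cluster one day's candidate photos by capture time; sufficiently
    large clusters are taken as events."""
    X = np.asarray(capture_hours, dtype=float).reshape(-1, 1)
    labels = DBSCAN(eps=eps_hours, min_samples=min_photos).fit_predict(X)
    return [np.flatnonzero(labels == lab) for lab in set(labels) if lab != -1]
```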
23. Dataset
2012 MediaEval SED Dataset – Challenge II
167,332 photos collected from Flickr
Metadata: unique Flickr ID, capture timestamp, username, title, description, keywords and partial geographic coordinates (in about a fifth of the cases)
Ground truth in the form of event clusters (specifying
associated photos) for two topics/challenges
“Training set”: 2011 MediaEval SED Dataset
24. Implementation Details and Setup
Define event as a distinct combination of location and
date (one event per day at the same location)
Use English names of locations only
Bounding threshold of 500 meters
Default: Linear SVC, no feature expansion, no visual
pruning
Evaluation measures: Precision (P), Recall (R),
F-score, Normalized Mutual Information (NMI)
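A sketch of computing these measures with scikit-learn; the label vectors are toy placeholders, and NMI is normally evaluated between full event-cluster assignments rather than a single binary split:

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             normalized_mutual_info_score)

y_true = [1, 1, 0, 1, 0, 0]   # 1 = photo belongs to the event
y_pred = [1, 0, 0, 1, 1, 0]
print('P   =', precision_score(y_true, y_pred))
print('R   =', recall_score(y_true, y_pred))
print('F   =', f1_score(y_true, y_pred))
print('NMI =', normalized_mutual_info_score(y_true, y_pred))
```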
25. Dataset Setup
Focus on Challenge II
Challenge I/III: Current approach has limitations
No event/venue detection through social media websites
like Twitter
Only basic venue/location detection/clustering
an issue when the location covers a large area (e.g. an entire country)
26. Results: Challenge II
Detected: 32 events
Identified several thousand photos not belonging
to any relevant venue
substantial reduction of candidates
large number of training samples
Configuration             P     R     F     NMI
Default configuration    79.0  67.1  72.6  0.65
Basic event detection    56.0  69.6  62.0  0.53  (worse)
With visual pruning      83.2  61.9  71.0  0.63
With feature expansion   79.0  66.9  72.5  0.65
27. CONCLUSION
External information, e.g. about a venue, is helpful for both event detection and retrieval of associated photos
Finding and linking external data in a uniform way
still challenging
Visual information does not improve results much
Future considerations:
Social media websites like Facebook and Twitter
Improved venue/location detection/clustering