This document presents a system that derives location information from the visual content of videos without geo-tags. It divides the world into regions based on criteria such as temperature and biomes, and uses visual similarity measures to match videos to these regions. Initial tests on 500 videos using a 22-biome classification achieved 12.17% accuracy, better than the random chance of 4.55%. Future work will focus on using only outdoor videos, excluding indoor images, which provide noisy information.
Preliminary Geo-tagging of Social Video Using Visual Content
1. Preliminary Exploration of the Use of Geographical Information for Content-based Geo-tagging of Social Video
5-10-2012
Xinchao Li, Claudia Hauff, Martha Larson, Alan Hanjalic
Delft University of Technology
2. System Overview
• Goal
derive location information from the visual content of videos
• Challenge
• 35.7% of videos have no tags; 13.1% have only one tag
• improve on the metadata-based system
Visual similarity measures for semantic video retrieval 2
3. • Assumption
divide the world map into regions that have a high within-region visual stability and a high between-region variability
(example regions shown on the map: South Pole, Great Victoria Desert)
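Under this assumption, geo-tagging reduces to matching a video's visual features against per-region visual models. A minimal sketch of such a matcher (feature vectors and region models are hypothetical; the talk does not specify the concrete visual representation):

```python
import numpy as np

def predict_region(video_features, region_models):
    """Assign a video to the region whose visual model it best matches.

    video_features: (n_frames, d) array of keyframe descriptors
    region_models: dict mapping region id -> (d,) mean descriptor
    (both hypothetical; any global image descriptor could fill these roles)
    """
    video_vec = video_features.mean(axis=0)  # pool keyframe descriptors

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # pick the region with the highest cosine similarity to the video
    return max(region_models, key=lambda r: cosine(video_vec, region_models[r]))
```

This is only one plausible instantiation of "visual similarity measures"; the choice of descriptor and pooling is an open design question in the talk.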
4. Different Division Methods
• Baseline
5. • Temperature-data based
6. • Temperature-data based
6 temperature regions: from −20°C to 40°C, in 10°C intervals.
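The mapping from a location's average temperature to one of the six regions can be sketched as follows (a hypothetical helper; the talk specifies only the bin edges):

```python
def temperature_region(temp_c):
    """Map an average temperature (°C) to a region index 0..5.

    Bins: [-20,-10), [-10,0), [0,10), [10,20), [20,30), [30,40]
    """
    if not -20 <= temp_c <= 40:
        raise ValueError("temperature outside the covered range")
    # integer-divide the offset temperature; clamp 40°C into the last bin
    return min(int((temp_c + 20) // 10), 5)
```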
7. • Biome-data based
8. Run Results
9. Run Results
22-biome classification: 12.17% accuracy (random baseline: 4.55%)
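The random baseline follows directly from the number of regions (1/22 ≈ 4.55%); a minimal sketch of the evaluation, with illustrative function names:

```python
def accuracy(predictions, ground_truth):
    """Fraction of videos assigned to their true region."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

N_REGIONS = 22
random_baseline = 1 / N_REGIONS  # ~0.0455, the 4.55% chance level on the slide
```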
10. Discussion
• Visual Content of Test Videos
500 videos sampled from the 4182 test videos (12%)
• Indoor (42%)
• Outdoor Event (32%)
• Normal Outdoor (26%)
• Visual Content of Training Photos
458 photos from the 3M training set
• Indoor (27.5%)
11. Indoor (42%)
12. Outdoor Event (32%)
13. Normal Outdoor (26%)
14. Conclusion and Future work
• Recall our assumption
“we can divide the world map into regions
that have a high within-region visual stability and a
high between-region variability.”
• indoor images provide noisy information
• use only outdoor videos for training and testing
15. Thank you!
X.Li-3@tudelft.nl