
Fast object re detection and localization in video for spatio-temporal fragment creation

Presented at ICME MMIX 2013, San Jose, CA, USA, July 2013

  1. Fast object re-detection and localization in video for spatio-temporal fragment creation
     Evlampios Apostolidis, Vasileios Mezaris, Ioannis Kompatsiaris
     Information Technologies Institute / Centre for Research and Technology Hellas
     ICME MMIX 2013, San Jose, CA, USA, July 2013
  2. Overview
     • Introduction – problem formulation
     • Related work
     • Baseline approach
     • Proposed approach
       – GPU-based processing
       – Video-structure-based sampling of video frames
       – Robustness to scale variations
     • Experiments and results
     • Conclusions
  3. Introduction – problem formulation
     • Object re-detection: a particular case of image matching
     • Main goal: find instances of a specific object within a single video or a collection of videos
       – Input: object of interest + video file
       – Processing: similarity estimation by means of image matching
       – Output: detected instances of the object of interest
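The input/processing/output loop above can be sketched as follows. This is a minimal illustration, not the authors' implementation: descriptors are stood in for by plain feature vectors, and the names `match_count`, `redetect`, and the distance threshold are illustrative assumptions.

```python
def match_count(obj_desc, frame_desc, max_dist=0.5):
    """Count object descriptors whose nearest frame descriptor is close enough."""
    count = 0
    for d in obj_desc:
        # Euclidean distance to the nearest descriptor extracted from the frame
        best = min((sum((a - b) ** 2 for a, b in zip(d, f)) ** 0.5
                    for f in frame_desc), default=float("inf"))
        if best <= max_dist:
            count += 1
    return count

def redetect(obj_desc, video_frames_desc, min_matches=2):
    """Return indices of frames where the object of interest is detected."""
    return [i for i, fd in enumerate(video_frames_desc)
            if match_count(obj_desc, fd) >= min_matches]
```

In a real system the per-frame descriptor sets would come from a keypoint detector/descriptor such as SURF, and the decision threshold would be tuned on data.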
  4. Introduction – problem formulation
     Extension for interactive and linked TV
     • Semi-automatic identification and annotation of object-specific spatio-temporal media fragments
       – Annotate the object of interest
       – Run the object re-detection algorithm
       – Automatically get instance-based annotated video fragments
       – Find related content fragments and establish links between them
     [Figure: assign a label to the object of interest → instance-based annotated video fragment → links to related content]
  5. Related work
     • Extraction and matching of scale- and rotation-invariant local descriptors is one of the most popular SoA approaches for similarity estimation between pairs of images
       – Local feature extraction: edge detectors (e.g. Canny), corner detectors (e.g. Harris-Laplace)
       – Local feature description: SIFT or extensions of it, SURF, BRISK, binary descriptors such as BRIEF, …
       – Matching of local descriptors: k-Nearest Neighbor search between descriptor pairs using brute-force or hashing
       – Filtering of erroneous matches:
         • Symmetry test between the pairs of matched descriptors
         • Ratio test regarding the distances of the calculated nearest neighbors
         • Geometric verification between the pair of images using RANSAC
       – Extensions: combined use of keypoints and motion information (tracking); Bag-of-Words (BoW) matching for pruning
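The ratio and symmetry tests listed above can be sketched as below, using plain Euclidean distance on toy descriptor vectors. All names and the 0.8 threshold are illustrative assumptions; real systems apply these tests to SIFT/SURF descriptors after a 2-NN search.

```python
def two_nn(query, candidates):
    """Return the two nearest candidates to query as (dist, index) pairs."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(query, c)) ** 0.5, i)
        for i, c in enumerate(candidates))
    return dists[0], dists[1]

def ratio_matches(src, dst, ratio=0.8):
    """Ratio test: keep a match only if the best neighbor is clearly
    closer than the second-best one."""
    kept = []
    for i, q in enumerate(src):
        (d1, j), (d2, _) = two_nn(q, dst)
        if d1 < ratio * d2:
            kept.append((i, j))
    return kept

def symmetric_matches(src, dst, ratio=0.8):
    """Symmetry test: keep matches that survive the ratio test
    in both matching directions."""
    forward = set(ratio_matches(src, dst, ratio))
    backward = {(i, j) for (j, i) in ratio_matches(dst, src, ratio)}
    return sorted(forward & backward)
```

Geometric verification with RANSAC would then estimate a homography from the surviving matches and discard the remaining outliers.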
  6. Proposed approach
     • Starting from a baseline approach:
       – Improve detection accuracy
       – Reduce the needed processing time
     • Work directions:
       – GPU-based processing
       – Video-structure-based sampling of frames
       – Enhancing robustness to scale variations
  7. GPU-based processing
     Accelerated parts of the overall pipeline:
     • Video decompression into frames
     • Keypoint detection and description
     • Brute-force matching and 2-NN search
     • Drawing of the calculated bounding boxes (optional)
  8. Video-structure-based sampling
     • Sequential processing of video frames is replaced by a structure-based one, using the analysis results of a shot segmentation method
     • Example:
       – Check shot 1 → no detection: move to the next shot
       – Check shot 2 → detection: check all shot-2 frames, detect and highlight the object of interest
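The sampling strategy above can be sketched as follows: test one representative frame per shot, and only scan a shot exhaustively when that frame yields a detection. The shot representation, the choice of the middle frame as representative, and the `detect` callback are illustrative assumptions, not details confirmed by the slides.

```python
def structure_based_scan(shots, detect):
    """shots: list of (start, end) frame-index ranges from shot segmentation.
    detect: callable frame_index -> bool.
    Returns all frame indices where the object is detected."""
    hits = []
    for start, end in shots:
        middle = (start + end) // 2          # representative frame of the shot
        if not detect(middle):
            continue                         # no detection: skip the whole shot
        # detection in the representative frame: check every frame of the shot
        hits.extend(f for f in range(start, end + 1) if detect(f))
    return hits
```

Skipping whole shots after one negative test is what yields the large speed-up reported later, at the cost of missing objects that appear only briefly within an otherwise negative shot.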
  9. Robustness to scale variations – problem
     • Major changes in scale may lead to detection failure due to the significant limitation of the area that is used for matching
       – Zoom-in case: the middle image (b) corresponds to a small upper-right area of the object O in the left one (a)
       – Zoom-out case: in the right image (c) the object O occupies a very small part of the frame
     • Both cases lead to a considerable reduction of the number of matched pairs of descriptors, and thus often to detection failure
     [Figure: (a) original object O, (b) zoomed-in view, (c) zoomed-out view]
  10. Robustness to scale variations – solution
      • We automatically generate a zoomed-out and a centralized zoomed-in instance of the object O and utilize them in the matching procedure
        – Zoomed-in instance: selection of a center-aligned sub-area of the original object O and enlargement to the actual size of O using bilinear interpolation; choice: 70% of the original image area, i.e. a 140% zoom-in factor
        – Zoomed-out instance: shrink the original image O into a smaller one using nearest-neighbor interpolation; the maximum zoom-out factor is determined by the restrictions of the GPU-based implementation of SURF
      [Figure: original image, zoomed-in instance, zoomed-out instance]
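A sketch of how the zoomed-in instance above could be generated, reading "70% of the original image area" as keeping 70% of each dimension, which is what yields the stated ~140% zoom-in factor (1/0.7 ≈ 1.43). Only the crop box is computed here; the bilinear enlargement back to the original size would be done by an imaging library. The function name and signature are assumptions for illustration.

```python
def center_crop_box(width, height, keep=0.7):
    """Return (x0, y0, x1, y1) of the center-aligned crop that keeps
    the central `keep` fraction of each image dimension."""
    mx = round(width * (1 - keep) / 2)   # horizontal margin to discard
    my = round(height * (1 - keep) / 2)  # vertical margin to discard
    return (mx, my, width - mx, height - my)
```

Enlarging the resulting crop back to (width, height) then gives the zoomed-in instance; the zoomed-out instance is simply the original image scaled down, bounded by the minimum image size the GPU SURF implementation accepts.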
  11. Experiments and Results
      • System specifications
        – Intel Core i7 processor at 3.4GHz
        – 8GB RAM
        – CUDA-enabled NVIDIA GeForce GTX560 GPU
      • Dataset
        – 6 videos* of 273 minutes total duration
        – 30 manually selected objects
      • Ground truth (generated via manual annotation)
        – 75,632 frames contain at least one of these objects
        – 333,455 frames do not include any of the selected objects
      * The videos are episodes from the “Antiques Roadshow” of the Dutch public broadcaster AVRO (http://avro.nl/)
      [Figure: examples of sought objects]
  12. Experiments and Results
      • Aim: quantify the improvement that each extension of the baseline approach is responsible for
      • Four experimental configurations:
        – C1: baseline implementation
        – C2: GPU-accelerated implementation
        – C3: GPU-accelerated implementation with video-structure-based sampling
        – C4: complete proposed approach, which includes GPU-based processing, video-structure-based sampling and robustness to scale variations
  13. Experiments and Results
      • Detection accuracy is expressed in terms of Precision, Recall and F-Score
      • Evaluation was performed on a per-frame basis, i.e. considering the 30 selected objects and counting the number of frames where these were correctly detected, missed, etc.
      • Time efficiency was evaluated by expressing the processing time of each configuration as a factor of the actual duration of the processed videos
      • Robustness to scale variations was quantified using two specific sets of frames where the object of interest was observed from:
        – a very close viewing position (2,940 frames) and
        – a very distant viewing position (4,648 frames)
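The per-frame metrics above follow the standard definitions, computed from counts of correctly detected (tp), falsely detected (fp), and missed (fn) frames; the function name is an illustrative choice.

```python
def precision_recall_fscore(tp, fp, fn):
    """Standard per-frame detection metrics from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```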
  14. Experiments and Results

      Evaluation results for configurations C1 to C4:
           Precision  Recall  F-Score  Processing Time (x Real-Time)
      C1   0.999      0.868   0.929    2.98-5.26
      C2   0.999      0.850   0.918    0.35-1.24
      C3   0.999      0.849   0.918    0.03-0.13
      C4   0.999      0.872   0.931    0.03-0.19

      Evaluation results for highly zoomed-in instances:
           Precision  Recall  F-Score
      C1   0.999      0.856   0.922
      C2   0.999      0.856   0.922
      C3   1.000      0.852   0.920
      C4   1.000      0.992   0.996

      Evaluation results for highly zoomed-out instances:
           Precision  Recall  F-Score
      C1   0.999      0.831   0.907
      C2   0.999      0.831   0.907
      C3   1.000      0.799   0.888
      C4   1.000      0.914   0.955
  15. Experiments and Results
      Detection accuracy
      • All versions exhibited very good results in terms of detection accuracy
      • Version C4 (complete proposed approach) achieved the best results
      • The algorithm performed considerably well for a range of different scales and orientations, and for partial visibility or partial occlusion
      Processing time
      • The video-structure-based sampling strategy led to a great reduction of the required processing time
      • The algorithm needs about 10% of the video’s duration, while preserving the same high levels of detection accuracy as the slower configurations
      Online demo available at: http://www.youtube.com/watch?v=0IeVkXRTYu8
  16. Extensions, ideas and plans
      • Recent extension: multiple instances of an object of interest can be used as input for more efficient re-detection of 3D objects
      • Future ideas: test the algorithm’s performance as a tool for chapter segmentation in videos where the chapters are temporally demarcated by the presence of a specific object (e.g. a painting in a video about art)
      • Future plans: evaluate the extended algorithm’s performance (detection accuracy and time efficiency) on a new set of videos
      [Figure: input and output of the extended algorithm]
  17. Conclusions
      • The proposed method can be used for fast and accurate re-detection of pre-defined objects in videos
      • The time performance of the implemented algorithm allows for real-time processing of multimedia content
      • Extended by a prior object labeling step, this technique can serve as:
        – A reliable tool for creating instance-based annotated spatio-temporal fragments in videos
        – A key enabling technology for finding similar content and establishing links between related media fragments, thus contributing to the realization of interactive and linked TV
  18. Questions?
      More information: http://www.iti.gr/~bmezaris, bmezaris@iti.gr
