Presentation by the Universidad Autónoma de Madrid on video processing and understanding for video surveillance, given at the HOIP 2010 workshop organized by TECNALIA's Unidad de Sistemas de Información e Interacción (Information and Interaction Systems Unit).
More information at http://www.tecnalia.com/es/ict-european-software-institute/index.htm
Video Processing and Understanding in Surveillance Applications
…segmentation, multimodal backgrounds, stationary foreground, tracking, people detection, shadow detection, unattended and stolen objects, human actions detection, video browsing, evaluation, ToF cameras, …
José M. Martínez
JoseM.Martinez@uam.es
Hands-on Image Processing 2010 (HOIP’10)
16-17 November 2010
Video Processing and Understanding Lab (Grupo de Tratamiento e Interpretación de Vídeo)
Escuela Politécnica Superior, Universidad Autónoma de Madrid
E28049 Madrid (SPAIN)
Contents
Introduction
Application enablers
Segmentation
Tracking
People detection
Shadow detection
Applications
Unattended and stolen object detection
Event detection
Video Browsing
Evaluation
Content sets
Performance evaluation without ground-truth
Other topics
Video Processing and Understanding in Surveillance Video (JoseM.Martinez@uam.es) Hands-on Image Processing (HOIP’10), 16-17 Nov 2010 2
Introduction
Video Processing and Understanding Lab
http://www-vpu.eps.uam.es
Research group focused on digital image processing theory, methods and
applications aimed at video sequence analysis and visual content
adaptation.
The main fields of application are video-surveillance systems and video
repositories (video sequence indexing and retrieval).
The activity of the group is mainly oriented to the real-time and on-line
processing of video sequences, and the constraints associated with this
mode of operation apply to all the group's lines of research.
Introduction
Video Surveillance and Monitoring @VPULab
Low level
Segmentation
Tracking
Mid level
People detection
Shadow detection
High level
Unattended and stolen object detection
Human action detection
Video browsing
Evaluation
Credits
The works presented in these slides are part of
the research of several members of VPULab
Eng. Álvaro Bayona Dr. Jesús Bescós Eng. Marcos Escudero
Eng. Víctor Fernández-Carbajales Dr. Miguel Ángel García
Eng. Álvaro García Dr. José M. Martínez Eng. Javier Molina
Eng. José Antonio Pajuelo Eng. Juan Carlos San Miguel
Eng. Fabricio Tiburzi Dr. Víctor Valdés
Segmentation:
Introduction
Different approaches
In video surveillance, segmentation is usually motion-based and assumes static cameras
“Classical” background subtraction algorithms
Gamma-based background subtraction
• Optimized version of A. Cavallaro, O. Steiger, T. Ebrahimi, “Semantic Video Analysis for Adaptive Content Delivery and Automatic Description”, IEEE Trans. on Circuits and Systems for Video Technology, 15(10): 1200-1209, October 2005.
Algorithms for moving cameras
We will present two approaches:
Region-based foreground segmentation
Stationary foreground detection
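As a rough illustration of the "classical" background subtraction idea listed above, the following minimal sketch keeps a running-average background and thresholds the per-pixel difference. It is not the gamma-based method cited on the slide; the function names, the learning rate `alpha` and the `threshold` value are assumptions chosen for the example, and frames are plain lists of grey levels.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (running average)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_mask(background, frame, threshold=25):
    """Mark as foreground every pixel that differs enough from the background."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


# Toy 2x3 grey-level images: one bright pixel appears in the current frame.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 200, 10], [12, 10, 10]]

mask = foreground_mask(background, frame)
new_background = update_background(background, frame, alpha=0.5)
```

With a small `alpha` the model adapts slowly, so a stopped object stays in the foreground for longer; this is the property that stationary foreground detection relies on.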
Segmentation:
Introduction
Segmentation aims to provide:
A video description closer to human perception.
A decrease of ‘semantic’ noise (multi-modal backgrounds, illumination
artefacts) and signal noise (impulsive noise).
Segmentation:
Region-based foreground segmentation
Background/foreground segmentation is usually performed at pixel level
(e.g., Statistical Background Modelling)
Region-based analysis, understanding regions as groups of pixels
sharing similar attributes, helps to provide:
Tools
A Robust-to-illumination region segmentation
Reflectance oriented Mean-Shift segmentation
Reflectance-homogeneous regions are fused based on RGB colour angle
An Eigenvalue based framework for region characterization and matching
Covariance of extracted features is computed for each region
Matching is performed by modelling the cost of updating a region
A Multi-layer region-based background model
Aims to model the different variations that each background region can
undergo
Segmentation:
Region-based foreground segmentation
[Figure: original frame, region segmentation, shadows ground-truth]
Marcos Escudero, Jesús Bescós, “Region-based video object segmentation robust to illumination”, Proc. of WIAMIS’10.
Segmentation:
Region-based foreground segmentation
$S(A, B) = \frac{|A \cap B|}{|A \cup B|}$
[Figure: original frame, Mean-Shift segmentation, GT, and masks from SoA [2], the initial method and the proposed method]
Sequence     SoA [2]   Initial   Proposed
MR (1816)    0.911     0.300     0.899
WS (624)     0.851     0.156     0.822
AP (3264)    0.508     0.493     0.494
[2] L. Li, et al. “Statistical modelling of complex backgrounds for foreground object detection,” IEEE Transactions on Image Processing, 13 (11), 2004.
Masks fit the real objects tightly
without post-processing
Marcos Escudero, Jesús Bescós, “A robust framework for region-based video object segmentation”, Proc. of ICIP’10.
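The overlap score S(A, B) used on this slide is the ratio between the intersection and the union of two masks. A minimal sketch of how it can be computed for a detected mask and a ground-truth mask follows; the function name and the convention of returning 1.0 for two empty masks are assumptions made for the example.

```python
def spatial_overlap(mask_a, mask_b):
    """Return |A ∩ B| / |A ∪ B| for two binary masks of equal size."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += bool(a and b)
            union += bool(a or b)
    # Assumed convention: two empty masks are considered identical.
    return intersection / union if union else 1.0


ground_truth = [[0, 1, 1],
                [0, 1, 0]]
detected = [[0, 1, 0],
            [1, 1, 0]]

score = spatial_overlap(ground_truth, detected)
```

A score of 1 means the masks coincide, and 0 means they do not share any foreground pixel.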
Segmentation:
Region-based foreground segmentation
[Figure panels: hot starts; shadows; more accurate segmentation]
Marcos Escudero, Jesús Bescós, “A robust framework for region-based video object segmentation”, Proc. of ICIP’10.
Segmentation:
Stationary foreground detection
Detection of stationary foreground objects (e.g., abandoned objects in crowded
places such as airports, underground stations and mass events).
We implemented and evaluated the most relevant approaches from the state of the
art.
Experimental results showed that the sub-sampling approaches obtained better
results.
Alvaro Bayona, Juan C. SanMiguel, Jose M. Martinez: "Comparative evaluation of stationary foreground object detection algorithms based on background subtraction techniques", Proc. of AVSS’09
Segmentation:
Stationary foreground detection
Sub-sampling approaches produce several false positives in crowded sequences. To
reduce them, we introduced some modifications based on:
1. Changing the background subtraction technique
2. Removing false positives in crossing zones
3. Tolerance to occlusions
The proposed algorithm for stationary foreground object detection is based on the
sub-sampling scheme, a frame difference scheme and an occlusion handling model.
Alvaro Bayona, Juan C. SanMiguel, Jose M. Martinez: "Stationary foreground detection using background subtraction and temporal difference in video surveillance", Proc. of ICIP’10
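The sub-sampling idea can be sketched as follows, under the assumption that it means sampling the foreground mask at spaced instants and keeping only the pixels that are foreground in every sample. This is a simplified illustration, not the proposed algorithm: the frame difference stage and the occlusion handling model are left out, and the function name is hypothetical.

```python
def stationary_mask(sampled_masks):
    """Pixel-wise AND of foreground masks sampled at spaced instants."""
    rows = len(sampled_masks[0])
    cols = len(sampled_masks[0][0])
    return [[all(mask[r][c] for mask in sampled_masks) for c in range(cols)]
            for r in range(rows)]


# Three sampled masks: pixel (0, 0) is always foreground (a left bag),
# while a walking person occupies a different pixel in each sample.
samples = [
    [[1, 1, 0, 0]],
    [[1, 0, 1, 0]],
    [[1, 0, 0, 1]],
]

result = stationary_mask(samples)
```

In a crowded scene, different people can cover the same pixel in every sample, which is one way the false positives mentioned above can arise.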
Segmentation:
Stationary foreground detection
We evaluated the proposed algorithm
and compared its results with the base
algorithm using sequences from the PETS
2006, PETS 2007 and i-LIDS for AVSS
2007 datasets.
Experimental results showed that the
proposed algorithm improves the
detection of stationary foreground
regions over the base
algorithm in terms of precision and
recall.
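For reference, precision and recall can be computed from the counts of correct detections, false detections and missed regions. The sketch below uses hypothetical counts, not figures from the evaluation reported on the slide.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    detected = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Hypothetical example: 8 stationary regions found correctly,
# 2 false detections and 4 missed regions.
precision, recall = precision_recall(8, 2, 4)
```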
Tracking
Main steps:
Detection of objects (blobs).
Gamma-based background subtraction
Characterization of objects.
Intra-blobs: Visual attention-driven selection
Colour (luminance)
Identification/Assignment of objects.
Probabilistic graph
Tested in controlled and not crowded environments
[Figure: probabilistic graph over blobs A, B and C with association probabilities]
Tracking
[Figure panels: input video; segmentation (using background subtraction); object detection/extraction; visual attention (intra-blob selection); object characterization; associations and tracking between the previous frame and the current frame]
People detection
Automatic people detection is currently a complex problem with
multiple applications, not only in video surveillance but also in
other areas such as intelligent systems (robotics), video games, etc.
[Figure: people and non-people examples; people variability]
People detection
Fusion algorithm
Background segmentation
Fusion of 3 simple independent people detectors:
• Aspect ratio
• Ellipse fitting [2]
• Ghost algorithm [3]
[2] F. Xu and K. Fujimura. Human detection using depth and gray images. Proc. of AVSS 2003.
[3] I. Haritaoglu, D. Harwood, and L. S. Davis. Ghost: a human body part labeling system using silhouettes. Proc. of ICPR 1998.
Víctor Fernández-Carbajales, Miguel Ángel García, and José M. Martínez. “Robust people detection by fusion of evidence from multiple methods”, Proc. of WIAMIS’08
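To illustrate the idea of fusing simple independent detectors, the sketch below scores a blob by its aspect ratio and averages that score with two other detector scores. The expected ratio, the tolerance, the averaging rule and the decision threshold are all assumptions made for the example; the slide does not state how the published method combines its evidence.

```python
def aspect_ratio_score(width, height, expected=2.5, tolerance=1.5):
    """Score in [0, 1]: 1 when height/width matches the expected ratio."""
    ratio = height / width
    return max(0.0, 1.0 - abs(ratio - expected) / tolerance)


def fuse(scores, threshold=0.5):
    """Average independent detector scores and apply a decision threshold."""
    mean = sum(scores) / len(scores)
    return mean, mean >= threshold


standing_person = aspect_ratio_score(width=40, height=100)
suitcase = aspect_ratio_score(width=100, height=40)

# Hypothetical scores standing in for the ellipse-fitting and Ghost detectors.
evidence, is_person = fuse([standing_person, 0.8, 0.6])
```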
People detection
Edge algorithm
Real-time adaptation [5].
[5] B. Wu and R. Nevatia. Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In Proc. of ICCV 2005.
Four edge models of body parts (body, head, torso and legs).
[Diagram: background/foreground extraction; object extraction; object tracking; people/no people classification (using the people model); decision]
Alvaro Garcia-Martin, Jose M. Martinez: "Robust Real Time Moving People Detection in Surveillance Scenarios", Proc. of AVSS’10
People detection
Results vs. Complexity
[Figure: results versus computational cost (low, medium, high)]
[6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of CVPR 2005.
[7] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In Proc. of CVPR 2009.
Shadow detection
The shadow detection process usually involves a number of classifiers that are
trained with labelled data (training phase)
The availability and creation of training data is a critical issue:
Difficulty of manual annotation (which determines the accuracy of the learned models)
Amount of data used (the classifier will be very specific if it is huge, or will not be
optimal if it is small)
Avoid the use of training data in classification tasks
Tattersall, S. and Dawson-Howe, K., “Adaptive Shadow Identification through
Automatic Parameter Estimation in Video Sequences,” Proc. of MVIP, pp. 57-64, 2003.
Conaire, C.; O'Connor, N.; Cooke, E.; Smeaton, A., “Detection Thresholding
Using Mutual Information”, Proc. of VISAPP, pp. 408-415, 2006.
Shadow detection
On-line learning of optimum parameters without training data
Cooperative on-line training of independent detectors to obtain the optimum
configuration (e.g., thresholds) by maximizing the agreement between
independent detectors
C. Conaire, N. O’Connor, A. Smeaton, “Detector adaptation by maximising agreement between independent detectors”, Proc. of CVPR’07.
Improvement of standard HSV shadow detection, base algorithm and its
adaptation to analysis of video sequences
Key aspects
Analysis of brightness and saturation decrease (HSV colour space)
Analysis of surfaces with similar brightness decrease
Signal correlation as agreement measure
Search of optimum configuration: Gradient ascent algorithm with coarse and fine
stages
Two options: accuracy in shadow or object detection
Juan Carlos SanMiguel, José M. Martínez “Shadow Detection in video surveillance by maximizing the agreement between independent detectors”, Proc. of ICIP’09
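The base HSV shadow test mentioned above can be sketched per pixel as follows. This is the widely used rule (a shadow darkens the background by a bounded factor without increasing saturation much or shifting hue), not the improved detector presented on the slide. The threshold values are placeholders; in the approach described here they would be tuned on-line by maximizing the agreement between detectors. HSV components are assumed to lie in [0, 1].

```python
def is_shadow(pixel_hsv, background_hsv,
              alpha=0.4, beta=0.9, tau_s=0.1, tau_h=0.1):
    """Classify a foreground pixel as shadow with the standard HSV test."""
    h, s, v = pixel_hsv
    hb, sb, vb = background_hsv
    if vb == 0:
        return False  # no brightness to attenuate
    brightness_ratio = v / vb
    # Hue is circular, so measure the shorter way around.
    hue_diff = min(abs(h - hb), 1.0 - abs(h - hb))
    return (alpha <= brightness_ratio <= beta
            and (s - sb) <= tau_s
            and hue_diff <= tau_h)


background = (0.30, 0.40, 0.80)
shadow_pixel = (0.30, 0.35, 0.45)   # darker, same hue, slightly less saturated
object_pixel = (0.90, 0.80, 0.20)   # much darker and a different colour

shadow_result = is_shadow(shadow_pixel, background)
object_result = is_shadow(object_pixel, background)
```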
Shadow detection
Experimental results (PETS 2006 dataset)
[Figure: results for DCU [Conaire et al., CVPR 2007], DCU Ad. (adaptation of DCU), VPU2 (shadow accurate) and VPU3 (object accurate)]
Shadow detection
Experimental results (Intelligent room sequence)
[Figure: results for DCU [Conaire et al., CVPR 2007], DCU Ad. (adaptation of DCU), VPU2 (shadow accurate) and VPU3 (object accurate)]
Unattended and stolen object detection
Due to recent events, there is great interest in detecting dangerous or unusual
situations, especially in public areas such as airports, stations, subways, building
entrances and mass events
• Vehicle accidents
• Intrusion in restricted areas (cars, people,...)
• Detection or tracking of suspicious objects
[Figure: unattended and stolen object examples (subway/railway/airport, museums)]
Unattended and stolen object detection
System overview
Shape adjustment (snakes)
Unattended/Stolen object detectors
Gradient-based detectors
Colour-based detectors
Combination
Gaussian model trained for Unattended and stolen classes
Combination as an average
Heaviside step function applied for filtering out unreliable detectors
[Diagram: static and non-people objects; shape adjustment; shape similarity (low-gradient detector, high-gradient detector) and colour similarity (colour histogram detector); evidences; combination; unattended or stolen object]
Real-time and robust detection of unattended and stolen objects
Low computational complexity
Limited application to crowded scenarios (due to previous tracking analysis)
Juan C. SanMiguel, José M. Martínez, “Robust unattended and stolen object detection by fusing simple algorithms”, Proc. of AVSS’08
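The combination stage described above (Gaussian evidence per detector, a step function to discard unreliable detectors, then an average) can be sketched as follows. The slide does not say how reliability is measured, so this example assumes that a detector is unreliable when its evidence falls below a minimum value; the function names, `min_evidence` and the Gaussian parameters are all hypothetical.

```python
import math


def gaussian_evidence(measure, mean, std):
    """Score in (0, 1] from an unnormalised Gaussian centred on the class mean."""
    return math.exp(-0.5 * ((measure - mean) / std) ** 2)


def heaviside(x):
    """Unit step: 1 for x >= 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0


def combine(evidences, min_evidence=0.2):
    """Average the evidences that pass the step-function gate."""
    weights = [heaviside(e - min_evidence) for e in evidences]
    kept = sum(weights)
    if kept == 0:
        return 0.0
    return sum(w * e for w, e in zip(weights, evidences)) / kept


# Three detector measures scored against a class model with mean 1.0, std 0.5.
scores = [gaussian_evidence(m, mean=1.0, std=0.5) for m in (1.0, 1.5, 3.0)]
fused = combine(scores)
```

The third detector is far from the class mean, so its evidence is gated out and the result is the average of the first two.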
Unattended and stolen object detection
Gradient similarity detectors
• The 1st and 2nd detectors are based on shape similarity
» Between the previously adjusted object shape and the real shape in the current image (removing redundant shape information)
• Gradient information is used for shape extraction from the current image
[Diagram: background, current image and thresholded difference give the candidate object (region of interest, object mask); mask analysis (active contours) and image analysis each yield a shape; shape analysis checks the matching]
Unattended and stolen object detection
Colour similarity detector
Hue histogram (16 bins)
H1: histogram of R1 in background image
H2: histogram of R2 in current image
H3: histogram of R2 in background image
Bhattacharyya distance: $d_B(H_1, H_2)$, $d_B(H_1, H_3)$
$M_{CH} = d_B(H_1, H_3) - d_B(H_1, H_2)$
If $M_{CH} < 0$: unattended object
If $M_{CH} > 0$: stolen object
$E_{CH}^{\{U,S\}} = E_{\mu_{CH}^{\{U,S\}},\, \sigma_{CH}^{\{U,S\}}}(M_{CH})$
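The colour measure on this slide can be sketched with 16-bin hue histograms and a Bhattacharyya distance. The slide does not define R1 and R2, so the example assumes R1 is a region around the candidate object and R2 is the object region itself. The distance is taken in the form sqrt(1 - BC), where BC is the Bhattacharyya coefficient; other forms exist. Hue values are assumed to lie in [0, 1).

```python
import math


def hue_histogram(hues, bins=16):
    """Normalised histogram of hue values given in [0, 1)."""
    counts = [0] * bins
    for hue in hues:
        counts[min(int(hue * bins), bins - 1)] += 1
    total = sum(counts)
    return [count / total for count in counts]


def bhattacharyya_distance(p, q):
    """sqrt(1 - BC), with BC the Bhattacharyya coefficient of p and q."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - coefficient))


def colour_measure(h1, h2, h3):
    """M_CH = dB(H1, H3) - dB(H1, H2)."""
    return bhattacharyya_distance(h1, h3) - bhattacharyya_distance(h1, h2)


def classify(m_ch):
    if m_ch < 0:
        return "unattended"
    if m_ch > 0:
        return "stolen"
    return "undecided"


# A bag left on a uniform floor: the object region matched the surroundings
# in the background image (H3) but not in the current image (H2).
h1 = hue_histogram([0.1] * 10)   # R1 in background image
h2 = hue_histogram([0.6] * 10)   # R2 in current image
h3 = hue_histogram([0.1] * 10)   # R2 in background image

m_ch = colour_measure(h1, h2, h3)
label = classify(m_ch)
```

Swapping H2 and H3 models the opposite case, where the object was present in the background image and has been removed.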
Event detection
System overview
2D real-time analysis
Foreground/background segmentation
Blob tracking
Person-Object classification
Event detection (Human interactions)
Use of contextual information:
Ontology with object models
Data: Online generated (events) + User generated (annotations)
Real-time analysis (↓ Resolution, ↓ Computational Complexity)
Event modelling
High frame rate (> 10 fps)
Modelling constraints:
• HandUp: height of the hand higher than head
• Get/Leave Object: contextual object needed.
No intra-blob analysis (1 blob 1 person/object)
[Diagram: video input; foreground segmentation; blob tracking; person-object classification; feature extraction; event detection; events. The contextual info module takes the domain ontology and the annotations]
Juan C. SanMiguel, Marcos Escudero, Jose M. Martinez and Jesus Bescos, “Real-time event detection in smart rooms", submitted (2010)
Event detection
Input data (Blobs, their properties and contextual objects)
Modeled events:
Human-object interaction (Leave/Get/Use)
- Constraints (C) over blob properties and contextual information
- Bayesian combination
Human activity (Walking, HandUp)
- Temporal evolution of spatial attributes of the blob: mass center and skin areas
Status (Presence, Counter)
- Finite State Machine
- Temporal average to increase reliability
GetObject
• C1: Blob appears now
• C2: Blob belongs to background
• C3: Blob classified as object
• C4: There is an associated cont. object
• C5: A person is doing the action
• C6: Distance person-object less than th
[Figure: blob attributes (skin areas, legs, mass center)]
[State machine: states No Presence and Presence; transitions labelled F > β and F < α; F: person exists in the last N frames]
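The Presence status event can be sketched as a two-state machine with hysteresis. The example assumes that F is the fraction of the last N frames in which a person was detected, that the machine enters Presence when F > β and leaves it when F < α; the class name and the parameter values are hypothetical.

```python
from collections import deque


class PresenceDetector:
    """Two-state machine (No Presence / Presence) with hysteresis on F."""

    def __init__(self, window=4, alpha=0.3, beta=0.7):
        self.window = window
        self.alpha = alpha
        self.beta = beta
        self.history = deque(maxlen=window)
        self.present = False

    def update(self, person_detected):
        """Feed one frame-level detection and return the current state."""
        self.history.append(bool(person_detected))
        # Temporal average over the last `window` frames.
        f = sum(self.history) / self.window
        if not self.present and f > self.beta:
            self.present = True
        elif self.present and f < self.alpha:
            self.present = False
        return self.present


detector = PresenceDetector()
detections = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
states = [detector.update(d) for d in detections]
```

Because α is lower than β, isolated missed detections do not end the Presence state, which is the reliability gain that the temporal average is meant to provide.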
Event detection
Experimental results
Courtesy of project CENIT-VISION
Video Browsing
Browsing large repositories is a complex and time- (and resource-)
consuming task
Real-time and on-line summarization allows:
Summarization and browsing during capture (e.g.,
multi-camera systems)
Interactive browsing based on event detection and annotations
Video Browsing
A real-time video summarization algorithm that performs on-line analysis of the video
content (e.g., while it is being recorded) and progressively generates the video summary.
Unlike existing techniques, the algorithm does not require the complete original
content to generate its results.
The real-time video summarization algorithm is based on the dynamic creation of a
‘summarization tree’
[Figure: summarization tree built from video fragments A to E, with starting, inclusion, exclusion and empty nodes, and the resulting summaries]
Víctor Valdes, José M. Martínez, “Binary Tree Based On-Line Video Summarization”, Proc. of ACM Multimedia 2008
Video Browsing
RISPlayer Application: Interactive and personalized video
summaries creation and visualization.
[Screenshot: video browsing area and summary generation controls]
Víctor Valdés, José M. Martínez, “Introducing RISPlayer: Real-time Interactive Generation of Personalized Video Summaries”, Proc. of ACM Multimedia 2010
Video Browsing
Application to surveillance video browsing
[Figure: examples from surveillance recordings and traffic cameras]
Content Sets
Chroma-based Video Segmentation Ground-truth (CVSG)
Corpus of video sequences and segmentation
masks created to provide a representative test
set whereby video segmentation algorithms
can be quantitatively evaluated and fairly
compared.
Ground-truth data have been focused on
evaluation of motion-based segmentation
masks, as motion seems to be a very common
criterion for segmentation within a large
number of domains.
Foregrounds and backgrounds have been
combined trying to obtain a reasonable degree
of realism in the final sequence.
http://www-vpu.ii.uam.es/CVSG/
Fabrizio Tiburzi, Marcos Escudero, Jesús Bescós, José M. Martínez, “A Ground-truth for Motion-based Video-object Segmentation”, Proc. of ICIP’08.
Content Sets
A person detection dataset (PDds)
A dataset composed of several annotated
surveillance sequences of different levels of
complexity.
Sequences have been extracted from public
datasets related to the people
detection/object classification task:
PETS2006
WCAM
VISOR
CVSG
The well-known “hall monitor” sequence
AVSS2007
http://www-vpu.ii.uam.es/PDds/
Alvaro Garcia-Martin, José M. Martínez, “Robust real time moving people detection in surveillance scenarios”, Proc. of AVSS'2010.
Performance evaluation without ground-truth
Failure of video analysis systems is expected in real situations
Classic performance evaluation based on ground-truth
Very expensive to produce (and prone to human error)
Not available during online analysis
Only covers a small portion of video sequences (data variability)
Desirable solution: performance evaluation without ground-truth
Based on properties of the empirical results
Multiple applications:
• Evaluation over large datasets without ground-truth
• Algorithm ranking and combination
• Automatic control of online analysis (self-tuning)
Useful to qualitatively rank analysis algorithms
Low correlation in complex situations (multimodal backgrounds in object segmentation, adaptation to wrong
targets in object tracking,…)
Performance evaluation w/o GT: BGS
Background subtraction (BGS) is the most popular technique for moving object
segmentation
Evaluation of BGS algorithms in challenging situations
Study difference between inner and outer regions of object boundaries in
terms of color and motion
Metrics defined in C. Erdem, et al, “Performance
measures for video object segmentation and
tracking”, in IEEE Trans. on IP, 13(7):937–951, 2004.
Juan C. SanMiguel and José M. Martínez. “On the evaluation of background subtraction algorithms without ground-truth”, in Proc. of AVSS’10
Performance evaluation w/o GT: BGS
Current results (evaluation of BGS algorithms)
[Figure: frame, ground truth and masks from the MoG, KDE, GAMMA and EigBG algorithms; results for frame 200 (ID1 sequence) and frame 100 (ID9 sequence)]
P1: GT measure; DC1, DC2, DM1, DM2: NGT measures
Performance evaluation w/o GT: Tracking
Object tracking is an important tool in many video applications
Study of different indicators of tracking failure in challenging situations
Motion smoothness (MS)
Time-reversibility of object motion (TIM)
[Figure: frames t-1 and t; tracking result at t, forward estimation at t-1, backward estimation at t-1]
Liu, R.; Li, S.; Yuan, X.; He, R.; “Online Determination of Track Loss
Using Template Inverse Matching”, Proc. of VS 2008
Juan C. SanMiguel, A. Cavallaro and José M. Martinez, “Evaluation of on-line quality estimators for object tracking detectors”, in Proc. of ICIP’10
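The time-reversibility idea can be sketched as follows: track the target forward by one frame, track the result backward, and measure how far the backward estimate lands from the starting position. The two tracker functions below are toy stand-ins (assumptions for the example), not the template inverse matching of the cited work.

```python
import math


def reversibility_error(position, track_forward, track_backward):
    """Distance between a start position and its forward-then-backward estimate."""
    reached = track_forward(position)
    returned = track_backward(reached)
    return math.hypot(returned[0] - position[0], returned[1] - position[1])


# Toy trackers: the forward step moves the target by (+5, +2).
def forward(p):
    return (p[0] + 5, p[1] + 2)


def consistent_backward(p):
    return (p[0] - 5, p[1] - 2)


def drifting_backward(p):
    # A tracker that has latched onto the wrong target does not come back.
    return (p[0] - 1, p[1] + 2)


start = (10, 10)
good = reversibility_error(start, forward, consistent_backward)
bad = reversibility_error(start, forward, drifting_backward)
```

A large error suggests a possible track loss without needing ground truth; the threshold that separates "small" from "large" would have to be chosen per application.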
Performance evaluation w/o GT: Tracking
Spatial uncertainty of tracker (COV)
Badrinarayanan, V.; Perez, P.; Le Clerc, F.; Oisel, L.; “Probabilistic Color and Adaptive Multi-Feature Tracking with Dynamically Switched Priority Between Cues”, Proc. of ICCV’2007
Likelihood of the matching process (OL)
N. Vaswani, “Additive change detection in nonlinear systems with unknown change parameters”, IEEE Transactions on Signal Processing, 55(3):859-872, 2007
[Figure: tracking result and target candidates at frames 115, 135, 150, 200 and 270; plot of observation likelihood versus frame]
Performance evaluation w/o GT: Tracking
Current results
MEASURE   Area Under Curve (AUC)   False Positive rate   True Positive rate
MS        0.55 ± 0.0599            0.43 ± 0.0795         0.53 ± 0.0727
TIM       0.69 ± 0.0358            0.37 ± 0.0481         0.60 ± 0.0651
OL        0.78 ± 0.0887            0.20 ± 0.0554         0.65 ± 0.1133
COV       0.70 ± 0.0675            0.35 ± 0.0619         0.72 ± 0.0986
1. MS fails (~a random classifier)
2. TIM low performance
3. OL medium performance
4. COV low performance
[Figure: ROC curves, true positive rate (sensitivity) versus false positive rate (1-specificity)]
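The AUC figures in the table summarise how well each quality measure separates tracking failures from correct tracking. As a minimal sketch, the AUC can be computed as the probability that a randomly chosen failure frame scores higher than a randomly chosen correct frame (ties count half). The scores and labels below are made up for the example and are not the data behind the table.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons of the two classes."""
    positives = [s for s, label in zip(scores, labels) if label]
    negatives = [s for s, label in zip(scores, labels) if not label]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# label 1 = tracking failure, label 0 = correct tracking
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
uninformative = auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])
```

A value near 0.5 corresponds to a random classifier, which is how the MS result (0.55) is read on the slide.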
Other topics:
ToF cameras for gestural interfaces
Courtesy of project CENIT-VISION
Acknowledgements
Work partially supported by:
Cátedra UAM-Infoglobal
CENIT 2007-1007 Vision
TEC2007-65400 (SemanticVideo)
S-0505-TIC-0223 ProMultiDis-CM
IST-FP6-027685 Mesh
Video Processing and Understanding in Surveillance Applications
…segmentation, multimodal backgrounds, stationary foreground, tracking, people detection, shadow detection, stolen and abandoned objects, human actions detection, video browsing, evaluation,…
José María Martínez Sánchez
Hands-on Image Processing 2010 (HOIP’10)
16-17 November 2010
Video Processing and Understanding Lab (Grupo de Tratamiento e Interpretación de Vídeo)
Escuela Politécnica Superior, Universidad Autónoma de Madrid
E28049 Madrid (SPAIN)