Hoip10 presentacion video-vigilancia_uam

Video Processing and Understanding in
Surveillance Applications
…segmentation, multimodal backgrounds, stationary foreground, tracking,
people detection, shadow detection, unattended and stolen objects, human
actions detection, video browsing, evaluation, ToF cameras, …

José M. Martínez
JoseM.Martinez@uam.es

Hands-on Image Processing 2010 (HOIP’10)
16-17 November 2010

Video Processing and Understanding Lab (Grupo de Tratamiento e Interpretación de Vídeo)
Escuela Politécnica Superior, Universidad Autónoma de Madrid
E28049 Madrid (SPAIN)

Contents

Introduction
Application enablers
Segmentation
Tracking
People detection
Shadow detection

Applications
Unattended and stolen object detection
Event detection
Video Browsing

Evaluation
Content sets
Performance evaluation without ground-truth

Other topics

Video Processing and Understanding in Surveillance Video (JoseM.Martinez@uam.es) Hands-on Image Processing (HOIP’10), 16-17 Nov 2010 2

Introduction

Video Processing and Understanding Lab

http://www-vpu.eps.uam.es

Research group focused on digital image processing theory, methods and
applications aimed at video sequence analysis and visual content
adaptation.

The main fields of application are video-surveillance systems and video
repositories (video sequence indexing and retrieval).

The activity of the group is mainly oriented to real-time and on-line
processing of video sequences, and the constraints associated with this
mode of operation apply to all of the group's lines of research.


Introduction

Video Surveillance and Monitoring @VPULab
Low level
Segmentation
Tracking

Mid level
People detection
Shadow detection

High level
Human action detection
Video browsing

Evaluation


Credits

The works presented in these slides are part of
the research of several members of VPULab
Eng. Álvaro Bayona Dr. Jesús Bescós Eng. Marcos Escudero

Eng. Víctor Fernández-Carbajales Dr. Miguel Ángel García

Eng. Álvaro García Dr. José M. Martínez Eng. Javier Molina

Eng. José Antonio Pajuelo Eng. Juan Carlos San Miguel

Eng. Fabricio Tiburzi Dr. Víctor Valdés



Segmentation:
Introduction

Different approaches

In video surveillance, segmentation is usually motion-based and assumes static cameras

“Classical” Background subtraction algorithms

Gamma-based background subtraction
• Optimized version of A. Cavallaro, O. Steiger, T. Ebrahimi, “Semantic Video Analysis for
Adaptive Content Delivery and Automatic Description”, IEEE Trans. On Circuits and Systems
for Video Technology, 15(10): 1200-1209, October 2005.

Algorithms for moving cameras

We will present two approaches:

Region-based foreground segmentation

Stationary foreground detection
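
As a minimal sketch of the "classical" background subtraction idea mentioned on this slide, the snippet below keeps a running-average background and thresholds the difference to the current frame. It is not the Gamma-based algorithm cited above; the function names, the learning rate `alpha` and the `threshold` value are assumptions chosen for illustration.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Running-average background update (alpha is an assumed learning rate)."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels whose absolute difference to the background exceeds a threshold."""
    return np.abs(frame.astype(np.float64) - background) > threshold

# Toy example: a static 4x4 grey scene with one bright foreground pixel.
background = np.full((4, 4), 100.0)
frame = background.copy()
frame[1, 2] = 200.0

mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

With a static camera the model slowly absorbs gradual changes, which is also why a simple model of this kind struggles with multimodal backgrounds.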


Segmentation:
Introduction
Segmentation aims to provide:

A video description closer to human perception.

A decrease of ‘semantic’ noise (multi-modal backgrounds, illumination
artefacts) and signal noise (impulsive noise).



Segmentation:

Background/foreground segmentation is usually performed at pixel level
(e.g., statistical background modelling)
Region-based analysis, understanding regions as groups of pixels
sharing similar attributes, helps to provide:
Tools

A Robust-to-illumination region segmentation
Reflectance oriented Mean-Shift segmentation
Reflectance-homogeneous regions are fused based on RGB colour angle

An Eigenvalue based framework for region characterization and matching
Covariance of extracted features is computed for each region
Matching is performed by modelling the cost of updating a region

A Multi-layer region-based background model

Aims to model the different variations that each background region can
undergo
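
A rough illustration of fusing regions by RGB colour angle: a change in illumination intensity scales an RGB vector but leaves its direction almost unchanged, so two regions of the same surface should have a small angle between their mean colours. The helper names and the `max_angle_rad` threshold are assumptions, not values from the cited work.

```python
import numpy as np

def colour_angle(rgb_a, rgb_b):
    """Angle (radians) between two non-zero RGB vectors."""
    a = np.asarray(rgb_a, dtype=np.float64)
    b = np.asarray(rgb_b, dtype=np.float64)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def should_fuse(rgb_a, rgb_b, max_angle_rad=0.05):
    """Fuse two regions when their mean colours point in nearly the same direction."""
    return colour_angle(rgb_a, rgb_b) < max_angle_rad

lit = (120.0, 80.0, 40.0)     # mean colour of a lit region
shaded = (60.0, 40.0, 20.0)   # same surface at half the intensity
other = (40.0, 80.0, 120.0)   # a differently coloured surface

same_surface = should_fuse(lit, shaded)
different_surface = should_fuse(lit, other)
```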


Segmentation:

Original Frame

Region Segmentation

Shadows Ground-Truth

Marcos Escudero, Jesús Bescós, “Region-based video object segmentation robust to illumination”, Proc. of WIAMIS’10.


Segmentation:

$S(A,B) = \frac{|A \cap B|}{|A \cup B|}$

Figure panels: Original Frame, Mean-Shift, GT, and the masks obtained by SoA [2], Initial and Proposed.

Sequence   foe     SoA [2]   Initial   Proposed
MR         1816    0.911     0.300     0.899
WS         624     0.851     0.156     0.822
AP         3264    0.508     0.493     0.494

[2] L. Li, et al. “Statistical modelling of complex backgrounds for foreground object detection,” IEEE Transactions on Image Processing, 13 (11), 2004.

Masks fit the real objects tightly without post-processing

Marcos Escudero, Jesús Bescós, “A robust framework for region-based video object segmentation”, Proc. of ICIP’10.
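
The overlap score S(A, B) on this slide is the ratio between the intersection and the union of two masks. A minimal sketch on binary masks follows; the function name and the convention for two empty masks are assumptions.

```python
import numpy as np

def spatial_overlap(mask_a, mask_b):
    """S(A, B) = |A intersection B| / |A union B| for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(np.logical_and(a, b).sum() / union)

ground_truth = np.zeros((4, 4), dtype=bool)
ground_truth[1:3, 1:3] = True   # 4 object pixels
detection = np.zeros((4, 4), dtype=bool)
detection[1:3, 1:4] = True      # 6 detected pixels, 4 of them correct

score = spatial_overlap(detection, ground_truth)  # 4 / 6
```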


Segmentation:
Hot starts Shadows

More Accurate Segmentation
Marcos Escudero, Jesús Bescós, “A robust framework for region-based video object segmentation”, Proc. of ICIP’10.


Segmentation:

Detection of stationary foreground objects (e.g., abandoned objects in crowded
places, like airports, underground stations and mass events).
We implemented and evaluated the most relevant approaches from the state of the
art.

Experimental results showed that the sub-sampling approaches obtained the best
results.

Alvaro Bayona, Juan C. SanMiguel, Jose M. Martinez: "Comparative evaluation of stationary foreground object detection algorithms based on background subtraction techniques", Proc. of AVSS’09


Segmentation:
Sub-sampling approaches introduced several false positives in crowded sequences. To
reduce them, we have introduced some modifications based on:
1. Changing the background subtraction technique
2. Removing false positives in crossing zones
3. Tolerance to occlusions
The proposed algorithm for stationary foreground object detection is based on the
sub-sampling scheme, a frame difference scheme and an occlusion handling model.

Alvaro Bayona, Juan C. SanMiguel, Jose M. Martinez: "Stationary foreground detection using background subtraction and temporal difference in video surveillance", Proc. of ICIP’10
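
One way to picture the combination of sub-sampling and frame difference is sketched below: foreground masks sampled every few frames are ANDed together, so only pixels that stay foreground survive, and pixels that are still changing between consecutive frames are then discarded. This is a simplified illustration, not the published algorithm; the occlusion handling model is omitted and `step` and `diff_threshold` are assumed parameters.

```python
import numpy as np

def stationary_mask(foreground_masks, step):
    """AND together the foreground masks sampled every `step` frames."""
    sampled = np.asarray(foreground_masks, dtype=bool)[::step]
    return np.logical_and.reduce(sampled, axis=0)

def remove_moving_pixels(stationary, prev_frame, frame, diff_threshold=15.0):
    """Drop candidate pixels that still change between consecutive frames."""
    diff = np.abs(frame.astype(np.float64) - prev_frame.astype(np.float64))
    return stationary & ~(diff > diff_threshold)

# Toy sequence of six 3x3 foreground masks.
masks = np.zeros((6, 3, 3), dtype=bool)
masks[:, 1, 1] = True    # stays foreground in every frame
masks[:, 2, 2] = True    # also foreground in every frame
masks[:3, 0, 0] = True   # a passer-by, foreground only in frames 0-2

candidates = stationary_mask(masks, step=2)   # uses frames 0, 2 and 4

prev_frame = np.zeros((3, 3))
frame = np.zeros((3, 3))
frame[2, 2] = 50.0       # pixel (2, 2) is still changing
stationary = remove_moving_pixels(candidates, prev_frame, frame)
```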


Segmentation:
We evaluated the proposed algorithm
and compared its results with the base
algorithm using sequences from the PETS
2006, PETS 2007 and i-LIDS for AVSS
2007 datasets.
Experimental results showed that the
proposed algorithm improves the
detection of stationary foreground
regions with respect to the base
algorithm in terms of precision and
recall.
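
For reference, precision and recall can be computed from detection counts as sketched below. The counts in the example are made-up illustrative numbers, not results from the evaluation described above.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed regions.
precision, recall = precision_recall(8, 2, 4)
```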


Segmentation:



Tracking

Main steps:

Detection of objects (blobs).
Gamma-based background subtraction

Characterization of objects.
Intra-blobs: Visual attention-driven selection
Colour (luminance)

Identification/Assignment of objects.
Probabilistic graph

Tested in controlled and not crowded environments

Figure: example frames with blobs A, B and C, labelled with percentage values.
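
The characterization and assignment steps can be pictured with the sketch below: each blob is described by a luminance histogram and blobs in consecutive frames are paired by histogram similarity. A greedy one-to-one match is swapped in here for the probabilistic graph named on the slide; the bin count, the `min_similarity` threshold and all function names are assumptions.

```python
import numpy as np

def luminance_histogram(pixels, bins=8):
    """Normalised histogram of the luminance values (0-255) of one blob."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    total = hist.sum()
    return hist / total if total else hist.astype(np.float64)

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalised histograms."""
    return float(np.minimum(h1, h2).sum())

def associate(prev_hists, curr_hists, min_similarity=0.5):
    """Greedy one-to-one assignment of previous blobs to current blobs."""
    pairs = [(histogram_intersection(hp, hc), i, j)
             for i, hp in enumerate(prev_hists)
             for j, hc in enumerate(curr_hists)]
    pairs.sort(reverse=True)
    used_prev, used_curr, matches = set(), set(), {}
    for similarity, i, j in pairs:
        if similarity < min_similarity:
            break
        if i in used_prev or j in used_curr:
            continue
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    return matches

# Previous frame: a dark blob (index 0) and a bright blob (index 1).
prev_hists = [luminance_histogram([20, 25, 30, 22]),
              luminance_histogram([220, 225, 230, 210])]
# Current frame: the same blobs detected in the opposite order.
curr_hists = [luminance_histogram([215, 222, 228, 231]),
              luminance_histogram([18, 24, 29, 27])]

matches = associate(prev_hists, curr_hists)
```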


Tracking

Input Video Segmentation Object Detection/Extraction
(using Background Subtraction)

Previous Frame    Current Frame

Visual Attention Object Characterization Associations and Tracking
(intra-blobs selection)


Tracking

Input Video Segmentation Object Detection/Extraction
(using Background Subtraction)

Visual Attention Object Characterization Tracking


Tracking

Other Examples



People detection

Automatic people detection is a complex problem with
multiple applications, not only in video surveillance but also in
other areas such as intelligent systems (robotics), video games, etc.

People

No People

People Variability


People detection

Fusion algorithm

Background segmentation

Fusion of 3 simple independent people detectors:
• Aspect ratio

• Ellipse fitting [2]

• Ghost algorithm [3]

[2] F. Xu and K. Fujimura. Human detection using depth and gray images. Proc. of AVSS 2003.
[3] I. Haritaoglu, D. Harwood, and L. S. Davis. Ghost: a human body part labeling system using silhouettes. Proc. of ICPR 1998.

Víctor Fernández-Carbajales, Miguel Ángel García, and José M. Martínez. “Robust people detection by fusion of evidence from multiple methods”, Proc. of WIAMIS’08
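
To illustrate how simple detectors could be fused, the sketch below scores a blob by its aspect ratio and averages that score with the scores of the other detectors. The Gaussian-shaped score, the `expected_ratio` and `tolerance` values, the averaging rule and the placeholder scores standing in for ellipse fitting and Ghost are all assumptions, not the fusion scheme of the cited paper.

```python
import math

def aspect_ratio_score(width, height, expected_ratio=2.5, tolerance=1.0):
    """Score in (0, 1] that peaks when height/width matches a standing person."""
    if width <= 0 or height <= 0:
        return 0.0
    ratio = height / width
    return math.exp(-0.5 * ((ratio - expected_ratio) / tolerance) ** 2)

def fuse_scores(scores, threshold=0.5):
    """Average the detector scores and compare the mean with a threshold."""
    mean_score = sum(scores) / len(scores)
    return mean_score, mean_score >= threshold

# A tall blob (40x100 pixels); 0.7 and 0.6 are placeholder scores for the
# ellipse-fitting and Ghost detectors.
person_score, is_person = fuse_scores([aspect_ratio_score(40, 100), 0.7, 0.6])

# A wide blob (100x40 pixels) with low placeholder scores.
other_score, is_other_person = fuse_scores([aspect_ratio_score(100, 40), 0.2, 0.1])
```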


People detection

Edge algorithm

Real time adaptation [5].
[5] B. Wu and R. Nevatia. Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In Proc. of ICCV 2005.

Diagram blocks: Background/Foreground Extraction, Object Extraction, Object Tracking, People/No People Classification, People Model, Decision

Four edge models of body parts (body, head, torso and legs).

Alvaro Garcia-Martin, Jose M. Martinez: "Robust Real Time Moving People Detection in Surveillance Scenarios", Proc. of AVSS’10


People detection


People detection


People detection

Results vs. Complexity

Computational Cost: Low, Medium, High

[6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of CVPR 2005.
[7] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In Proc. of CVPR 2009.



Shadow detection

The shadow detection process usually involves a number of classifiers that are
trained with labelled data (training phase)

The availability and creation of training data is a critical issue:
Difficulty of manual annotation (which determines the accuracy of the learned models)
Amount of data used (the classifier will be very specific if it is huge, or it won’t be
optimal if it is small)

Goal: avoid the use of training data in classification tasks
Tattersall, S. and Dawson-Howe, K., “Adaptive Shadow Identification through
Automatic Parameter Estimation in Video Sequences,” Proc. of MVIP, pp. 57-
64, 2003.
Conaire, C.; O'Connor, N.; Cooke, E.; Smeaton, A., “Detection Thresholding
Using Mutual Information”, Proc. of VISAPP, pp. 408-415, 2006


Shadow detection

On-line learning of optimum parameters without training data

Cooperative on-line training of independent detectors to obtain the optimum
configuration (e.g., thresholds) by maximizing the agreement between
independent detectors
C. Conaire, N. O’Connor, A. Smeaton, “Detector adaptation by maximising agreement between independent detectors”, Proc. of CVPR’07.

Improvement of the standard HSV shadow detection base algorithm and its
adaptation to the analysis of video sequences

Key aspects
Analysis of brightness and saturation decrease (HSV colour space)
Analysis of surfaces with similar brightness decrease
Signal correlation as agreement measure
Search of optimum configuration: Gradient ascent algorithm with coarse and fine
stages
Two options: accuracy in shadow or object detection
Juan Carlos SanMiguel, José M. Martínez “Shadow Detection in video surveillance by maximizing the agreement between independent detectors”, Proc. of ICIP’09
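
As a sketch of the standard HSV shadow test that such a base algorithm starts from, a pixel is labelled shadow when its brightness drops by a bounded factor with respect to the background while saturation does not increase much and hue stays similar. The threshold values `alpha`, `beta`, `tau_s` and `tau_h` below are assumed defaults; they are the kind of parameters the agreement maximisation would tune on-line. Channels are assumed to be scaled to [0, 1].

```python
import numpy as np

def hsv_shadow_mask(frame_hsv, background_hsv,
                    alpha=0.4, beta=0.9, tau_s=0.1, tau_h=0.1):
    """Label as shadow the pixels that darken the background without recolouring it."""
    h_f, s_f, v_f = (frame_hsv[..., i] for i in range(3))
    h_b, s_b, v_b = (background_hsv[..., i] for i in range(3))
    ratio = v_f / np.maximum(v_b, 1e-6)               # brightness decrease
    hue_diff = np.abs(h_f - h_b)
    hue_diff = np.minimum(hue_diff, 1.0 - hue_diff)   # hue is circular
    return ((ratio >= alpha) & (ratio <= beta)
            & ((s_f - s_b) <= tau_s)
            & (hue_diff <= tau_h))

# Two pixels over the same background colour (H, S, V).
background_hsv = np.array([[[0.3, 0.5, 0.8], [0.3, 0.5, 0.8]]])
frame_hsv = np.array([[[0.3, 0.45, 0.5],    # darker, same hue: shadow
                       [0.8, 0.9, 0.2]]])   # very dark, different hue: object

shadow = hsv_shadow_mask(frame_hsv, background_hsv)
```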


Shadow detection

Experimental results (PETS 2006 dataset)

DCU
[Conaire et al, CVPR2007]
DCU Ad.
Adaptation of DCU
VPU2
(shadow accurate)
VPU3
(object accurate)


Shadow detection

Experimental results (Intelligent room sequence)

DCU
[Conaire et al, CVPR2007]
DCU Ad.
Adaptation of DCU
VPU2
(shadow accurate)
VPU3
(object accurate)




Unattended and stolen object detection

Due to recent events, there is great interest in detecting dangerous or strange
situations, especially in public areas such as airports, stations, subways, entrances to
buildings and mass events
• Vehicle accidents
• Intrusion in restricted areas (cars, people,...)
• Detection or tracking of suspicious objects

Example scenarios: Subway/Railway/Airport, Museums

Target events: Unattended Object, Stolen Object


Static and non-people objects

System overview
Shape adjustment (snakes)
Unattended/Stolen object detectors
• Gradient-based detectors
• Colour-based detectors
Combination
• Gaussian model trained for Unattended and Stolen classes
• Combination as an average
• Heaviside step function applied for filtering out unreliable detectors

Diagram blocks: Shape Adjustment; Shape similarity; Colour similarity; Low-Gradient detector; High-Gradient detector; Colour Histogram detector; Combination (Evidences); Unattended or Stolen Object

Real-time and robust detection of unattended and stolen objects
Low computational complexity
Limited application to crowded scenarios (due to previous tracking analysis)
Juan C. SanMiguel, José M. Martínez, “Robust unattended and stolen object detection by fusing simple algorithms”, Proc. of AVSS’08


Gradient similarity detectors

• 1st and 2nd detectors are based on the shape similarity
» Between the object shape previously adjusted and the real shape in the current image (removing redundant shape information)
• Gradient information is used for shape extraction from the current image

Figure labels: Region of interest, Object Mask, Shape, Mask Analysis, Shape analysis (Active Contours), Image Analysis, Background, Current image, Thresholded Diff., Candidate object, CHECK MATCHING

Colour similarity detector

Hue histogram (16 bins) compared with the Bhattacharyya distance $d_B$
H1: histogram of region R1 in the background image
H2: histogram of region R2 in the current image
H3: histogram of region R2 in the background image

$M_{CH} = d_B(H_1, H_3) - d_B(H_1, H_2)$
If $M_{CH} < 0$: Unattended object
If $M_{CH} > 0$: Stolen object

$E_{CH}^{\{U,S\}} = E_{\mu_{CH}^{\{U,S\}},\,\sigma_{CH}^{\{U,S\}}}(M_{CH})$
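
The colour measure MCH can be sketched as follows with 16-bin hue histograms. The Bhattacharyya distance is taken here as sqrt(1 - sum(sqrt(p*q))), which is one common definition and an assumption about the variant used; hue values are assumed to lie in [0, 1], and the Gaussian evidence step ECH is left out.

```python
import numpy as np

def hue_histogram(hue_values, bins=16):
    """Normalised 16-bin histogram of hue values in [0, 1]."""
    hist, _ = np.histogram(hue_values, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def bhattacharyya_distance(h1, h2):
    """sqrt(1 - BC), where BC is the Bhattacharyya coefficient."""
    coefficient = np.sum(np.sqrt(h1 * h2))
    return float(np.sqrt(max(0.0, 1.0 - coefficient)))

def colour_measure(h1, h2, h3):
    """MCH = dB(H1, H3) - dB(H1, H2)."""
    return bhattacharyya_distance(h1, h3) - bhattacharyya_distance(h1, h2)

def classify(m_ch):
    if m_ch < 0:
        return "unattended"
    if m_ch > 0:
        return "stolen"
    return "undecided"

floor = hue_histogram([0.1] * 10)   # hue of the surrounding background
bag = hue_histogram([0.6] * 10)     # hue of an object

# Unattended: R2 shows the object now (H2) but showed floor in the background (H3).
unattended = classify(colour_measure(floor, bag, floor))
# Stolen: R2 shows floor now (H2) but showed the object in the background (H3).
stolen = classify(colour_measure(floor, floor, bag))
```

In both toy cases the sign of MCH follows the rule on the slide: the region that looks least like its surroundings tells whether the object is in the current image or only in the background.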


Event detection

System overview

2D real-time analysis
• Foreground/background segmentation
• Blob tracking
• Person-Object classification
• Event detection (Human interactions)
Use of contextual information:
• Ontology with object models
• Data: Online generated (events) + User generated (annotations)
Real-time analysis (↓ Resolution, ↓ Computational Complexity)
Event modelling
High frame rate (> 10 fps)
Modelling constraints:
• HandUp: height of the hand higher than head
• Get/Leave Object: contextual object needed.
No intra-blob analysis (1 blob = 1 person/object)

Diagram blocks: Video Input; Annotations; Foreground Segmentation; Blob Tracking; Person-Object Classification; Feature Extraction; Domain Ontology; Contextual Info. Module; Event Detection; Events
Juan C. SanMiguel, Marcos Escudero, Jose M. Martinez and Jesus Bescos, “Real-time event detection in smart rooms", submitted (2010)


Event detection

Input data (Blobs, their properties and contextual objects)

Modeled events:

Human-object interaction (Leave/Get/Use)
- Constraints (C) over blob properties and contextual information
- Bayesian combination
GetObject
•C1: Blob appears now
•C2: Blob belongs to background
•C3: Blob classified as object
•C4: There is an associated cont. object
•C5: A person is doing the action
•C6: Distance person-object less than th

Human activity (Walking, HandUp)
- Temporal evolution of spatial attributes of the blob: mass center and skin areas
Figure labels: Skin Areas, Legs, Mass center

Status (Presence, Counter)
- Finite State Machine
- Temporal average to increase reliability
State diagram: states No Presence and Presence, transitions labelled F<α and F>β
F: person exists in the last N frames
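
The Presence status could be sketched as a two-state machine driven by a temporal average F of "person exists" over the last N frames. The transition rule assumed here (switch to Presence when F > β, back to No Presence when F < α, with α < β for hysteresis) is one plausible reading of the state diagram; the class name and the parameter values are assumptions.

```python
from collections import deque

class PresenceDetector:
    """Two-state machine (No Presence / Presence) with hysteresis on F."""

    def __init__(self, window=5, alpha=0.3, beta=0.7):
        self.history = deque(maxlen=window)   # last N per-frame detections
        self.alpha = alpha
        self.beta = beta
        self.present = False

    def update(self, person_detected):
        self.history.append(1.0 if person_detected else 0.0)
        # F: fraction of the last N frames in which a person was detected.
        f = sum(self.history) / self.history.maxlen
        if not self.present and f > self.beta:
            self.present = True
        elif self.present and f < self.alpha:
            self.present = False
        return self.present

detector = PresenceDetector()
detections = [True] * 5 + [False] * 5
states = [detector.update(d) for d in detections]
```

The temporal average delays both transitions by a few frames, which is the price paid for ignoring isolated detection errors.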


Event detection

Experimental results

Courtesy of project CENIT-VISION


Event detection

Experimental results




Video Browsing

Browsing large repositories is a complex and time (resource) consuming
task

Real-time and on-line summarization and browsing during capture (e.g.,
multicamera systems)

Interactive browsing based on event detection and annotations


Video Browsing

Real-time video summarization algorithm aimed at carrying out on-line analysis of the video
content (e.g., while it is being recorded) and progressively generating the video summary.
Unlike existing techniques, the algorithm does not require the complete original
content to generate the results.

The real-time video summarization algorithm is based on the dynamic creation of a
‘summarization tree’

Figure: summarization tree built over video fragments A to E, with legend Starting Node, Inclusion Node, Exclusion Node, Empty Node and Video Fragment, and the Resulting Summaries.

Víctor Valdes, José M. Martínez, “Binary Tree Based On-Line Video Summarization”, Proc. of ACM Multimedia 2008


Video Browsing

RISPlayer Application: creation and visualization of interactive and
personalized video summaries.
Video Browsing Area
Summary Generation Controls

Víctor Valdés, José M. Martínez, “Introducing RISPlayer: Real-time Interactive Generation of Personalized Video Summaries”, Proc. of ACM Multimedia 2010


Video Browsing

Application to surveillance video browsing
Surveillance Recordings Traffic Cameras



Content Sets

Chroma-based Video
Segmentation Ground-truth
(CVSG)
Corpus of video sequences and segmentation
masks created to provide a representative test-
set whereby video segmentation algorithms
can be quantitatively evaluated and fairly
compared.

Ground-truth data have been focused on
evaluation of motion-based segmentation
masks, as motion seems to be a very common
criterion for segmentation within a large
number of domains.

Foregrounds and backgrounds have been
combined trying to obtain a reasonable degree
of realism in the final sequence.

http://www-vpu.ii.uam.es/CVSG/

Fabrizio Tiburzi, Marcos Escudero, Jesús Bescós, José M. Martínez, “A Ground-truth for Motion-based Video-object Segmentation”, Proc. of ICIP’08.


Content Sets

A person detection dataset (PDds)
A dataset composed of several annotated
surveillance sequences with different levels of
complexity.

Sequences have been extracted from public
datasets related to the people
detection/object classification task:

PETS2006

WCAM

VISOR

CVSG

The well known “hall monitor”
sequence.

AVSS2007

http://www-vpu.ii.uam.es/PDds/
Alvaro Garcia-Martin, José M. Martínez, “Robust real time moving people detection in surveillance scenarios”, Proc. of AVSS'2010.



Failure of video analysis systems is expected in real situations

Classic performance evaluation is based on ground-truth, which is:
Very expensive to produce (and prone to human error)
Not available during online analysis
Only available for a small portion of video sequences (limited data variability)

Desirable solution: performance evaluation without ground-truth
Based on properties of the empirical results
Multiple applications:
• Evaluation over large datasets without ground-truth
• Algorithm ranking and combination
• Automatic control of online analysis (self-tuning)

Useful to qualitatively rank analysis algorithms
Low correlation in complex situations (multimodal backgrounds in object segmentation, adaptation to wrong
targets in object tracking, …)


Performance evaluation w/o GT: BGS

Background subtraction (BGS) is the most popular technique for moving object
segmentation

Evaluation of BGS algorithms in challenging situations

Study difference between inner and outer regions of object boundaries in
terms of color and motion

Metrics defined in C. Erdem, et al, “Performance
measures for video object segmentation and
tracking”, in IEEE Trans. on IP, 13(7):937–951, 2004.

Juan C. SanMiguel and José M. Martínez. “On the evaluation of background subtraction algorithms without ground-truth”, in Proc. of AVSS’10
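
The idea of comparing the inner and outer sides of an object boundary can be illustrated with the sketch below, which measures the mean intensity difference between a one-pixel ring just inside the mask and a ring just outside it. It is a simplified colour-contrast indicator, not the metrics defined by Erdem et al.; the function names and the 4-neighbour rings are assumptions.

```python
import numpy as np

def dilate(mask):
    """4-neighbour binary dilation by one pixel (NumPy only)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return (padded[1:-1, 1:-1] | padded[:-2, 1:-1] | padded[2:, 1:-1]
            | padded[1:-1, :-2] | padded[1:-1, 2:])

def boundary_contrast(image, mask):
    """Mean intensity difference across the boundary of a segmentation mask."""
    mask = np.asarray(mask, dtype=bool)
    outer = dilate(mask) & ~mask    # ring just outside the mask
    inner = mask & dilate(~mask)    # ring just inside the mask
    if not outer.any() or not inner.any():
        return 0.0
    return float(abs(image[inner].mean() - image[outer].mean()))

# A bright 3x3 object on a dark 7x7 background.
image = np.full((7, 7), 50.0)
image[2:5, 2:5] = 200.0

good_mask = np.zeros((7, 7), dtype=bool)
good_mask[2:5, 2:5] = True      # mask aligned with the object
bad_mask = np.zeros((7, 7), dtype=bool)
bad_mask[3:6, 3:6] = True       # mask shifted by one pixel

good = boundary_contrast(image, good_mask)
bad = boundary_contrast(image, bad_mask)
```

A well-placed mask separates object and background, so the contrast across its boundary is high; a misplaced mask mixes both on each side and the contrast drops, without any ground-truth being needed.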


Performance evaluation w/o GT: BGS

Current results (evaluation of BGS algorithms)
Frame Ground Truth MoG KDE GAMMA EigBG

Results for frame 200 (ID1 sequence) Results for frame 100 (ID9 sequence)

P1 GT measure DC1, DC2, DM1, DM2 NGT measures

Performance evaluation w/o GT: Tracking

Object tracking is an important tool in many video applications

Study of different indicators of tracking failure in challenging situations

Motion smoothness (MS)

Time-reversibility of object motion (TIM)
Frame t-1 Frame t Frame t-1 Frame t

Tracking result at t Forward estimation at t-1 Backward estimation at t-1

Liu, R.; Li, S.; Yuan, X.; He, R.; “Online Determination of Track Loss
Using Template Inverse Matching”, Proc. of VS 2008
Juan C. SanMiguel, A. Cavallaro and José M. Martinez “Evaluation of on-line quality estimators for object tracking detectors”, in Proc. of ICIP’10
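
The time-reversibility check can be sketched as follows: track forward from frame t-1 to frame t, track backward from the result, and measure how far the backward estimate lands from the original position. The toy tracker below (move to the brightest pixel in a small window) is only a stand-in assumption for a real tracker or template matcher, and all names are hypothetical.

```python
import numpy as np

def brightest_near(frame, position, radius=2):
    """Toy tracker (assumption): move to the brightest pixel within a window."""
    r, c = position
    r0, r1 = max(r - radius, 0), min(r + radius + 1, frame.shape[0])
    c0, c1 = max(c - radius, 0), min(c + radius + 1, frame.shape[1])
    window = frame[r0:r1, c0:c1]
    dr, dc = np.unravel_index(np.argmax(window), window.shape)
    return (r0 + int(dr), c0 + int(dc))

def reversibility_error(prev_frame, frame, prev_position, track=brightest_near):
    """Distance between the position at t-1 and its forward-backward estimate."""
    forward = track(frame, prev_position)    # estimate at t
    backward = track(prev_frame, forward)    # re-estimate at t-1
    return float(np.hypot(backward[0] - prev_position[0],
                          backward[1] - prev_position[1]))

prev_frame = np.zeros((8, 8))
prev_frame[2, 2] = 0.8    # target
prev_frame[6, 6] = 1.0    # brighter distractor

frame_ok = np.zeros((8, 8))
frame_ok[3, 3] = 0.8      # target moved by one pixel
frame_ok[6, 6] = 1.0

frame_lost = np.zeros((8, 8))
frame_lost[4, 4] = 1.0    # target occluded, distractor moved closer

error_ok = reversibility_error(prev_frame, frame_ok, (2, 2))
error_lost = reversibility_error(prev_frame, frame_lost, (2, 2))
```

When the tracker jumps to the distractor, the backward estimate does not return to the starting position, and the large error flags a likely track loss.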



Spatial uncertainty of tracker (COV)
Badrinarayanan, V.; Perez, P.; Le Clerc, F.; Oisel, L., “Probabilistic Color and Adaptive Multi-Feature Tracking with Dynamically Switched Priority Between Cues”, Proc. of ICCV’2007

Likelihood of the matching process (OL)
N. Vaswani, “Additive change detection in nonlinear systems with unknown change parameters”, IEEE Transactions on Signal Processing, 55(3):859-872, 2007

Figure: tracking result and target candidates at Frame 115, Frame 135, Frame 150, Frame 200 and Frame 270.

Plot: observation likelihood (vertical axis, -6 to 6) vs. frame (horizontal axis, 0 to 300).



Current results

MEASURE   Area Under Curve (AUC)   False Positive rate   True Positive rate
MS        0.55 ± 0.0599            0.43 ± 0.0795         0.53 ± 0.0727
TIM       0.69 ± 0.0358            0.37 ± 0.0481         0.60 ± 0.0651
OL        0.78 ± 0.0887            0.20 ± 0.0554         0.65 ± 0.1133
COV       0.70 ± 0.0675            0.35 ± 0.0619         0.72 ± 0.0986

1. MS fails (~a random classifier)
2. TIM low performance
3. OL medium performance
4. COV low performance

Plot: ROC curves, True positive rate (Sensitivity) vs. False positive rate (1-Specificity).



Other topics:
ToF cameras for gestural interfaces




Acknowledgements

Work partially supported by:
Cátedra UAM-Infoglobal

CENIT 2007-1007 Vision

TEC2007-65400 (SemanticVideo)

S-0505-TIC-0223 ProMultiDis-CM

IST-FP6-027685 Mesh


Video Processing and Understanding in
Surveillance Applications
…segmentation, multimodal backgrounds, stationary foreground,
tracking, people detection, shadow detection, stolen and abandoned
objects, human actions detection, video browsing, evaluation,…

José María Martínez Sánchez

Hands-on Image Processing 2010 (HOIP’10)
16-17 November 2010

Escuela Politécnica Superior Universidad Autónoma de Madrid Video Processing and Understanding Lab
E28049 Madrid (SPAIN) Grupo de Tratamiento e Interpretación de Vídeo

