Event Detection in Surveillance
Video: How we Got Here, What We
Should Do Next
Prof. Alan F. Smeaton
E: alan.smeaton@dcu.ie
Talk Agenda
• Importance of visual content
• Manual annotation, automatic annotation
• TRECVid – what it is, what it does
• Surveillance Event Detection task – how far it’s got
• Understanding Crowds
• Crowd counting, crowd behaviour, metric performance
• Surveillance Video
• What can we do now?
• What’s the roadmap ?
We know … Manual Annotation
• Annual workshop series (2001-) promoting
research/progress in content-based video analysis
• Foundation for large-scale laboratory testing & forum for
exchange of research ideas and discussion of
approaches – what works, what doesn’t, and why.
• Focus: content-based tasks
• search / detection / summarization / segmentation
• Realistic tasks and test collections
• focus on relatively high-level functionality (e.g.
interactive search) & measurement against human
abilities
• Provides data, tasks, and uniform scoring procedures
What is TRECVid ?
[Chart: TRECVid Video Data: 2003 to 2016. Vertical axis 0 to 4500; horizontal axis years 2003 to 2016. Data sources: English TV News, TV news, BBC rushes, Sound & vision, Airport Surveillance, Internet Archive Creative Commons, HAVIC, Flickr, BBC EastEnders, BBC hyperlinking, Blip.tv, YFCC100M]
TRECVid Tasks: 2001 to 2018
1. Shot boundary detection
2. Ad hoc search
3. Features/semantic indexing
4. Stories
5. Camera motion
6. BBC summaries
7. Copy detection
8. Surveillance events
9. Known-item search
10. Instance search
11. Multimedia event detection
12. Multimedia event recounting
13. Video hyperlinking
14. Localization
15. Video to text (captions)
TRECVid 2017 Tasks and 39 Finishers
Groups finished | Task code | Task name
8 | SED | Surveillance event detection
10 | AVS | Ad-hoc Video Search
8 | INS | Instance search
6 | MED | Multimedia event detection
3 | LNK | Video hyperlinking
16 | VTT | Pilot task (Video_to_Text)
Finishers by region: Asia 20, Europe 10, North America 7, Australia 2
TRECVid Concept Detection: 2003
In 2012, this happened
• ImageNet, an equivalent of TRECVid,
for images rather than videos
• Krizhevsky, Sutskever and Hinton @
Univ Toronto, “won” the ImageNet
large scale visual recognition
challenge with a “convolutional neural
network”
Now everybody tries deep learning, for everything
• Surveillance event detection - leverage machine learning
for detecting a pre-defined set of events … in airport
surveillance video
• Use case … detect visual events (people engaged in
particular activities) in a large collection of streaming video
data collected by the UK Home Office
• Part of TRECVid since 2008 so 10 years
• 7 events …
TRECVid Surveillance Event Detection
1. CellToEar: put a cell phone to his/her head or ear
2. Embrace: put one or both arms at least part way around another
person (POOR)
3. ObjectPut: drop or put down an object (VERY GOOD)
4. PeopleMeet: One or more people walk up to one or more other
people, stop, and some communication occurs
5. PeopleSplitUp: From two or more people, standing, sitting, or moving
together, communicating, one or more people separate themselves
and leave the frame (VERY GOOD PERFORMANCE)
6. PersonRuns: Someone runs (POOR)
7. Pointing: Someone points (VERY GOOD PERFORMANCE)
TRECVid Surveillance Event Detection
TRECVid SED 2016
Participating Research Group | #Years
Beijing Univ Posts & Telegraphs | 8
CMU/Renmin Univ/Univ of Sydney/Shandong Univ | 9
Hikvision Research Institute | 1
Wuhan University | 2
ITI Greece | 2
NII – Hitachi - UiT | 1
Southeast Univ Jiulonghu Campus | 2
Univ of Queensland, Australia | 2
(4 China, 1 Greece, 1 Japan, 1 US, 1 Vietnam) (8 groups in total)
Event columns: Embrace, ObjPut, PeopMeet, PeopSplit, PersRuns, Point, CellToEar
A 10 year critique …
• Progress is slow, but improving because groups use deep
learning
• Task is still going because it’s important, but available
training data is the bottleneck
• Approaches are tailored and tuned to each activity … we
can’t afford to do that
• These aren’t anomalous activities, they are everyday
activities; this is behaviour monitoring for the purpose of
behaviour monitoring and then anomaly detection
• But … do we need the events to detect the anomalies ?
What has this got to do with surveillance video ?
He needs help !
• 2016 – 100M new surveillance cameras shipped
worldwide … in 2018 it will be 130M
• But – as costs fall, vendors need ways to differentiate, and using
deep learning for analysing video content is one of those ways
• Deep Learning is already appearing …
– Deep Learning equipment has a fast uptake in China
– Deep-learning-enabled cameras, with chips from Nvidia, Movidius or
others
– Body-worn cameras and vehicular dashcams also have great
potential, especially when combined with GPS and accelerometers
• What are the Deep Learning services ?
1. Face recognition and tracking is the early, and easiest,
scenario, immediately useful in safe city applications
2. Detecting events, outliers and anomalies, as well as usual
patterns; public safety abnormal event detection … can be
in real time or archive search for evidence gathering
3. Large scale search – across cameras
• Deep Learning addresses a big data problem: fusion of
heterogeneous data sources – hence body-worn cameras – and
also large scale search
He needs help !
Understanding Crowds
Motivation
Motivation for Understanding Crowds
• Crowd Density Estimation
• Level of crowd congestion observed at a given point in time
• Crowd Counting (state of the art results)
• True number of people present in an image of a crowded scene
• Crowd Segmentation
• Locate different crowd characteristics in a scene
• Crowd Behaviour Classification (state of the art results)
• Categorise the behaviour observed in a crowded scene
• Anomalous Behaviour Detection
• Identify behaviour which strays significantly from an established
norm, typically learned from normal behaviour training data
Recent Approaches to Crowd Counting
1. Counting by detection
• Training a visual object detector to find and count each person
• Performs poorly with 100+ people in frame
2. Counting by regression
• Learn a direct mapping between low-level features and the overall number of people in frame
Deep learning approaches lead to significant improvements in counting accuracy for high density crowds (100-5000 people)
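A minimal sketch of counting by regression, to make the "direct mapping" idea concrete. It assumes a single hypothetical low-level feature (foreground pixel area) and made-up training pairs. Real systems use much richer features, and the deep learning approaches mentioned above learn the mapping end to end.

```python
# Counting by regression, reduced to one hypothetical low-level feature
# (foreground pixel area) and an ordinary least-squares line.

def fit_line(features, counts):
    """Return (slope, intercept) of the least-squares line counts ~ features."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, counts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_count(feature, slope, intercept):
    """Map a feature value for a new frame to an estimated people count."""
    return slope * feature + intercept


# Made-up training data: foreground area in pixels and annotated head counts.
train_area = [1000, 2000, 3000, 4000]
train_count = [10, 20, 30, 40]

slope, intercept = fit_line(train_area, train_count)
estimate = predict_count(2500, slope, intercept)
print(round(estimate, 2))
```

Unlike counting by detection, nothing here locates individual people; the model only maps a whole-frame measurement to a number.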
Crowd Counting – Approach #1
Contributions
• Training set augmentation scheme which improves generalisation
• Deep, single column, fully convolutional network architecture
• Multi-scale count averaging step during inference
Marsden, M., McGuinness, K., Little, S., O’Connor, N. E. (2017). Fully Convolutional Crowd Counting on Highly Congested Scenes. International Conference on Computer Vision Theory and Applications.
Fully Convolutional Neural Network: Pixel-wise sum = 1544; Crowd Count = 1566
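The pixel-wise sum and the multi-scale averaging step can be sketched as follows. This is an illustration under assumptions, not the authors' code: `toy_model` is a hypothetical stand-in for the fully convolutional network, and the scale values are invented for the example.

```python
# Sketch: a density map is summed pixel-wise to obtain a count, and counts
# obtained at several input scales are averaged at inference time.

def count_from_density(density_map):
    """Pixel-wise sum of a 2D density map (list of rows)."""
    return sum(sum(row) for row in density_map)


def multi_scale_count(image, predict_density, scales=(0.5, 1.0, 1.5)):
    """Average the counts predicted at each scale (scales are assumed values)."""
    counts = [count_from_density(predict_density(image, s)) for s in scales]
    return sum(counts) / len(counts)


def toy_model(image, scale):
    """Hypothetical stand-in for the network: returns a fixed 2x2 density map."""
    total = {0.5: 9.0, 1.0: 10.0, 1.5: 11.0}[scale]
    cell = total / 4
    return [[cell, cell], [cell, cell]]


averaged = multi_scale_count(image=None, predict_density=toy_model)
print(averaged)
```

Averaging over scales is one simple way to reduce the effect of a single scale over- or under-estimating the crowd.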
Crowd Counting – Approach #1
UCF_CC_50 Dataset
Method | Mean Absolute Error | Mean Squared Error
(Rodriguez et al., 2011) | 655.7 | 697.8
(Lempitsky and Zisserman, 2010) | 493.4 | 487.1
(Idrees et al., 2013) | 419.5 | 541.6
(Zhang et al., 2015) | 467 | 498.6
(Zhang et al., 2016) | 377.6 | 509.1
Our Approach | 338.6 | 425.5
Our approach improves upon the state-of-the-art by 11% (MAE) and 13% (MSE)
Crowd Counting in Action : Academic Datasets
Estimated Person Count : 26
True Person Count: 23
Estimated Person Count : 462
True Person Count: 476
Estimated Person Count : 1544
True Person Count: 1566
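As a worked illustration of the error metrics in the UCF_CC_50 table, the sketch below computes them over the three estimated/true pairs quoted in these examples. Three images are far too few for a meaningful score. Taking the root of the mean squared error is an assumption here, based on common practice in crowd-counting papers, so both values are printed.

```python
import math

# (estimated, true) person counts quoted on the example slides.
pairs = [(26, 23), (462, 476), (1544, 1566)]

errors = [est - true for est, true in pairs]

# Mean absolute error over the images.
mae = sum(abs(e) for e in errors) / len(errors)

# Mean squared error and its root (which of the two a table column
# reports is an assumption to check against each paper).
mean_sq = sum(e * e for e in errors) / len(errors)
rmse = math.sqrt(mean_sq)

print(f"MAE={mae:.1f}  mean squared error={mean_sq:.1f}  root={rmse:.2f}")
```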
Crowd Counting in Action : CCTV Footage
• Challenging CCTV footage taken from Croke Park Stadium, Dublin
• Same scene observed during a quiet and busy period during a match day
Metric for a video sequence : mean count ± standard deviation
Quiet period: Estimated Count 0 ± 1.5, True Count 0 ± 0.8
Busy period: Estimated Count 52 ± 6.8, True Count 70 ± 3.3
Video clips removed for copyright reasons!
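The sequence metric above (mean count ± standard deviation) can be computed from per-frame counts as in this small sketch. The frame counts are invented for illustration, and the use of the population standard deviation is an assumption, since the slides do not say which form was used.

```python
import statistics

def summarise_counts(per_frame_counts):
    """Return (mean, population standard deviation) of per-frame counts."""
    mean = statistics.fmean(per_frame_counts)
    spread = statistics.pstdev(per_frame_counts)
    return mean, spread


# Hypothetical per-frame estimates for one short clip.
frames = [48, 50, 52, 54, 56]
mean, spread = summarise_counts(frames)
summary = f"{mean:.0f} ± {spread:.1f}"
print(summary)
```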
Crowd Counting – Approach #2
Contributions
• A new 100 image dataset, fully annotated for crowd counting, violent behaviour detection and density level classification
• A deep, residual ANN architecture for simultaneous counting, behaviour detection and crowd density estimation
Marsden, M., McGuinness, K., Little, S., O’Connor, N. E. (2017). ResnetCrowd: A Residual Deep Learning Architecture for Crowd Counting, Violent Behaviour Detection and Crowd Density Level Classification. IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).
Multi-Task Neural Network outputs: Crowd Count, Violent Behaviour Detection, Crowd Density Level
Data set
• Apply labels for additional tasks to an existing dataset.
• WWW Crowd clips where either the “Fight” or “Mob” concepts are present.
• Crowd counting GT created in the same way as the UCF_CC_50
ResnetCrowd
• Based upon the Resnet18 network of He et al.
• Minimise a loss function which combines losses for each of the outputs.
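A minimal sketch of what "a loss function which combines losses for each of the outputs" could look like for the three outputs. The individual loss terms (squared error, binary cross-entropy, categorical cross-entropy) and the equal weights are assumptions for illustration; the slides do not specify them.

```python
import math

def squared_error(pred, target):
    return (pred - target) ** 2


def binary_cross_entropy(prob, label):
    prob = min(max(prob, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def cross_entropy(probs, class_index):
    return -math.log(max(probs[class_index], 1e-7))


def combined_loss(outputs, targets, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the per-task losses (weights are assumed values)."""
    w_count, w_violence, w_density = weights
    return (
        w_count * squared_error(outputs["count"], targets["count"])
        + w_violence * binary_cross_entropy(outputs["violent"], targets["violent"])
        + w_density * cross_entropy(outputs["density"], targets["density"])
    )


# Hypothetical network outputs and labels for one training image.
outputs = {"count": 100.0, "violent": 0.5, "density": [0.25, 0.5, 0.25]}
targets = {"count": 100.0, "violent": 1, "density": 1}
loss = combined_loss(outputs, targets)
print(round(loss, 4))
```

Because the gradient of the combined value flows into the shared layers, each task can influence the features used by the others.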
Object Counting Has Applications In Many Domains
People, Vehicles, Cell Nuclei, Wildlife
Marsden, M., McGuinness, K., Little, S., Keogh, C. E., O’Connor, N. E. (2018). People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting. Computer Vision and Pattern Recognition (CVPR).
Crowd Counting – Approach #3
• Single object counting model for multiple domains (People, Vehicles, Cells, Wildlife): trained model can be adjusted to each domain
• Mean count error: 19% on ShanghaiTech dataset
• 30% relative improvement on prior approach to crowd counting
• Current state of the art for crowd counting and wildlife counting
Diagram: Image Patch → Base Network (pre-trained on ImageNet) → Shared Counting Neural Network; Object Count = ∑ patch counts
• Base object counting regressor
• Set of high-level features are extracted from each image patch using a
pre-trained image classification network
• N-dimensional feature representation is then mapped to an object
count value using a fully connected neural network.
• Domain-specific layers
• Included before each fully connected layer and after the final fully
connected layer.
• Increases the trainable parameter count by just 5%
• Sequential training
• Leverages Rebuffi et al. for learning new tasks over time without
discarding the previously learned functions.
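The patch-wise counting idea (object count as the sum of patch counts) can be sketched as below. The `toy_patch_regressor` is a hypothetical stand-in for the pre-trained feature extractor plus fully connected regressor, and the domain-specific layers and sequential training are not modelled here.

```python
# Sketch: split an image into patches, estimate a count per patch,
# and sum the patch counts to get the object count for the image.

def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into non-overlapping square patches."""
    patches = []
    for top in range(0, len(image), patch_size):
        for left in range(0, len(image[0]), patch_size):
            rows = image[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


def count_objects(image, patch_size, patch_regressor):
    """Object count = sum of the per-patch counts."""
    return sum(patch_regressor(p) for p in split_into_patches(image, patch_size))


def toy_patch_regressor(patch):
    """Hypothetical regressor: here each pixel set to 1 marks one object."""
    return sum(sum(row) for row in patch)


image = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
patches = split_into_patches(image, 2)
total = count_objects(image, 2, toy_patch_regressor)
print(len(patches), total)
```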
[Figures: Crowd Counting, Cell Counting, Penguin Counting, Vehicle Counting]
A Summary of our work in …
• Crowd Density Estimation
• Level of crowd congestion observed at a given point in time
• Crowd Counting
• True number of people present in an image of a crowded
scene
• Crowd Segmentation
• Locate different crowd characteristics in a scene
• Crowd Behaviour Classification
• Categorise the behaviour observed in a crowded scene
• Anomalous Behaviour Detection
• Identify behaviour which strays significantly from an established
norm, typically learned from normal behaviour training data
Surveillance Video – Roadmap
• We can compute crowd counts,
crowd segments, traffic volumes, very
accurately
• We’re not good at detecting events
• We can learn behaviour patterns for an
area, campus, stadium, city, from
surveillance video – but by detecting
simpler things, like crowd numbers;
• We can use raw CCTV + raw audio as sensor input streams
• We can determine regular behaviour using e.g. periodicity
• We can detect deviations from normal, alert security, and let them do their job
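One way the last two roadmap points could fit together, as a sketch under assumptions: estimate the period of a crowd-count series by autocorrelation, learn a per-phase baseline from normal history, and flag new values that deviate strongly. The data, the threshold of 3 standard deviations and all function names are illustrative, not taken from the talk.

```python
import statistics

def dominant_period(series, max_lag):
    """Lag with the largest (unnormalised) autocorrelation."""
    mean = statistics.fmean(series)
    dev = [x - mean for x in series]

    def autocorr(lag):
        return sum(dev[t] * dev[t + lag] for t in range(len(dev) - lag))

    return max(range(1, max_lag + 1), key=autocorr)


def phase_baseline(series, period):
    """Mean and standard deviation of the values seen at each phase."""
    baseline = []
    for phase in range(period):
        values = series[phase::period]
        baseline.append((statistics.fmean(values), statistics.pstdev(values)))
    return baseline


def deviations(new_values, baseline, threshold=3.0):
    """Indices of new values more than `threshold` std devs from baseline."""
    flagged = []
    for i, x in enumerate(new_values):
        mean, std = baseline[i % len(baseline)]
        # Phases with zero spread are skipped rather than divided by zero.
        if std > 0 and abs(x - mean) / std > threshold:
            flagged.append(i)
    return flagged


# Synthetic "normal" history: a 4-step pattern repeated with small jitter.
pattern = [10, 50, 80, 30]
history = [c + (2 if cycle % 2 == 0 else -2) for cycle in range(6) for c in pattern]

period = dominant_period(history, max_lag=12)
baseline = phase_baseline(history, period)
flagged = deviations([11, 49, 140, 31], baseline)
print(period, flagged)
```

The flagged indices are what would be passed to security staff; deciding what the deviation means is left to them.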

Huawei STW 2018 public

  • 1. Event Detection in Surveillance Video: How we Got Here, What We Should Do Next Prof. Alan F. Smeaton E: alan.smeaton@dcu.ie
  • 2. Talk Agenda • Importance of visual content • Manual annotation, automatic annotation • TRECVid – what it is, what it does • Surveillance Event Detection task – how far its got • Understanding Crowds • Crowd counting, crowd behaviour, metric performance • Surveillance Video • What can we do now? • What’s the roadmap ? 2
  • 3. We know … Manual Annotation 3
  • 4. • Annual workshop series (2001-) promoting research/progress in content-based video analysis • Foundation for large-scale laboratory testing & forum for exchange of research ideas and discussion of approaches – what works, what doesn’t, and why. • Focus: content-based tasks • search / detection / summarization / segmentation • Realistic tasks and test collections • focus on relatively high-level functionality (e.g. interactive search) & measurement against human abilities • Provides data, tasks, and uniform scoring procedures 4 What is TRECVid ?
  • 5. 5 English TV News 0 500 1000 1500 2000 2500 3000 3500 4000 4500 TV news BBC rushes Sound & vision Airport Surveillance Internet Archive Creative Commons HAVIC Flickr BBC East- Enders … 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 BBC hyper- linking Blib.tv YFCC1 00M TRECVid Video Data: 2003 to 2016
  • 6. 1. Shot boundary detection 2. Ad hoc search 3. Features/semantic indexing 4. Stories 5. Camera motion 6. BBC summaries 7. Copy detection 8. Surveillance events 9. Known-item search 10.Instance search 11.Multimedia event detection 12.Multimedia event recounting 13.Video hyperlinking 14.Localization 15.Video to text (captions) 6 TRECVid Tasks: 2001 to 2018
  • 7. 7 Groups Finished Task code Task name 8 SED Surveillance event detection 10 AVS Ad-hoc Video Search 8 INS Instance search 6 MED Multimedia event detection 3 LNK Video hyperlinking 16 VTT Pilot task (Video_to_Text) 20 10 7 2 Asia Europe North America Australia TRECVid 2017 Tasks and 39 Finishers
  • 9. In 2012, this happened • ImageNet, an equivalent of TRECVid, for images rather then videos • Krizhevsky, Sutskever and Hinton @ Univ Toronto, “won” the ImageNet large scale visual recognition challenge with a “convolutional neural network”
  • 10. Now everybody tries deep learning, for everything
  • 11. • Surveillance event detection - leverage machine learning for detecting a pre-defined set of events … in airport surveillance video • Use case … detect visual events (people engaged in particular activities) in a large collection of streaming video data collected by the UK Home Office • Part of TRECVid since 2008 so 10 years • 7 events … 11 TRECVid Surveillance Event Detection
  • 12. 1. CellToEar: put a cell phone to his/her head or ear 2. Embrace: put one or both arms at least part way around another person (POOR) 3. ObjectPut: drop or put down an object (VERY GOOD) 4. PeopleMeet: One or more people walk up to one or more other people, stop, and some communication occurs 5. PeopleSplitUp: From two or more people, standing, sitting, or moving together, communicating, one or more people separate themselves and leave the frame (VERY GOOD PERFORMANCE) 6. PersonRuns: Someone runs (POOR) 7. Pointing: Someone points (VERY GOOD PERFORMANCE) 12 TRECVid Surveillance Event Detection
  • 13. Participating Research Group #Years Embrace ObjPut PeopMeet PeopSPlit PersRuns Point CellToEar Beijing Univ Posts & Telegraphs 8 CMU/Renmin Univ/Univ of Sydney/Shandong Univ 9 Hikvision Research Institute 1 Wuhan University 2 ITI Greece 2 NII – Hitachi - UiT 1 Southeast Univ Jiulonghu Campus 2 Univ of Queensland, Australia 2 (4 China, 1 Greece, 1 Japan, 1 US, 1 Vietnam) (8 groups in total) 13 TRECVid SED 2016
  • 14. A 10 year critique … • Progress is slow, but improving because groups use deep learning • Task is still going because its important but available training data is the bottleneck • Approaches are tailored and tuned to each activity … we can’t afford to do that • These aren’t anomalous activities, these are everyday activities, this is behaviour monitoring for the purpose of behaviour monitoring and then anomaly detection • But … do we need the events to detect the anomalies ? 14
  • 15. What has this got to do with surveillance video ? 15
  • 17. He needs help ! • 2016 – 100M new surveillance cameras shipped worldwide .. in 2018 it will be 130M • But – as costs fall, so vendors need ways to differentiate and using deep learning for analysing video content is one of those ways • Deep Learning is already appearing … – Deep Learning equipment has a fast uptake in China – Deep leaning enabled cameras, chips from Nvidia, Movidius or others – Body worn cameras and vehicular dashcams also have great potential, especially when combined with GPS and accelerometers 17
  • 18. • What are the Deep Learning services ? 1. Face Recognition and tracking is the early, and easiest, scenario, immediately useful in safe city applications 2. Detecting events outliers, anomalies, as well as usual patterns, public safety abnormal event detection … can be in real time or archive search for evidence gathering 3. Large scale search – across cameras • Deep Learning addresses a large big data problem, fusion of heterogeneous data sources – hence body-worn cameras – and also large scale search 18 He needs help !
  • 21. Motivation for Understanding Crowds • Crowd Density Estimation • Level of crowd congestion observed at a given point in time • Crowd Counting (state of the art results) • True number of people present in an image of a crowded scene • Crowd Segmentation • Locate different crowd characteristics in a scene • Crowd Behaviour Classification (state of the art results) • Categorise the behaviour observed in a crowded scene • Anomalous Behaviour Detection • Identify behaviour which strays significantly from an established norm, typically learned from normal behaviour training data 21
  • 23. Recent Approaches to Crowd Counting 1. Counting by detection • Train a visual object detector to find and count each person • Performs poorly with more than 100 people in frame 2. Counting by regression • Learn a direct mapping between low-level features and the overall number of people in frame • Deep learning approaches lead to significant improvements in counting accuracy for high-density crowds (100-5000 people)
  • 24. Crowd Counting – Approach #1: Fully Convolutional Crowd Counting on Highly Congested Scenes • Contributions • Training set augmentation scheme which improves generalisation • Deep, single-column, fully convolutional network architecture • Multi-scale count averaging step during inference • [Figure: fully convolutional neural network; pixel-wise sum of the output = 1544, true crowd count = 1566] • Marsden, M., McGuinness, K., Little, S., O’Connor, N. E. (2017). Fully Convolutional Crowd Counting on Highly Congested Scenes. International Conference on Computer Vision Theory and Applications.
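The "pixel-wise sum" and "multi-scale count averaging" steps named on this slide can be illustrated with a minimal Python sketch. It assumes the network outputs a density map whose values sum to the number of people in the image. The `predict_density` callable, the list of scales and the toy model are hypothetical stand-ins for illustration, not the authors' implementation.

```python
from typing import Callable, List, Sequence

DensityMap = List[List[float]]


def count_from_density(density: DensityMap) -> float:
    """Return the crowd count as the pixel-wise sum of a density map."""
    return sum(sum(row) for row in density)


def multi_scale_count(
    image: object,
    predict_density: Callable[[object, float], DensityMap],
    scales: Sequence[float] = (0.5, 1.0, 2.0),
) -> float:
    """Average the counts obtained from the same image at several scales."""
    counts = [count_from_density(predict_density(image, s)) for s in scales]
    return sum(counts) / len(counts)


def toy_model(image: object, scale: float) -> DensityMap:
    """Hypothetical model: returns a 2x2 density map with a fixed total per scale."""
    totals = {0.5: 1480.0, 1.0: 1550.0, 2.0: 1590.0}
    quarter = totals[scale] / 4.0
    return [[quarter, quarter], [quarter, quarter]]


if __name__ == "__main__":
    print(multi_scale_count(None, toy_model))
```

Averaging over scales is one simple way to reduce the effect of a single badly scaled prediction; the scales actually used in the paper are not given on the slide.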
  • 25. Crowd Counting – Approach #1: results on the UCF_CC_50 dataset
      Method                            Mean Absolute Error   Mean Squared Error
      (Rodriguez et al., 2011)          655.7                 697.8
      (Lempitsky and Zisserman, 2010)   493.4                 487.1
      (Idrees et al., 2013)             419.5                 541.6
      (Zhang et al., 2015)              467                   498.6
      (Zhang et al., 2016)              377.6                 509.1
      Our approach                      338.6                 425.5
    Our approach improves upon the state of the art by 11% (MAE) and 13% (MSE).
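As a reminder of what the two error columns measure, here is a small Python sketch that computes them for a handful of images. The three (estimated, true) pairs are the example counts shown on the "Crowd Counting in Action" slides of this deck, not the UCF_CC_50 test set. Some crowd counting papers report the square root of the mean squared error under the name "MSE"; the slide does not say which convention applies, so both values are computed.

```python
import math
from typing import List, Tuple


def mean_absolute_error(pairs: List[Tuple[float, float]]) -> float:
    """Mean of |estimated - true| over all images."""
    return sum(abs(est - true) for est, true in pairs) / len(pairs)


def mean_squared_error(pairs: List[Tuple[float, float]]) -> float:
    """Mean of (estimated - true)^2 over all images."""
    return sum((est - true) ** 2 for est, true in pairs) / len(pairs)


# (estimated count, true count) taken from the example slides in this deck.
examples = [(26, 23), (462, 476), (1544, 1566)]

if __name__ == "__main__":
    mae = mean_absolute_error(examples)
    mse = mean_squared_error(examples)
    print(f"MAE = {mae:.1f}")
    print(f"MSE = {mse:.1f}, root MSE = {math.sqrt(mse):.1f}")
```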
  • 26. Crowd Counting in Action: Academic Datasets
  • 27. Crowd Counting in Action: Academic Datasets • Estimated person count: 26 • True person count: 23
  • 28. Crowd Counting in Action: Academic Datasets • Estimated person count: 462 • True person count: 476
  • 29. Crowd Counting in Action: Academic Datasets • Estimated person count: 1544 • True person count: 1566
  • 30. Crowd Counting in Action: CCTV
  • 31. Crowd Counting in Action: CCTV Footage • Challenging CCTV footage taken from Croke Park Stadium, Dublin • Same scene observed during a quiet period and a busy period on a match day • Metric for a video sequence: mean count ± standard deviation • Quiet period: estimated count 0 ± 1.5, true count 0 ± 0.8 • Busy period: estimated count 52 ± 6.8, true count 70 ± 3.3 • Video clips removed for copyright reasons
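The video-sequence metric above (mean count ± standard deviation) can be computed from per-frame counts as in the following Python sketch. The frame counts are made-up illustration values, and the use of the population standard deviation is an assumption, since the slide does not specify which estimator was used.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def sequence_summary(frame_counts: Sequence[float]) -> Tuple[float, float]:
    """Summarise per-frame counts of a clip as (mean, population std deviation)."""
    return mean(frame_counts), pstdev(frame_counts)


if __name__ == "__main__":
    # Hypothetical per-frame estimates for a short clip.
    estimated = [50, 52, 54]
    mu, sigma = sequence_summary(estimated)
    print(f"Estimated count: {mu:.0f} ± {sigma:.1f}")
```

Reporting the spread alongside the mean shows how stable the estimate is from frame to frame, which a single averaged count would hide.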
  • 32. Crowd Counting – Approach #2: ResnetCrowd • Contributions • A new 100-image dataset, fully annotated for crowd counting, violent behaviour detection and density level classification • A deep, residual ANN architecture for simultaneous counting, behaviour detection and crowd density estimation • [Figure: multi-task neural network with three outputs: crowd count, violent behaviour detection, crowd density level] • Marsden, M., McGuinness, K., Little, S., O’Connor, N. E. (2017). ResnetCrowd: A Residual Deep Learning Architecture for Crowd Counting, Violent Behaviour Detection and Crowd Density Level Classification. IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).
  • 33. Crowd Counting – Approach #2: Data set • Apply labels for additional tasks to an existing dataset • WWW Crowd clips in which either the “Fight” or “Mob” concept is present • Crowd counting ground truth (GT) created in the same way as for UCF_CC_50
  • 35. Crowd Counting – Approach #2: ResnetCrowd • Based upon the ResNet18 network of He et al. • Minimise a loss function which combines the losses for each of the outputs • [Figure: multi-task neural network with three outputs: crowd count, violent behaviour detection, crowd density level]
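One common way to combine the losses for several outputs is a weighted sum, sketched below in Python. The slide only states that the per-output losses are combined. The weighted-sum form, the choice of squared error for the count and cross-entropy for the two classification outputs, and the weights themselves are all assumptions made for illustration.

```python
import math
from typing import Sequence


def squared_error(pred: float, target: float) -> float:
    """Regression loss for the crowd count output (assumed)."""
    return (pred - target) ** 2


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    """Classification loss for violent / non-violent (assumed)."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def categorical_cross_entropy(probs: Sequence[float], true_class: int,
                              eps: float = 1e-12) -> float:
    """Classification loss for the density level (assumed)."""
    return -math.log(max(probs[true_class], eps))


def combined_loss(count_pred: float, count_true: float,
                  violent_prob: float, violent_label: int,
                  density_probs: Sequence[float], density_class: int,
                  weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three task losses; the weights are hypothetical."""
    w_count, w_violent, w_density = weights
    return (w_count * squared_error(count_pred, count_true)
            + w_violent * binary_cross_entropy(violent_prob, violent_label)
            + w_density * categorical_cross_entropy(density_probs, density_class))


if __name__ == "__main__":
    print(combined_loss(10.0, 12.0, 0.5, 1, [0.25, 0.5, 0.25], 1))
```

In practice the weights matter because the count loss can be orders of magnitude larger than the classification losses and would otherwise dominate training.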
  • 36. Object Counting Has Applications In Many Domains • People • Vehicles • Cell nuclei • Wildlife • Marsden, M., McGuinness, K., Little, S., Keogh, C. E., O’Connor, N. E. (2018). People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting. Computer Vision and Pattern Recognition (CVPR).
  • 37. Crowd Counting – Approach #3: People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting • Single object counting model for multiple domains (people, vehicles, cells, wildlife); the trained model can be adjusted to each domain • Mean count error: 19% on the ShanghaiTech dataset • 30% relative improvement on the prior approach to crowd counting • Current state of the art for crowd counting and wildlife counting • [Figure: each image patch passes through a base network (pre-trained on ImageNet) and a shared counting neural network; object count = ∑ patch counts]
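The relation "object count = ∑ patch counts" on this slide can be sketched in Python as follows. The patch size, the non-overlapping tiling and the `toy_regressor` (which simply sums pixel values, so each 1 in the toy image stands for one object) are assumptions for illustration. In the real system a trained network would play the role of the per-patch regressor.

```python
from typing import Callable, List

Image = List[List[float]]


def split_into_patches(image: Image, patch_size: int) -> List[Image]:
    """Tile the image into non-overlapping patches (edge patches may be smaller)."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


def count_objects(image: Image, patch_size: int,
                  count_patch: Callable[[Image], float]) -> float:
    """Total object count is the sum of the per-patch counts."""
    return sum(count_patch(p) for p in split_into_patches(image, patch_size))


def toy_regressor(patch: Image) -> float:
    """Hypothetical stand-in for the counting network: sums the pixel values."""
    return sum(sum(row) for row in patch)


if __name__ == "__main__":
    image = [
        [1, 0, 0, 1],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
    ]
    print(count_objects(image, 2, toy_regressor))
```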
  • 38. Crowd Counting – Approach #3 • Base object counting regressor • A set of high-level features is extracted from each image patch using a pre-trained image classification network • The N-dimensional feature representation is then mapped to an object count value using a fully connected neural network • Domain-specific layers • Included before each fully connected layer and after the final fully connected layer • Increase the trainable parameter count by just 5% • Sequential training • Leverages the method of Rebuffi et al. for learning new tasks over time without discarding previously learned functions
  • 39. Crowd Counting – Approach #3 • [Example images: crowd counting, cell counting]
  • 40. Crowd Counting – Approach #3 • [Example images: penguin counting, vehicle counting]
  • 41. A Summary of our work in … • Crowd Density Estimation • Level of crowd congestion observed at a given point in time • Crowd Counting • True number of people present in an image of a crowded scene • Crowd Segmentation • Locate different crowd characteristics in a scene • Crowd Behaviour Classification • Categorise the behaviour observed in a crowded scene • Anomalous Behaviour Detection • Identify behaviour which strays significantly from an established norm, typically learned from normal behaviour training data
  • 42. Surveillance Video – Roadmap • We can compute crowd counts, crowd segments and traffic volumes very accurately • We’re not good at detecting events • We can learn behaviour patterns for an area, campus, stadium or city from surveillance video, but by detecting simpler things, like crowd numbers • We can use raw CCTV + raw audio as sensor input streams • We can determine regular behaviour using e.g. periodicity • We can detect deviations from normal, alert security, and let them do their job
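The roadmap's last two ideas (learn regular behaviour from periodicity, then flag deviations from it) can be sketched in Python using nothing more than a crowd count time series. This is a minimal illustration under stated assumptions: the period is known in advance, the baseline is a per-phase mean and standard deviation, and the threshold of three standard deviations and the `min_std` floor are hypothetical choices, not values from the talk.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple

Baseline = Dict[int, Tuple[float, float]]


def learn_baseline(history: Sequence[float], period: int) -> Baseline:
    """Group regularly sampled counts by phase within the period and
    store the mean and standard deviation of each phase."""
    buckets = defaultdict(list)
    for index, count in enumerate(history):
        buckets[index % period].append(count)
    return {phase: (mean(values), pstdev(values))
            for phase, values in buckets.items()}


def is_deviation(baseline: Baseline, period: int, index: int, count: float,
                 threshold: float = 3.0, min_std: float = 1.0) -> bool:
    """Flag a count that lies more than `threshold` standard deviations
    from the learned mean for its phase."""
    mu, sigma = baseline[index % period]
    return abs(count - mu) > threshold * max(sigma, min_std)


if __name__ == "__main__":
    # Three hypothetical cycles of four samples each (e.g. four times of day).
    history = [10, 50, 200, 20,
               12, 48, 210, 22,
               11, 52, 190, 18]
    baseline = learn_baseline(history, period=4)
    print(is_deviation(baseline, 4, index=14, count=205))  # busy slot, usual level
    print(is_deviation(baseline, 4, index=14, count=20))   # busy slot, nearly empty
```

The point of the design is that nothing here requires event detection: a count per frame plus a notion of "usual for this time" is enough to raise an alert for a human operator to assess.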