This document discusses using machine vision techniques like Haar cascade filters and eigenfaces for real-time facial recognition and detection. It proposes using OpenCV to detect faces in video frames, clustering the detected faces to remove "ghost" faces, representing each face as a vector of eigenface coefficients, and searching Solr to identify faces or add new identities. It also discusses challenges like inconsistent face detection and proposes solutions like adaptive clustering parameters and windowing video frames to add context.
4. DEEP LEARNING: AN OVERVIEW
• Deep learning is an exciting new technology with numerous applications, such as
detecting cats in pictures, creating nonsensical manuscripts, "completing" unfinished
symphonies, magically returning your company to profitability after decades of poor
management through clever application of buzzwords, etc.
11. IMAGE DETECTION
                      Haar Cascade Filters               Deep Learning
Speed of training     Days                               Months
Speed of prediction   Ultrafast                          Not great
Accuracy              Slightly lower                     Higher to MUCH higher (domain)
Type of recognition   Well understood problem (faces)    Poorly understood problem (dark matter)
Best use-case:
• You understand the domain
• You can use multiple methods
• You have limited resources:
  - Limited time
  - Limited compute power
  - Limited $$$
16. LESS HATER-Y
• "Neural nets are universal function approximators"
- Jake Manix, talk an hour ago.
• When milliseconds count, we can't afford to approximate.
- Me, now.
17. ANCIENT PARADIGM
[Diagram: three competing goals, Fast (training and prediction time), Right (highest accuracy), and Cheap (in dollars and in hardware), with GPU deep learning, Haar cascade filters, and CPU deep learning placed among them.]
22. EIGENFACES (FACIAL RECOGNITION) OVERVIEW
• Similar to Principal Component Analysis:
  - We seek to reduce the dimensionality of images (tens of thousands of individual pixels) to a composition of "eigenfaces".
  - A face (as a 250x250 image) is represented as a vector of length 62,500 (250 x 250 = 62,500 pixels).
  - If we decompose it into a combination of 130 eigenfaces, we can represent a face with a vector of length 130.
  - Advantages over "deep learning":
    - Quicker to identify a face
    - Quicker to retrain
    - Can instantaneously add a new face to the dataset
• History of Eigenfaces:
39. RECAP
• Cascade filters: facial detection (is there a "face" in this picture, and where?)
• Eigenfaces: facial recognition (WHO am I looking at?)
• Neural nets / deep learning: could do both in one pass, but very, very slowly.
41. CREATING THE EIGENFACES: COMPUTING
• Apache Spark: an in-memory map-reduce engine (it has a weak ML library, which we won't use).
• Apache Mahout: provides a distributed stochastic singular value decomposition method (also provides a mathematically expressive Scala DSL and GPU/CPU acceleration).
• Creating the eigenfaces: the Spark job took 45 minutes on a desktop with 32 GB RAM and 8 CPUs @ 3.9 GHz, but I was also watching Rick and Morty.
• THIS JOB CAN BE GPU ACCELERATED BY CHANGING ONE DEPENDENCY.
42. CREATING THE EIGENFACES: DATASET
• University of Massachusetts Faces in the Wild dataset: 10k images of labeled faces from the internet. Each image is 250x250 (62,500 pixels).
10k Faces Dataset Matrix (10,000 x 62,500)
Each row corresponds to 1 image of a face
Each column corresponds to a given pixel position
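To make the matrix layout concrete, here is a minimal sketch of how images become rows. The helper names `flatten` and `build_dataset_matrix` are made up for illustration and are not from the talk's code; tiny 2x3 images stand in for the 250x250 faces.

```python
# Hypothetical helpers for illustration; not taken from the talk's code.

def flatten(image):
    """Turn a 2-D image (a list of pixel rows) into one flat row vector."""
    return [pixel for row in image for pixel in row]


def build_dataset_matrix(images):
    """One row per image, one column per pixel position."""
    return [flatten(image) for image in images]


# Two tiny 2x3 grayscale "images" stand in for 250x250 faces.
images = [
    [[0, 10, 20], [30, 40, 50]],
    [[5, 15, 25], [35, 45, 55]],
]
matrix = build_dataset_matrix(images)
print(len(matrix), len(matrix[0]))  # number of rows (images), number of columns (pixels)
```

With real 250x250 images the same layout gives 62,500 columns per row.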
43. APACHE MAHOUT ON APACHE SPARK CALCULATES EIGENFACES
[Diagram: 10k Faces Dataset Matrix = Linear Combos x Eigenfaces]
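The factorization in the diagram can be read row by row: each face is (approximately) its row of "linear combo" coefficients times the eigenfaces. Below is a rough sketch of that idea in plain Python, assuming the eigenfaces are orthonormal and that the mean face is subtracted first, as in ordinary PCA. The function names and the toy 4-pixel data are made up for illustration; the talk's actual job uses Mahout on Spark.

```python
# Illustrative sketch only: assumes orthonormal eigenfaces and mean-centering.

def project(face, mean_face, eigenfaces):
    """Return one coefficient per eigenface (the face's 'linear combo' row)."""
    centered = [p - m for p, m in zip(face, mean_face)]
    return [sum(e * c for e, c in zip(eigenface, centered))
            for eigenface in eigenfaces]


def reconstruct(coefficients, mean_face, eigenfaces):
    """Rebuild an approximate face: mean + sum(coefficient * eigenface)."""
    face = list(mean_face)
    for coefficient, eigenface in zip(coefficients, eigenfaces):
        for i, e in enumerate(eigenface):
            face[i] += coefficient * e
    return face


# Toy data: 4-pixel "faces" and 2 orthonormal eigenfaces.
mean_face = [1.0, 1.0, 1.0, 1.0]
eigenfaces = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
face = [3.0, 1.0, 3.0, 1.0]

coefficients = project(face, mean_face, eigenfaces)
rebuilt = reconstruct(coefficients, mean_face, eigenfaces)
print(coefficients, rebuilt)
```

In the real setting the coefficient vector has length 130 instead of 2, and the reconstruction is only approximate because 130 eigenfaces cannot span all 62,500 pixel dimensions.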
53. WHO DOES THE WORK?
Local
• Advantages:
  - Edge device can use context clues to make the final decision
• Disadvantages:
  - Requires more hardware at the edge to "think"
On Solr
• Advantages:
  - Leverage the advantages of Solr
  - Less hardware required on the edge
• Disadvantages:
  - "Contextual clues" must be encoded in the query
[Diagram labels: Response, Recognize?]
55. DRONES ARE GETTING CHEAP
• Drone 2-pack
  - $99.99
  - Controlled via smartphone
• FPV camera
  - $39.99 each
  - Video over Wi-Fi via RTSP
Video-enabled drones for ~$90 each
57. CHALLENGES AND OPPORTUNITIES
Opportunity:
  - Video gives us a lot more "context clues" than still frames.
  - People don't sporadically disappear and appear.
  - Someone seen recently is more likely to be present than someone seen long ago.
64. 2 PROBLEMS
1. The face is inconsistently detected (eigenfaces are sensitive to this).
2. Shadows, patterns on clothes, etc. cause "ghost faces" to be identified sporadically.
66. SOLUTION: CLUSTERING/FILTERING/WINDOWING
• Proposal: Cluster faces by location in frame. If there are fewer than N faces in a cluster, remove all faces in that cluster (i.e. ghost clusters).
• Problem 2: People move around the frame over time.
• Proposal 2: Break frames up into a sliding window of M seconds.
• Problem 3: Clustering/machine learning can be somewhat computationally expensive.
• Proposal 3: Canopy clustering (an old but still effective method: one-pass clustering).
67. CANOPY CLUSTERING
• Create an N-second window
• Cluster the faces in the window
• Quick and dirty clustering, but effective:
  - The first point is a "center".
  - All points within distance t2 are "in" that cluster.
  - If a point is not within t2 of any cluster, it becomes a new cluster center.
68. OPENCV DETECTS FACES IN VIDEO FRAME
t2 = max square width
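For reference, a hedged sketch of what the detection step might look like with OpenCV's Python bindings. This is an assumption about the setup, not the talk's actual code: it uses the stock frontal-face cascade shipped with the `opencv-python` package, and the `scaleFactor` and `minNeighbors` values are common defaults rather than tuned settings. `rect_center` is a made-up helper that turns a face square into the point we would cluster on.

```python
def detect_faces(frame, classifier=None):
    """Return a list of (x, y, w, h) face rectangles for one BGR video frame."""
    import cv2  # imported lazily so rect_center() below works without OpenCV

    if classifier is None:
        # Assumption: the stock frontal-face cascade bundled with opencv-python.
        classifier = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
        )
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rects = classifier.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in rect) for rect in rects]


def rect_center(rect):
    """Center point of an (x, y, w, h) rectangle (hypothetical helper)."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)


center = rect_center((10, 20, 40, 60))
print(center)
```

Frames could come from something like `cv2.VideoCapture` pointed at the camera's RTSP URL; the widest square returned for a frame is one way to pick t2.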
69. CANOPY CLUSTERING TO REMOVE "GHOST" FACES
t2 = max square width
First rect: new cluster
Second rect: within one width of first rect (same cluster)
Third rect: within one width of first rect (same cluster)
Fourth rect: NOT within one width of first rect (new cluster)
Fifth rect: within one width of first rect (same cluster)
Finally, any cluster with fewer than two entities in the window gets filtered out.
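The walk-through above can be sketched in a few lines of plain Python. This is the simplified, single-threshold version the slides describe (textbook canopy clustering also uses a second, looser threshold), and it is a stand-in for the Mahout implementation, not a copy of it. The function names, the sample points, and `t2 = 50` are made up for illustration.

```python
import math


def canopy_cluster(points, t2):
    """One-pass clustering: join the first center within t2, else start a new one."""
    clusters = []  # each cluster: {"center": (x, y), "members": [points]}
    for point in points:
        for cluster in clusters:
            cx, cy = cluster["center"]
            if math.hypot(point[0] - cx, point[1] - cy) <= t2:
                cluster["members"].append(point)
                break
        else:
            # Not within t2 of any existing center: this point becomes a center.
            clusters.append({"center": point, "members": [point]})
    return clusters


def drop_ghosts(clusters, min_size=2):
    """Filter out clusters with fewer than min_size detections in the window."""
    return [c for c in clusters if len(c["members"]) >= min_size]


# Five face-rect centers from one window; the fourth is a lone "ghost".
centers = [(100, 100), (105, 98), (97, 103), (400, 120), (102, 101)]
clusters = canopy_cluster(centers, t2=50)
kept = drop_ghosts(clusters, min_size=2)
print(len(clusters), len(kept))
```

Because each point is compared only against the current centers, the cost stays low even when a window holds many detections.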
73. ADAPTIVE HYPER-PARAMETERS
• A very simple machine learning algorithm adapts itself in real time to the input it is receiving...
• "A.I." is a strong buzzword, but...
80. WINDOWING
• A video is just a stream of frames.
• Apache Flink gives us a nice API for splitting/joining the stream, as well as creating windows and applying functions to the windows. (Other bonuses too.)
82. ENTER THE STREAM: MAHOUT CANOPY CLUSTER
An n-m-second sliding window:
Every m seconds this window emits a set of clusters based on the last n seconds of data. For example, with 5-1, every 1 second it emits a new set of "face zones" based on the faces detected in the previous 5 seconds.
83. MAHOUT CANOPY CLUSTER
The n-m-second sliding window can also be much shorter: with 0.5-0.1, clusters are emitted every tenth of a second, based on the last half second.
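A rough sketch of the n-m window semantics in plain Python, for intuition only. It is not the Flink API: it assumes the detections are already collected in a list, sorted by timestamp, with time starting at 0. The name `sliding_windows` and the sample events are made up for illustration.

```python
def sliding_windows(events, size, slide):
    """Yield (window_end, items) every `slide` seconds over the last `size` seconds.

    events: list of (timestamp_seconds, item), sorted by timestamp.
    """
    if not events:
        return
    last_timestamp = events[-1][0]
    k = 1
    while True:
        end = k * slide
        start = end - size
        # A window covers the half-open interval (start, end].
        items = [item for timestamp, item in events if start < timestamp <= end]
        yield end, items
        if end >= last_timestamp:
            break
        k += 1


# A "5-1" window: every 1 second, look at the previous 5 seconds.
detections = [(0.5, "a"), (1.5, "b"), (5.5, "c")]
windows = list(sliding_windows(detections, size=5, slide=1))
for end, items in windows:
    print(end, items)
```

Each emitted window is what would be handed to the clustering step; a detection appears in several consecutive windows, which is what lets "face zones" persist while people move.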
87. STORE OUR MEMORIES IN SOLR
METHOD 1: AVERAGING
1. Take all face rects in the cluster.
2. Average them all together.
3. Search Solr for this averaged image.
4. If this "average face" matches a stored face (within some distance tolerance), we assign that name to every face in the cluster and write all faces to Solr under that person's name.
5. Otherwise, we create a new name and write all faces to Solr under the new name.
6. This really doesn't work very well at all.
7. ADVANTAGE: Minimizes network traffic and the load on Solr.
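A minimal sketch of the averaging idea, with the Solr search replaced by an in-memory dictionary so it runs on its own. Everything here is an assumption for illustration: the names `average_vector` and `nearest_name`, the use of Euclidean distance, and the toy 2-element vectors standing in for eigenface coefficient vectors.

```python
import math


def average_vector(vectors):
    """Element-wise mean of the face vectors in one cluster."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def nearest_name(query, known_faces, tolerance):
    """Closest known name within `tolerance`, else None (caller creates a new name).

    known_faces: dict of name -> vector; a stand-in for the Solr index.
    """
    best_name, best_distance = None, float("inf")
    for name, vector in known_faces.items():
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, vector)))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= tolerance else None


known_faces = {"alice": [2.0, 3.0], "bob": [10.0, 10.0]}
cluster_faces = [[1.0, 2.0], [3.0, 4.0]]

average_face = average_vector(cluster_faces)
match = nearest_name(average_face, known_faces, tolerance=1.0)
print(average_face, match)
```

Only one query goes out per cluster, which is where the network savings come from.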
88. STORE OUR MEMORIES IN SOLR
METHOD 2: "VOTING"
1. Search for EACH face.
2. Get the list of names in the results.
3. Assign points based on rank or distance.
4. Aggregate points across all rects; the highest total "wins". If the winner meets some minimum threshold, assign that name.
5. Otherwise, we create a new name and write all faces to Solr under the new name.
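The voting steps can be sketched as follows. The scoring rule (1 / rank) and the threshold values are assumptions picked for illustration; the slide only says "rank or distance". The Solr searches are represented by already-fetched ranked name lists, and `vote` is a made-up function name.

```python
from collections import defaultdict


def vote(results_per_face, min_points):
    """Pick a name for a cluster from per-face search results.

    results_per_face: one ranked list of names per face rect, best match first.
    Returns the winning name, or None if nobody reaches min_points
    (in which case the caller would create a new name).
    """
    points = defaultdict(float)
    for ranked_names in results_per_face:
        for rank, name in enumerate(ranked_names):
            points[name] += 1.0 / (rank + 1)  # assumed rule: 1, 1/2, 1/3, ...
    if not points:
        return None
    winner = max(points, key=points.get)
    return winner if points[winner] >= min_points else None


# Ranked results for three face rects in one cluster.
results = [["alice", "bob"], ["alice", "carol"], ["bob", "alice"]]
confident = vote(results, min_points=2.0)
unsure = vote(results, min_points=3.0)
print(confident, unsure)
```

Unlike averaging, this costs one search per face, but a single bad detection cannot drag the whole cluster toward the wrong identity.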
90. WHY APACHE SOLR
• Capable of storing large amounts of data
• Scales to petabytes, text oriented
• Numeric compute friendly
• Many ways to store different types of data
91. WHY APACHE MAHOUT
• Engine agnostic (Spark/Flink/Standalone/RYO)
• Native acceleration on CPU/GPU/CUDA
• Possible to accelerate BLAS operations on ANY arch (edge devices)
• Mathematically expressive Scala
93. TECHNICALLY "BORG-STYLE" AI, NOT CYLONS
• A finer technical point for those familiar with the Cylons and the Borg
• "Hive mind" architecture
95. SHAPE OF THINGS TO COME.
The "science fiction" of 10 years ago is today the domain of hobbyists.
The demo presented here is "science fair" grade AI.
Vlad Putin was recently talking about how "it is undesirable for anyone to monopolize AI". (Yay Apache!)