Solr and Machine Vision - Scott Cote, Lucidworks & Trevor Grant, IBM

Solr and Machine Vision
Scott Cote and Trevor Grant
Lucidworks / IBM

ABOUT US
Trevor Grant
 PMC: Apache Mahout
Apache Streams
 IBM: Open Source Evangelist
“AI Engineer”
 @rawkintrevo
 www.rawkintrevo.org
Scott Cote
 Organizer: DFW Data Science
Mahout Fan
 Lucidworks: Senior Software Engineer
(Fusion Core Team)
 @scottccote @dfwdatascience

DEEP LEARNING: AN
OVERVIEW
 Deep learning is an exciting new technology with numerous applications, such as
detecting cats in pictures, creating nonsensical manuscripts, “completing” unfinished
symphonies, magically returning your company to profitability after decades of poor
management through clever application of buzzwords, etc.

WHO’S INTERESTED IN
“DEEP LEARNING”

DEEP LEARNING: SLOW
TRAINING / PREDICTION
TIMES

ALTERNATIVELY- EXPENSIVE
(AND STILL SLOW)

IMAGE DETECTION
                      Haar Cascade Filters               Deep Learning
Speed of training     Days                               Months
Speed of prediction   Ultrafast                          Not great
Accuracy              Slightly lower                     Higher to MUCH higher (domain)
Type of recognition   Well understood problem (faces)    Poorly understood problem (dark matter)
Best use-case:
•  You understand the domain
•  You can use multiple methods
•  You have limited resources:
   •  Limited Time
   •  Limited Compute Power
   •  Limited $$$

DON’T HURT YOUR EYES
(IMAGE DETECTION PUN)

”FAST PREDICTION” IS
RELATIVE

REAL TIME VIDEO- OK, NOT
GOOD ENOUGH

LESS HATER-Y
 “Neural Nets are universal function approximators”
- Jake Manix, talk an hour ago.
 When milliseconds count- we can’t afford to approximate.
- Me, Now.

ANCIENT PARADIGM
[Diagram: a triangle with three corners: Fast (training and prediction time), Right (highest accuracy), and Cheap (in dollars and in hardware). GPU Deep Learning, Haar-Cascade Filters, and CPU Deep Learning are each placed on the triangle.]

CASCADE FILTER OVERVIEW
 Scans for areas that match certain patterns.
 Historical Context of Cascade Filters

EIGENFACES (FACIAL
RECOGNITION) OVERVIEW
 Similar to Principal Component Analysis:
  We seek to reduce the dimensionality of images (tens of thousands of individual pixels) to a composition of “eigenfaces”.
  A face (as a 250x250 image) is represented as a vector of length 62500 (250 x 250 = 62500 pixels).
  If we decompose it into a combination of 130 Eigenfaces, we can represent a face with a vector of length 130.
  Advantages over “Deep Learning”
  Quicker to identify face
  Quicker to retrain
  Can instantaneously add new face to dataset
 History of Eigenfaces:

EIGENFACES (FACIAL
RECOGNITION)

EIGENFACES (PIXELWISE)
22 85 54 123
56 187 92 91
111 204 103 245
8 247 155 212
239 87 99 84
Squares represent pixels…

EIGENFACES (FACIAL
RECOGNITION)
Matrix of Faces: row i is the ith image, column j is the jth pixel position

EIGENFACES: SINGULAR
VALUE DECOMPOSITION
U x V = Matrix of Faces

EIGENFACES: MATRIX V
U x V (Eigenfaces) = Matrix of Faces

EIGENFACES: MATRIX U
U x V = Matrix of Faces
U: linear combinations of Eigenfaces required to form the Nth Face
[Nth face] = 2.456 x [eigenface] - 7.2345 x [eigenface] + 0.4125 x [eigenface]
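As a rough illustration of the decomposition on these slides, here is a minimal NumPy sketch. It is not the Mahout code used in the talk: the function name `eigenface_decomposition` and the tiny random "faces" matrix are made up for the example, and the singular values that `np.linalg.svd` returns separately are folded into U so the result matches the slide's "U x V" picture.

```python
import numpy as np

def eigenface_decomposition(faces, k):
    """Decompose a (num_images x num_pixels) matrix into k eigenfaces.

    Returns (weights, eigenfaces) such that weights @ eigenfaces
    approximates `faces`. Row i of `weights` is the linear combination
    of eigenfaces that forms the ith face.
    """
    # Thin SVD: faces = U * diag(s) * Vt
    u, s, vt = np.linalg.svd(faces, full_matrices=False)
    # Fold the singular values into U (assumption made for this sketch,
    # so that the product is simply "U x V" as drawn on the slide).
    weights = u[:, :k] * s[:k]
    eigenfaces = vt[:k, :]  # each row is one eigenface
    return weights, eigenfaces

# Toy stand-in for the faces matrix: 6 "images" of 4 x 4 = 16 pixels each.
rng = np.random.default_rng(0)
faces = rng.random((6, 16))

weights, eigenfaces = eigenface_decomposition(faces, k=6)
reconstructed = weights @ eigenfaces  # matches `faces` up to rounding
```

With a smaller `k`, the product is only an approximation of the original matrix, which is the dimensionality reduction described earlier (62500 pixels down to 130 weights per face).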

NEW FACES
y: the new face
X: V Transpose (each column is an eigenface)
Simple regression, Ordinary Least Squares (OLS), of y on X gives β, the linear combination of Eigenfaces for the new face.
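The regression step can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the speakers' code: `project_face` is a hypothetical helper, and the three 4-pixel "eigenfaces" are invented so the result is easy to check by hand.

```python
import numpy as np

def project_face(face_vector, eigenfaces):
    """Solve the least-squares problem y ~ X @ beta.

    eigenfaces: (k x num_pixels) array, one eigenface per row.
    face_vector: (num_pixels,) array, the flattened new face (y).
    Returns beta, the k weights of the linear combination.
    """
    x = eigenfaces.T  # X: each column is an eigenface
    beta, _residuals, _rank, _sv = np.linalg.lstsq(x, face_vector, rcond=None)
    return beta

# Toy example: 3 made-up "eigenfaces" over 4 pixels.
eigenfaces = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
# Build a "new face" from known weights, then recover them.
new_face = 2.0 * eigenfaces[0] - 1.0 * eigenfaces[1] + 0.5 * eigenfaces[2]
beta = project_face(new_face, eigenfaces)
```

Because no retraining is involved, a previously unseen face gets its weight vector from one small linear solve.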

RECAP
 Cascade Filters: Facial Detection (is there a ‘face’ in this picture, and where?)
 Eigenfaces: Facial Recognition (WHO am I looking at?)
 Neural nets / deep learning: could do both in one pass, but very, very slowly.

ACT 2 Real-time Facial Recognition

CREATING THE EIGENFACES:
COMPUTING
 Apache Spark: an in-memory Map-Reduce engine (it has a weak ML library, which we won’t use).
 Apache Mahout: provides a Distributed Stochastic Singular Value Decomposition
method (it also provides a mathematically expressive Scala DSL and GPU/CPU
acceleration).
 Creating the Eigenfaces: the Spark job took 45 minutes on a desktop with 32GB RAM and 8 CPUs
@ 3.9GHz, but I was also watching Rick and Morty.
 THIS JOB CAN BE GPU ACCELERATED BY CHANGING ONE DEPENDENCY.

CREATING THE EIGENFACES:
DATASET
 University of Massachusetts Faces in the Wild dataset: 10k images of labeled faces from the
internet. Each image is 250x250 (62500 pixels).
10k Faces Dataset Matrix
(10,000 x 62500)
Each row corresponds to 1 image of a face
Each column corresponds to a given pixel position

APACHE MAHOUT ON
APACHE SPARK CALCULATES
EIGENFACES
Linear Combos x Eigenfaces = 10k Faces Dataset Matrix

OPENCV DETECTS FACES IN
VIDEO FRAME

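MAHOUT DECOMPOSES
FACERECT INTO LINEAR
COMBINATION OF
EIGENFACES VECTOR
[Diagram: the face rectangle's pixel grid (a 3 x 3 example, pixels 1 to 9) is flattened into a single column vector; Ordinary Least Squares against the Eigenfaces yields the Linear Combination of Eigenfaces.]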

SEARCH SOLR FOR
MATCHING VECTOR

THE QUERY, RESPONSE, AND
DOCUMENTS
Document:
{
name_s : "Richard Hatch",
e0_d : 1.512,
e1_d : 5.125,
e2_d : -15.1256,
e3_d : 4.241,
…
e129_d : 1.245,
...
call_sign_s : "Apollo",
last_seen_dt : 2017-02-08T08:52:12,
alias_s : "Tom Zarek",
...
}

Query:
{
e0_d : 1.512,
e1_d : 5.125,
e2_d : -15.1256,
e3_d : 4.241,
…
e129_d : 1.245,
...
last_seen_dt : 2017-02-08T08:52:12,
...
}
DOCUMENTS
Query -> ALL Documents -> Euclidean Distance -> Ascending Order

DOCUMENTS
Response:
[ { "name_s" : "Apollo", "calcDist" : 1256.254, "lastseen_pdt" : "1979-05-11T08:41:25" },
{ "name_s" : "Tom Zarek", "calcDist" : 1826.529, "lastseen_pdt" : "2017-02-07T08:41:25" },
{ "name_s" : "Starbuck", "calcDist" : 5826.529, "lastseen_pdt" : "2017-09-14T15:22:56" },
{ "name_s" : "Caprica 6", "calcDist" : 7119.525, "lastseen_pdt" : "2017-09-14T08:41:25" },
…
]
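To make the ranking concrete, here is a small pure-Python sketch of what the query asks for: score every document by Euclidean distance to the query vector and return the closest ones first. It runs in memory rather than against Solr. The `vector` key, the function names, and the three-number vectors are assumptions made for the example (the real documents carry 130 weights each).

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(query_vector, documents, rows=5):
    """Score ALL documents against the query and sort ascending by distance."""
    scored = []
    for doc in documents:
        d = euclidean_distance(query_vector, doc["vector"])
        scored.append({"name_s": doc["name_s"], "calcDist": d})
    scored.sort(key=lambda r: r["calcDist"])
    return scored[:rows]

# Hypothetical documents with 3 eigenface weights each.
documents = [
    {"name_s": "Caprica 6", "vector": [10.0, 5.0, -3.0]},
    {"name_s": "Apollo", "vector": [1.0, 5.0, -15.0]},
    {"name_s": "Starbuck", "vector": [4.0, 9.0, -15.0]},
]
results = rank_by_distance([1.0, 5.0, -15.0], documents)
```

Here `results` lists Apollo first (distance 0.0), then Starbuck (5.0), then Caprica 6 (15.0).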

RECOGNIZE OR NEW
ENTITY?
Response -> Recognize?
  Yes: Done
  No: Add Person to Solr, then Done
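One way to picture this decision is a simple distance threshold on the best match. The sketch below is an assumption, not the logic from the demo: `max_distance` is a hypothetical tuning parameter, a plain list stands in for "Add Person to Solr", and the `HUMAN-0001` naming pattern is borrowed from a later slide.

```python
def recognize_or_add(results, known_people, max_distance):
    """Return the recognized name, or register a new person and return that name.

    results: matches sorted by ascending "calcDist" (closest first).
    known_people: list standing in for the people already stored in Solr.
    """
    if results and results[0]["calcDist"] <= max_distance:
        return results[0]["name_s"]  # Recognize? Yes
    # Recognize? No: create a new entity (stand-in for writing to Solr).
    new_name = "HUMAN-{:04d}".format(len(known_people) + 1)
    known_people.append(new_name)
    return new_name

known = []
close_match = [{"name_s": "Apollo", "calcDist": 1256.254}]
far_match = [{"name_s": "Starbuck", "calcDist": 5826.529}]

first = recognize_or_add(close_match, known, max_distance=2000.0)
second = recognize_or_add(far_match, known, max_distance=2000.0)
```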

WHO DOES THE WORK?
Local
 Advantages:
 - Edge device can use context clues
to make the final decision
 Disadvantages:
 - Requires more hardware at edge to
“think”
On Solr
 Advantages:
 - Leverage advantages of Solr
 - Less hardware requirement on edge
 Disadvantages:
 - “Contextual clues” must be encoded in
query

ACT 3 Building your own Cylons

DRONES ARE GETTING
CHEAP
 Drone 2-Pack
  $99.99
  Controlled via Smartphone
 FPV Camera
  $39.99 / ea
  Video over Wifi via RTSP
Video enabled drones for ~$90 each

CHALLENGES AND
OPPORTUNITIES
Challenge:
  Cascade Filters inconsistently frame face
  “Ghost Faces”
  Eigenfaces not robust to facial expressions,
changes in light, etc.

CHALLENGES AND
OPPORTUNITIES
Opportunity:
  Video gives us a lot more “context clues”
than still frames.
  People don’t sporadically disappear and appear
  Someone seen recently is more likely to be present
than someone seen long ago.

OPENCV DETECTS FACES IN
A VIDEO FRAME

2 PROBLEMS
1.  The face is inconsistently detected (Eigenfaces is sensitive to this)
2.  Shadows, patterns on clothes, etc. cause “ghost faces” to be identified
sporadically.

SOLUTION: CLUSTERING/
FILTERING/WINDOWING
 Proposal: Cluster faces by location in frame. If there are fewer than N faces in a cluster, remove
all faces in that cluster (i.e. ghost clusters).
 Problem-2: People move around the frame over time.
 Proposal-2: Break frames up into a sliding window of M seconds.
 Problem-3: Clustering/machine learning can be somewhat computationally expensive.
 Proposal-3: Canopy clustering (an old but still effective method: 1-pass clustering).

CANOPY CLUSTERING
 Create N Second Window
 Cluster Faces in Window
 Quick dirty clustering- but effective.
  First point is “center”
  All points within distance t2 are “in that cluster”.
  If a point is not within t2 of any cluster- it becomes a new cluster center.

OPENCV DETECTS FACES IN
VIDEO FRAME
t2 = max square width

CANOPY CLUSTERING TO
REMOVE “GHOST” FACES
t2 = max square width
First rect: new cluster
Second rect: within one width of first rect (same cluster)
Third rect: within one width of first rect (same cluster)
Fourth rect: NOT within one width of first rect (new cluster)
Fifth rect: within one width of first rect (same cluster)
Finally, any cluster with fewer than two entities in the window gets filtered out.
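The five-rectangle walkthrough above can be sketched in pure Python. This is a simplified, single-threshold version written for illustration (full canopy clustering uses a loose and a tight distance), and it is not Mahout's implementation. Rectangles are assumed to be `(x, y, width, height)` tuples as a face detector might return them, and the coordinates are made up.

```python
import math

def rect_center(rect):
    """Center point of an (x, y, width, height) rectangle."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)

def canopy_cluster(rects, min_size=2):
    """One-pass clustering of face rects; drops clusters smaller than min_size."""
    if not rects:
        return []
    t2 = max(w for (_x, _y, w, _h) in rects)  # t2 = max square width
    clusters = []  # each entry: {"center": (cx, cy), "rects": [...]}
    for rect in rects:
        cx, cy = rect_center(rect)
        for cluster in clusters:
            ccx, ccy = cluster["center"]
            if math.hypot(cx - ccx, cy - ccy) <= t2:
                cluster["rects"].append(rect)  # within t2: same cluster
                break
        else:
            # Not within t2 of any existing center: becomes a new center.
            clusters.append({"center": (cx, cy), "rects": [rect]})
    # Filter out "ghost" clusters with too few detections in the window.
    return [c["rects"] for c in clusters if len(c["rects"]) >= min_size]

rects = [
    (100, 100, 50, 50),  # first rect: new cluster
    (105, 102, 50, 50),  # second rect: same cluster
    (98, 97, 48, 48),    # third rect: same cluster
    (400, 300, 40, 40),  # fourth rect: far away, new cluster (a "ghost")
    (103, 101, 50, 50),  # fifth rect: same cluster as the first
]
face_clusters = canopy_cluster(rects)
```

Only one cluster survives here: the fourth rectangle forms a cluster of size one and is filtered out.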

CLUSTERING BECAUSE WE
DON’T KNOW HOW MANY
TRUE FACES THERE ARE

SETTING THE “LOOSE
DISTANCE”
Half the width of the largest rectangle is the “Loose Distance”

SETTING THE “TIGHT
DISTANCE”
Half the width of the largest rectangle is the “Loose Distance”

ADAPTIVE HYPER-
PARAMETERS
 A very simple machine learning algorithm adapts itself in real time to the input it is
receiving…
 A.I. is a strong buzzword but...

A BETTER WAY TO SOLR
Document:
{
e0_d : 1.512
e1_d : 5.125
e2_d : -15.1256
e3_d : 4.241
…
e129_d : 1.245
...
last_seen_dt : 2017-02-08T08:52:12
...
}

LEVERAGE PAYLOAD
CAPABILITY OF TERM FIELDS
 New Query
q="*:*"
&sort=dist(
  2
  ,payload("e_dpf","e_00")
  ...
  ,x_e0
  ,x_e1
  ...
  ,x_e130
) asc
&rows=5

Document:
{
name_s : "Richard Hatch"
,e_dpf : "e0|1.512 e1|1.512 … e129|1.245"
…
,call_sign_s : "Apollo"
,last_seen_dt : "2017-02-08T08:52:12"
,alias_s : "Tom Zarek"
…
}
Thank you Erik Hatcher (SOLR-1485 https://issues.apache.org/jira/browse/SOLR-1485)
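Writing 130 `payload(...)` terms by hand is tedious, so the field value and the sort clause would normally be generated. The helpers below are a hypothetical sketch that only builds strings; nothing is sent to Solr. They assume the terms are named `e0`, `e1`, ... and that `e_dpf` is a delimited-payload float field, as the document on the slide suggests.

```python
def encode_payload_field(weights):
    """Encode eigenface weights as delimited payload terms, e.g. 'e0|1.512 e1|5.125'."""
    return " ".join("e{}|{}".format(i, w) for i, w in enumerate(weights))

def build_sort_param(field, query_weights):
    """Build 'dist(2, <stored weights...>, <query weights...>) asc'.

    The first half of the arguments reads each stored weight through the
    payload() function; the second half is the query face's weights.
    """
    stored = ["payload({},e{})".format(field, i) for i in range(len(query_weights))]
    query = [str(w) for w in query_weights]
    return "dist(2,{}) asc".format(",".join(stored + query))

# Hypothetical 3-weight example (the talk uses 130 weights per face).
field_value = encode_payload_field([1.512, 5.125, -15.1256])
sort_param = build_sort_param("e_dpf", [1.5, 5.0])
```

Here `field_value` is `e0|1.512 e1|5.125 e2|-15.1256` and `sort_param` is `dist(2,payload(e_dpf,e0),payload(e_dpf,e1),1.5,5.0) asc`.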

BETTER SCALING
Cluster1 Cluster2 Cluster3
Cluster2a Cluster2b

WINDOWING
 A video is just a stream of Frames
 Apache Flink gives us a nice API for splitting/joining the stream, as well as creating
windows and applying functions to the windows. (Other bonuses too)

ENTER THE STREAM:
OPENCV DETECTS FACES

ENTER THE STREAM:
MAHOUT CANOPY CLUSTER
An n-m-second sliding window:
Every m seconds this window emits a set of clusters based on the last n seconds of data. For example, with
5-1, every 1 second a new set of “face zones” is emitted, based on the faces detected in the previous 5 seconds.
(Or 0.5-0.1: every tenth of a second, based on the last half second.)
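The n-m window can be imitated in plain Python to see what each emission contains. This sketch is not the Flink API and works on a finished list rather than a live stream. The function name, the `(timestamp, item)` event shape, and the sample events are assumptions for illustration.

```python
def sliding_windows(events, size, slide):
    """Group (timestamp, item) events into sliding windows.

    Every `slide` seconds, emit the items whose timestamps fall in the
    preceding `size` seconds. Returns a list of (window_end, items).
    """
    if not events:
        return []
    events = sorted(events, key=lambda e: e[0])
    first_ts, last_ts = events[0][0], events[-1][0]
    windows = []
    n = 1
    while True:
        end = first_ts + n * slide
        start = end - size
        items = [item for ts, item in events if start <= ts < end]
        windows.append((end, items))
        if end > last_ts:  # the last event has now been emitted at least once
            break
        n += 1
    return windows

# Hypothetical detections at t = 0, 1, 2 and 6 seconds; a "3-1" window.
events = [(0, "a"), (1, "b"), (2, "c"), (6, "d")]
windows = sliding_windows(events, size=3, slide=1)
```

Each emitted item list would then be handed to the clustering step; an empty list (as at window end 6 here) simply means no faces were detected in that span.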

ENTER THE STREAM: A LAG
Here a small lag is introduced.

“APPLY THE CLUSTERS”
BASED ON FIT CANOPIES
Face Cluster 1 Face Cluster 1 Face Cluster 1 Face Cluster 1 Face Cluster 2 (only 1 image- Ghost)

STORE OUR MEMORIES IN
SOLR
METHOD1: AVERAGING
1.  Take all face rects in the cluster.
2.  Average them all together.
3.  Search Solr for this averaged image.
4.  If this “Average Face” matches a face in the cluster (within
some distance tolerance), we assign that name to every face
in the cluster and write all faces to Solr under that person’s name.
5.  Otherwise, we create a new name and write all faces to Solr under the new
name.
6.  This really doesn’t work very well at all.
7.  ADVANTAGE: Minimizes network traffic and load on Solr.

STORE OUR MEMORIES IN
SOLR
METHOD2: “VOTING”
1.  Search for EACH face.
2.  Get the list of names in the results.
3.  Assign points based on rank or distance.
4.  Aggregate points across all rects; the highest point total “wins”. If the winner meets some
minimum threshold, assign that name.
5.  Otherwise, we create a new name and write all faces to Solr under the new
name.
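The voting steps can be sketched as follows. The scoring rule (1 point for rank one, 1/2 for rank two, and so on) and the `min_points` threshold are assumptions chosen for the example; the slides only say points are based on rank or distance.

```python
from collections import defaultdict

def vote(results_per_face, min_points):
    """Pick a name for a cluster of face rects by rank-based voting.

    results_per_face: one ranked list of names per face rect (closest first).
    Returns the winning name, or None if nobody reaches min_points
    (in which case the caller would create a new name).
    """
    points = defaultdict(float)
    for names in results_per_face:
        for rank, name in enumerate(names):
            points[name] += 1.0 / (rank + 1)  # hypothetical rank-based score
    if not points:
        return None
    winner = max(points, key=points.get)
    return winner if points[winner] >= min_points else None

# Hypothetical search results for three face rects in one cluster.
results_per_face = [
    ["Apollo", "Tom Zarek"],
    ["Apollo", "Starbuck"],
    ["Tom Zarek", "Apollo"],
]
winner = vote(results_per_face, min_points=2.0)
```

Apollo collects 2.5 points, Tom Zarek 1.5, and Starbuck 0.5, so Apollo wins with a threshold of 2.0, while a threshold of 3.0 would produce no winner.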

PUNCHLINE:
 Second benefit of Eigenfaces over “deep learning”: faces can be added quickly.

WHY APACHE SOLR
 Capable of storing large amounts of data
 Scales to petabytes (text oriented)
 Numeric compute friendly
 Many ways to store different types of data

WHY APACHE MAHOUT
 Engine Agnostic (Spark/Flink/Standalone/RYO)
 Native acceleration on CPU/GPU/CUDA
 Possible to accelerate BLAS operations on ANY arch
(edge devices)
 Mathematically expressive Scala

WHY APACHE FLINK
 Sophisticated Windowing Functions
 Complex Event Library
 Scales linearly (1 drone vs Army of Drones)

TECHNICALLY “BORG-STYLE”
AI, NOT CYLONS
  A finer technical point for those familiar with the Cylons and the Borg
 “Hive Mind” Architecture

LEARNING PROPAGATES
QUICKLY
“NEW HUMAN-0001”
“OH, hai HUMAN-0001”

SHAPE OF THINGS TO COME.
The “Science Fiction” of 10 years ago is today the domain of
hobbyists.
The demo presented here is “Science Fair” grade AI.
Vlad Putin recently talked about how “it is undesirable for
anyone to monopolize AI”. (Yay Apache!)

DEMO Here’s a fun video while I set up
