4. 4
Introduction
What is face alignment?
• Face alignment extracts facial feature points (eyebrows, eyes, nose, mouth, and chin) from a given image.
* “The POSTECH Face Database (PF07) and Performance Evaluation”, FG 2008
5. 5
Introduction
Why is it important?
• Face alignment is a prerequisite for many face-related problems.
[Figure: example applications. Face Recognition; Facial Expression Recognition (angry, happy, surprise, neutral); Head Pose Estimation (-25°, 0°, +25°)]
8. 8
Previous work
Previous work
• Two approaches
• 1. Discriminative approach
• Active Shape Model
• The shape parameters are iteratively updated by locally finding the best
nearby match for each feature point.
• 2. Generative approach
• Active Appearance Model
• The shape parameters are iteratively updated by minimizing the error
between appearance instance and input image.
9. 9
Previous work
Previous work
• 1. Discriminative approach
• Constrained Local Model [1]
• Feature detector: linear SVM
• Alignment algorithm: mean-shifts
• Bayesian Tangent Shape Model [2]
• Feature detector: gradient along the normal vector
• Alignment algorithm: Bayesian inference
• Both assume that all the feature points are visible.
• Wrongly detected feature points cause the alignment to fail.
[1] J. Saragih et al., “Face Alignment through Subspace Constrained Mean-Shifts”, ICCV 2009
[2] Y. Zhou et al., “Bayesian Tangent Shape Model: Estimating Shape and Pose Parameters via Bayesian Inference”, CVPR 2003
10. 10
Previous work
Previous work
• 2. Generative approach
• Boosted Appearance Model [3]
• Appearance model: Haar-like features and boosting
• Weak classifier: discriminates aligned images from non-aligned images
• Fourier Active Appearance Model [4]
• Appearance model: Fourier-transformed appearance
• Alignment algorithm: gradient descent
• Because the solution space is high-dimensional, it has a large number of local minima.
• Both need a good initialization by eye detection.
[3] X. Liu, “Generic Face Alignment using Boosted Appearance Model”, CVPR 2007
[4] R. Navarathna et al., “Fourier Active Appearance Models”, ICCV 2011
12. 12
Proposed method
Motivation
• We follow the discriminative approach.
• Determine whether each feature point is visible or not.
• Only visible feature points are involved in the alignment step.
• Invisible feature points are estimated from the visible feature points using the partial inference (PI) algorithm.
• Using multiple shape models, we solve the pose problem.
We propose pose- and occlusion-robust face alignment!
[Figure: example face with visible and invisible feature points marked]
13. 13
Proposed method
Shape Representation
• Point Distribution Model
• The non-rigid shape is represented by a linear combination of the shape bases and the mean shape, defined in terms of:
• the mean shape
• the eigenvectors (shape bases)
• q : shape parameter
• s : scale
• R : rotation
• t : translation (x, y)
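A minimal sketch of how a point distribution model of this kind is commonly evaluated, assuming the usual form x_i = s R (mean_i + Φ_i q) + t. The function name, the toy mean shape, and the single shape basis are illustrative assumptions, and the rotation R is parameterized here by a single in-plane angle theta.

```python
import math

def synthesize_shape(mean_shape, bases, q, s, theta, t):
    """Return the points s * R(theta) * (mean_i + sum_k q_k * bases[k][i]) + t."""
    c, sn = math.cos(theta), math.sin(theta)
    points = []
    for i, (mx, my) in enumerate(mean_shape):
        # Non-rigid part: mean shape plus a linear combination of shape bases.
        dx = sum(qk * basis[i][0] for qk, basis in zip(q, bases))
        dy = sum(qk * basis[i][1] for qk, basis in zip(q, bases))
        x, y = mx + dx, my + dy
        # Rigid part: scale, rotate, then translate.
        points.append((s * (c * x - sn * y) + t[0],
                       s * (sn * x + c * y) + t[1]))
    return points

# Toy data (hypothetical): three points and one basis that moves the third point up.
mean_shape = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
bases = [[(0.0, 0.0), (0.0, 0.0), (0.0, 1.0)]]
shape = synthesize_shape(mean_shape, bases, q=[0.5], s=2.0, theta=0.0, t=(10.0, 20.0))
```

With theta = 0 the third point moves from (0.5, 1.0) to (0.5, 1.5) before scaling and translation, which shows how q controls the non-rigid deformation independently of s, R, and t.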
14. 14
Proposed method
Formulation
• Shape model with parameters p = {s, R, q, t}
• Energy function
• The energy function uses an indicator of whether each local feature is aligned (visible) or not, together with the number of local features.
15. 15
Proposed method
Multiple Shape Models
• To cover various poses and expressions, we build multiple shape models.
• We build eigenvectors for the nth pose and the mth expression.
• Given n and m, shape is
16. 16
Proposed method
Formulation with multiple shape models
• Energy function
17. 17
Proposed method
Algorithm Overview
[Figure: algorithm pipeline. Input → Face Detection → Local Feature Detection → Hypothesis-and-test (Hypothesizing Transformation Parameters, Hypothesizing Shape Parameters) → Model Hypotheses Evaluation → Output]
18. 18
Proposed method
Local Feature Detection
19. 19
Proposed method
Local feature detection
Local Feature Detection
• Goal: detect feature point candidates, each with a Gaussian model.
• Building on the MCT+Adaboost algorithm [5], we propose Hierarchical MCT to increase detection performance.
[5] Jun, and Kim, “Robust Real-Time Face Detection Using Face Certainty Map”, ICB, 2007
20. 20
Proposed method
Local feature detection
Feature Descriptor
• Modified Census Transform (MCT)
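For a 3×3 neighborhood with intensities $I_1, \dots, I_9$ and corresponding bits $B_1, \dots, B_9$ (row order):
$M = \frac{1}{9} \sum_{x=1}^{9} I_x$
$B_x = 1$ if $I_x > M$, and $B_x = 0$ otherwise
$C = \sum_{x=1}^{9} B_x \cdot 2^{x}$
Example:
Intensities      Bits
102 105 118      0 0 0
120 111 101      1 0 0
123 119 109      1 1 0
Code: $011100000_2 = 224$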
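A minimal sketch of the census step for one 3×3 patch, written to illustrate the mean-and-threshold idea. It assumes the comparison is a strict "greater than the mean" and that the bits are read in row order with the most significant bit first; the function name and the toy patch are hypothetical.

```python
def mct_3x3(patch):
    """Modified Census Transform of a 3x3 patch given as 9 intensities in row order."""
    mean = sum(patch) / 9.0
    # One bit per pixel: 1 if the pixel is brighter than the patch mean.
    bits = [1 if value > mean else 0 for value in patch]
    # Assumption: pack the bits in row order, most significant bit first.
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    return bits, code

# Toy patch (hypothetical values): three bright pixels on a dark background.
bits, code = mct_3x3([10, 10, 10,
                      200, 10, 10,
                      200, 200, 10])
```

Because every pixel is compared with the local mean rather than with a fixed threshold, adding a constant brightness offset to the patch leaves the code unchanged.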
21. 21
Proposed method
Local feature detection
Feature Descriptor
• Modified Census Transform (MCT)
• Transformed result
[Figure: gray image and its MCT image]
• MCT is a point feature
• It represents local intensity differences
• It is very sensitive to noise
22. 22
Proposed method
Local feature detection
Feature Descriptor
We propose Hierarchical MCT
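• Regional feature
• Represents regional differences
• Robust to noise
[Figure: Partition → Average → MCT. The patch is partitioned into a 3×3 grid of regions, $I_1, \dots, I_9$ are the region averages, and $C = \sum_{x=1}^{9} B_x \cdot 2^{x}$ as for MCT.]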
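The partition, average, and census steps can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: region means are computed by direct summation rather than with an integral image, the bit order is row-major with the most significant bit first, and the function names and block sizes are hypothetical.

```python
def block_means(image, top, left, block):
    """Average intensity of each cell in a 3x3 grid of block-by-block cells."""
    means = []
    for by in range(3):
        for bx in range(3):
            total = 0
            for y in range(top + by * block, top + (by + 1) * block):
                for x in range(left + bx * block, left + (bx + 1) * block):
                    total += image[y][x]
            means.append(total / (block * block))
    return means

def census_code(values):
    """Census code of nine values compared against their mean (MSB first)."""
    mean = sum(values) / len(values)
    code = 0
    for v in values:
        code = (code << 1) | (1 if v > mean else 0)
    return code

def hierarchical_mct(image, top, left, block_sizes):
    """One census code per block size, concatenated into a feature vector."""
    return [census_code(block_means(image, top, left, b)) for b in block_sizes]

# Toy 6x6 image (hypothetical): two bright columns on the left.
image = [[100 if x < 2 else 0 for x in range(6)] for _ in range(6)]
codes = hierarchical_mct(image, top=0, left=0, block_sizes=[1, 2])
```

Averaging over a block before thresholding is what makes the larger scales less sensitive to single-pixel noise than the plain MCT.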
23. 23
Proposed method
Local feature detection
Training procedure
• Hierarchical MCT + Adaboost
[Figure: input image → image pyramid (computed by integral image; levels labeled 35, 25, 15, 5) → concatenated MCT vector → Adaboost training]
24. 24
Proposed method
Local feature detection
Feature Response
• Feature response by Adaboost with different feature descriptors
[Figure: responses on a training image and a test image for conventional LBP, conventional MCT, hierarchical LBP, and hierarchical MCT]
25. 25
Proposed method
Local feature detection
Process of local feature detection
[Figure: input → search region → Hierarchical MCT → Adaboost response → regressed response]
How to obtain feature point candidates?
26. 26
Proposed method
Local feature detection
Representation of Feature Response
• How to obtain feature point candidates?
• Candidates are the local maximum points of the feature response within the candidate search region; each local maximum x is the center of a segmented region.
[Figure: input → response → segmented region]
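A minimal sketch of picking candidate points as local maxima of a response map. It assumes a strict 8-neighbour maximum test and a response threshold; both choices, the function name, and the toy response values are illustrative assumptions.

```python
def local_maxima(response, threshold=0.0):
    """Return (row, col, value) for strict 8-neighbour local maxima above threshold."""
    rows, cols = len(response), len(response[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = response[r][c]
            if v <= threshold:
                continue
            neighbours = [response[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((r, c, v))
    # Strongest candidates first.
    return sorted(peaks, key=lambda p: -p[2])

# Toy response map (hypothetical) with two separate peaks.
response = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.1],
    [0.0, 0.0, 0.3, 0.6, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.1],
]
candidates = local_maxima(response, threshold=0.05)
```

Keeping several maxima per feature, instead of only the global one, is what lets a later hypothesis step discard a wrong peak caused by occlusion.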
27. 27
Proposed method
Local feature detection
Representation of Feature Response
• How to obtain feature point candidates?
• We compute the distribution of each segmented region by fitting a convex quadratic function, defined over the kth segmented region of the ith feature point, the centroid of that region, and the inverted feature response function.
• This yields each feature candidate's distribution and centroid.
• Candidates are modeled as independent Gaussian distributions, with a Kronecker delta function marking which feature points are visible.
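As a simpler stand-in for the convex quadratic fit, the sketch below summarizes a segmented region by its response-weighted centroid and covariance. This moment-based estimate is a swapped-in technique chosen for brevity, not the fit described on the slide; the function name and the toy region are hypothetical.

```python
def weighted_gaussian(region):
    """region: list of (x, y, response). Returns (centroid, 2x2 covariance)."""
    total = sum(w for _, _, w in region)
    cx = sum(x * w for x, _, w in region) / total
    cy = sum(y * w for _, y, w in region) / total
    # Response-weighted second moments around the centroid.
    sxx = sum(w * (x - cx) ** 2 for x, _, w in region) / total
    syy = sum(w * (y - cy) ** 2 for _, y, w in region) / total
    sxy = sum(w * (x - cx) * (y - cy) for x, y, w in region) / total
    return (cx, cy), ((sxx, sxy), (sxy, syy))

# Toy segmented region (hypothetical): a plus-shaped blob centered at (5, 5).
region = [(4, 5, 1.0), (5, 5, 2.0), (6, 5, 1.0), (5, 4, 1.0), (5, 6, 1.0)]
centroid, covariance = weighted_gaussian(region)
```

A compact, peaked region gives a small covariance, so that candidate constrains the shape fit more tightly than a broad, ambiguous response.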
28. 28
Proposed method
Local feature detection
Feature clustering
• The appearance of the mouth corner varies with facial expression.
• Detection performance degrades when a single detector is trained on all the mouth shapes and appearances.
[Figure: mouth corner appearance for neutral, smile, and surprise]
29. 29
Proposed method
Local feature detection
Feature clustering
• Train one detector per feature cluster
• Run all detectors and combine their results
30. 30
Proposed method
Local feature detection
Local feature detection
[Figure: input → search region → Adaboost response → candidates with Gaussian (output of detection)]
31. 31
Proposed method
Hypothesizing Transformation Parameters
32. 32
Proposed method
Hypothesizing Transformation Parameters
• Goal
Find the best combination of the local feature point candidates that represents the input image well.
[Figure: feature point candidates]
• Assumption for occlusion
• We assume that at least half of the feature points are not occluded.
• Let N be the total number of feature points.
• At least N/2 feature points can then be assumed to be visible.
33. 33
Proposed method
Hypothesizing Transformation Parameters
• Coarse-to-fine approach
– The hypothesis space of feature point visibility is huge.
– Partial Inference (PI) Algorithm
• 1. Transformation parameters (s, R, t) are estimated by RANSAC.
• 2. Shape parameters (q) are estimated, and the transformation parameters are updated, by RANSAC.
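Step 1 can be illustrated with a generic RANSAC loop for a 2D similarity transform. This is a sketch under assumptions, not the PI algorithm itself: points are encoded as complex numbers so that the transform is w = a·z + b with a = s·exp(iθ), each model point has exactly one candidate, model points are distinct, and the tolerance, iteration count, and toy data are hypothetical.

```python
import cmath
import random

def fit_similarity(z0, z1, w0, w1):
    """Similarity w = a*z + b from two correspondences; a encodes scale and rotation."""
    a = (w1 - w0) / (z1 - z0)
    return a, w0 - a * z0

def ransac_similarity(model, observed, tol=2.0, iterations=200, seed=0):
    """Return (a, b, inlier indices) of the hypothesis with the most inliers."""
    rng = random.Random(seed)
    best_a, best_b, best_inliers = None, None, []
    indices = range(len(model))
    for _ in range(iterations):
        i, j = rng.sample(indices, 2)          # minimal sample: two points
        a, b = fit_similarity(model[i], model[j], observed[i], observed[j])
        inliers = [k for k in indices
                   if abs(a * model[k] + b - observed[k]) < tol]
        if len(inliers) > len(best_inliers):
            best_a, best_b, best_inliers = a, b, inliers
    return best_a, best_b, best_inliers

# Toy data (hypothetical): scale 2, rotation 90 degrees, translation (10, 5).
model = [0 + 0j, 4 + 0j, 8 + 0j, 0 + 4j, 8 + 4j, 2 + 8j, 4 + 9j, 6 + 8j]
observed = [2j * z + (10 + 5j) for z in model]
observed[1] += 30 + 15j    # two wrong detections, e.g. caused by occlusion
observed[5] += -20 + 25j
a, b, inliers = ransac_similarity(model, observed)
scale, angle = abs(a), cmath.phase(a)
```

The points left out of `inliers` play the role of the invisible feature points: they are excluded from the estimate instead of pulling the transform toward a wrong detection.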
35. 35
Proposed method
Hypothesizing Shape Parameters
36. 36
Proposed method
Hypothesizing Shape Parameters
• From the selected feature points, we calculate the parameters p in closed form.
• The solution uses a visibility indicator for each feature point.
• It also uses the Gaussian parameters of the selected candidates.
38. 38
Proposed method
Hypothesizing for all pose and expression
• Run the two hypothesizing steps for all shape models (of face pose and expression)
39. 39
Proposed method
Model Hypothesis Evaluation
40. 40
Proposed method
Model Hypotheses Evaluation
• We select the best pose and expression from all the hypotheses.
• The hypothesis error is the mean error of the inliers (E) divided by the number of inliers (v).
Num. of inliers (v)     54       52     43     40
Error of inliers (E)    2.9755   3.23   3.37   2.95
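The selection rule can be checked on the four hypotheses listed above. A minimal sketch, assuming the score is simply E / v and that the lowest score wins; the dictionary layout and function name are illustrative.

```python
# The four (v, E) pairs from the table above.
hypotheses = [
    {"inliers": 54, "error": 2.9755},
    {"inliers": 52, "error": 3.23},
    {"inliers": 43, "error": 3.37},
    {"inliers": 40, "error": 2.95},
]

def hypothesis_score(h):
    """Mean inlier error divided by the number of inliers (lower is better)."""
    return h["error"] / h["inliers"]

scores = [hypothesis_score(h) for h in hypotheses]
best_index = min(range(len(hypotheses)), key=lambda i: scores[i])
```

Dividing by v matters here: the last hypothesis has the lowest raw inlier error (2.95) but explains only 40 points, so the first hypothesis, with 54 inliers, gets the lower score.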
43. 43
Experimental results
Training database
• CMU Multi-PIE [7]
• Various pose, expression and illumination
• We used 10,948 of the 750,000 images
• 5 pose models
• 0°, 15°~30°, 30°~45° (70 feature points)
• 60°~75°, and 75°~90° (40 feature points)
• 2 expression models
• Neutral and smile
• Surprise
[7] R. Gross et al., “Guide to the CMU Multi-PIE Database”, Technical report, CMU, 2007
44. 44
Experimental results
Test database
• AR DB [8]
• Occlusion (sunglasses and scarf)
• CMU Multi-PIE
• Various pose, expression, illumination
• Used for artificial occlusion
• LFPW (Labeled Face Parts in the Wild) [9]
• Various pose, expression, illumination, and partial occlusion
• 29 feature points
• Used to compare our algorithm with other state-of-the-art methods
[Figure: sample images from AR DB and LFPW]
[8] A.M. Martinez and R. Benavente. The AR Face Database. CVC Technical Report #24, June 1998
[9] P. Belhumeur, et al., “Localizing parts of faces using a consensus of exemplars”, IEEE CVPR, 2011
45. 45
Experimental results
Alignment Accuracy
• Normalized error
• Euclidean distance between an aligned feature point and its ground truth, divided by the face size.
• For example, a normalized error of 0.01 on a face 100 pixels in size means the aligned feature point is only one pixel away from the ground truth.
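A minimal sketch of this metric, averaged over feature points. It assumes the face size is a single length in pixels; the function name and the toy coordinates are hypothetical.

```python
import math

def normalized_error(aligned, ground_truth, face_size):
    """Mean Euclidean distance between corresponding points, divided by face size."""
    distances = [math.hypot(ax - gx, ay - gy)
                 for (ax, ay), (gx, gy) in zip(aligned, ground_truth)]
    return sum(distances) / len(distances) / face_size

# Toy example (hypothetical): each point is one pixel off on a 100-pixel face.
aligned = [(51.0, 40.0), (70.0, 41.0)]
ground_truth = [(50.0, 40.0), (70.0, 40.0)]
error = normalized_error(aligned, ground_truth, face_size=100.0)
```

Dividing by the face size makes errors comparable across images in which the face appears at different resolutions.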
46. 46
Experimental results
AR database
• Test result
• 60 images
47. 47
Experimental results
AR database
• Normalized mean error for occlusion type
Non-occlusion   0.0226
Scarf           0.0258
Sunglasses      0.0338
• Cumulative error
[Figure: cumulative error curves]
48. 48
Experimental results
CMU Multi-PIE Database
• Test result
• Test for pose
• 321 images
49. 49
Experimental results
CMU Multi-PIE Database
• Normalized mean error for pose
0°    0.0263
15°   0.0253
30°   0.0273
45°   0.0267
60°   0.0352
75°   0.0336
90°   0.0368
• Cumulative error
[Figure: cumulative error curves]
* Results at 60°~90° are slightly worse than at 0°~45°. Since a large portion of the facial features is covered by hair, the number of visible feature points detected is too small to hallucinate the correct facial shape.
50. 50
Experimental results
CMU Multi-PIE Database
• Test for artificial occlusion
• The face area is divided into a 5-by-5 grid.
• Among the 25 regions, 1 to 15 regions are selected randomly and filled with black.
• From 8 occluded regions onward, more than 50% of the feature points are occluded.
• 2,100 images
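The occlusion protocol above can be sketched as follows, assuming the face area is given as a cropped 2D list of gray values and that cells are chosen uniformly without replacement; the function name, the seed, and the toy image are hypothetical.

```python
import random

def occlude(image, num_regions, grid=5, seed=None):
    """Return a copy of image with num_regions random grid cells filled with 0 (black)."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    all_cells = [(r, c) for r in range(grid) for c in range(grid)]
    cells = rng.sample(all_cells, num_regions)   # distinct cells
    out = [row[:] for row in image]              # leave the input untouched
    for r, c in cells:
        for y in range(r * height // grid, (r + 1) * height // grid):
            for x in range(c * width // grid, (c + 1) * width // grid):
                out[y][x] = 0
    return out, sorted(cells)

# Toy face crop (hypothetical): a uniform 50x50 image, 8 of 25 cells occluded.
image = [[255] * 50 for _ in range(50)]
occluded, cells = occlude(image, num_regions=8, seed=1)
black_pixels = sum(v == 0 for row in occluded for v in row)
```

Varying `num_regions` from 1 to 15 reproduces the range of occlusion degrees used in this test.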
51. 51
Experimental results
CMU Multi-PIE Database
• Test result
52. 52
Experimental results
CMU Multi-PIE Database
• Normalized error for pose
• For the profile (60°~90°) views, even a small occlusion degrades the alignment badly because fewer strong features such as the eyes, mouth, and nostrils are available.
• However, in terms of mean error, the proposed method shows stable alignment up to an occlusion degree of 7, which is nearly 50% occlusion.
53. 53
Experimental results
LFPW database
• Mean error over inter-ocular distance for 21 feature points
• 240 of 300 images
* P. Belhumeur, et al., “Localizing parts of faces using a consensus of exemplars”, IEEE CVPR, 2011
55. 55
Conclusion
• We proposed a pose- and occlusion-robust face alignment method.
• To solve the pose problem, we used multiple shape models.
• To solve the occlusion problem, we proposed the partial inference (PI) algorithm.
• We explicitly determine which parts are occluded.
• We proposed Hierarchical MCT+Adaboost as the local feature detector to improve detection performance.
57. 57
Future work
• We combine the generative approach (Active Appearance Model) with the discriminative approach (local feature detector).
• Current facial feature tracking
• AAM with temporal matching, template update, and motion estimation
58. 58
Future work
• Problem in facial feature tracking
• Drift problem
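[Figure: AAM fitting loop (iterative update). Input → appearance error, $\arg\min_{p,\alpha} E_{AAM}(I_n, A, p, \alpha)$ → update parameters, $p \leftarrow p + \Delta p$, $\alpha \leftarrow \alpha + \Delta\alpha$, with shape $x = x_0 + \sum_i p_i s_i$ → condition check → output]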
59. 59
Future work
• Using the local feature detection result, we can constrain the feature points aligned by the AAM to the output of the local feature detector.
60. 60
Future work
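[Figure: AAM fitting loop with a point constraint. Input $I_n$ → feature point selection → appearance error, $\arg\min_{p,\alpha} E_{AAM}(I_n, A, p, \alpha)$; the local feature detector supplies points $(x_1, y_1), \dots, (x_n, y_n)$ for the point error $E_{pts}$; both errors drive the parameter update $p \leftarrow p + \Delta p$, $\alpha \leftarrow \alpha + \Delta\alpha$, with shape $x = x_0 + \sum_i p_i s_i$ → condition check → output]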
61. 61
Future work
• Using the local feature detection result, we can build a validation matrix of the AAM for robust fitting.
• After alignment:
• We run the feature detector on the aligned feature points.
• We determine whether each point is occluded or not.
• Based on this feature-occlusion information, we build the validation matrix of the AAM for robust fitting.
• The validation matrix is used for robust AAM fitting from the next input image onward.
62. 62
Future work
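[Figure: AAM fitting loop with a point constraint and occlusion decision. Input $I_n$ → feature point selection → appearance error, $\arg\min_{p,\alpha} E_{AAM}(I_n, A, p, \alpha)$; the local feature detector supplies points $(x_1, y_1), \dots, (x_n, y_n)$ for the point error $E_{pts}$; parameter update $p \leftarrow p + \Delta p$, $\alpha \leftarrow \alpha + \Delta\alpha$, with shape $x = x_0 + \sum_i p_i s_i$ → condition check → output; an occlusion decision per point ($x_1$: pos., $x_2$: neg., …, $x_n$: pos.) builds the validation matrix]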
63. 63
Future work
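[Figure: the same loop for the next input $I_{n+1}$. The validation matrix from the previous frame weights the robust appearance error, $\arg\min_{p,\alpha} E_{AAM}(I_n, A, p, \alpha)$; the local feature detector supplies points $(x_1, y_1), \dots, (x_n, y_n)$ for the point error $E_{pts}$; parameter update $p \leftarrow p + \Delta p$, $\alpha \leftarrow \alpha + \Delta\alpha$, with shape $x = x_0 + \sum_i p_i s_i$ → condition check → output; the occlusion decision per point ($x_1$: pos., $x_2$: neg., …, $x_n$: pos.) updates the validation matrix]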