Facial Emotion Detection on Children's Emotional Face
1. Project Presentation on
Sentiment Analysis on Human with special Concentration on Babies’
emotional face using Deep Learning.
Under Guidance of
DR. PARISMITA SARMA
(ASSISTANT PROFESSOR, Dept. Of IT )
Presented By-
Takrim Ul Islam Laskar
Roll No. 170302005
M.Tech 4th Sem
Dept. of Information Technology
Gauhati University
GUWAHATI,INDIA
JUNE 2019
2. Introduction.
• Emotion is a psychological and physiological state which is subject to
different conditions of mind.
• Broadly, emotions can be classified into happiness, sadness, anger, disgust,
surprise and fear, among others.
• Pose, speech, facial expression, behaviour, etc. convey the emotions of
individuals, but the face reflects the emotions of a human being most
precisely.
• The objective, therefore, is to detect the human emotions through
classification of facial expressions.
• The solution to the problem can be achieved through computer vision
and classification through Deep Learning.
3. Machine Learning.
• ML algorithms build a mathematical model from training data.
• With this model, the system makes predictions and decisions without being
explicitly programmed for the task.
• In general, the more training data is available, the higher the accuracy.
• Machine Learning uses two types of techniques:
1. Supervised Learning.
2. Unsupervised Learning.
5. Supervised Learning
• Supervised Learning trains a model on labelled input and output data
so that it can predict future outputs.
• Supervised learning uses classification and regression techniques to
develop predictive models.
1. Classification techniques predict discrete responses. Classification models
classify input data into categories. Typical applications include emotion
detection, medical imaging, speech recognition, etc.
2. Regression techniques predict continuous responses like changes in
temperature or fluctuations in power demand. Typical applications include
electricity load forecasting, weather forecasting, etc.
6. Unsupervised Learning
• Unsupervised Learning finds hidden patterns or intrinsic structures in
data.
• It is used to draw inferences from datasets consisting of input data
without labelled responses.
• Clustering is the most common unsupervised learning technique.
7. Clustering
• It is used for exploratory data analysis to find hidden patterns or
groupings in data.
• Applications for cluster analysis include gene sequence analysis,
market research, and object recognition [16].
8. Artificial Neural Network.
• It is a framework where different machine learning algorithms work
together to process complex data inputs.
• These systems learn to perform tasks from examples extracted
from data rather than from task-specific rules.
• The main objective is to develop a system that performs faster
and more accurately than traditional systems [19].
9. Artificial Neural Network.
• An ANN consists of a large collection of interconnected nodes (neurons).
• Nodes are simple processors which operate in parallel.
• Every neuron is connected with other neurons through a connection link.
• Each connection link is associated with a weight that has information about
the input signal.
• This is the most useful information for neurons to solve a particular
problem because the weight usually excites or inhibits the signal that is
being communicated.
• Each neuron has an internal state, which is called an activation signal.
Output signals, which are produced after combining the input signals and
activation rule, may be sent to other units.
10. Model of Artificial Neural Network.
For the above general model of artificial neural network, the net input can be calculated as follows-
$y_{in} = x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_m w_m$
i.e., Net input, $y_{in} = \sum_{i=1}^{m} x_i w_i$
11. Model of Artificial Neural Network.
The output can be calculated by applying the activation function over the net input.
$Y = F(y_{in})$
Output = function(Net input calculated)
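The two formulas above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `net_input` and `neuron_output`, the example inputs and weights, and the choice of a sigmoid as the activation function F are assumptions, not part of the presented system.

```python
import math

def net_input(x, w):
    """Weighted sum y_in = x1*w1 + x2*w2 + ... + xm*wm."""
    if len(x) != len(w):
        raise ValueError("inputs and weights must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w))

def sigmoid(z):
    """One possible activation function F (an assumption for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w, activation=sigmoid):
    """Y = F(y_in): apply the activation function to the net input."""
    return activation(net_input(x, w))

x = [1.0, 2.0, 3.0]        # example input signals (hypothetical)
w = [0.5, -1.0, 0.5]       # example connection weights (hypothetical)
y_in = net_input(x, w)     # 0.5 - 2.0 + 1.5 = 0.0
y = neuron_output(x, w)    # sigmoid(0.0) = 0.5
```

A negative weight (here -1.0) inhibits its input signal, while positive weights excite it, which is the role of the weights described on the earlier slide.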
12. Deep Learning.
• Deep Learning is a class of machine learning algorithms that use a
cascade of multiple layers of nonlinear processing units for feature
extraction and transformation.
13. Deep Learning.
• Each successive layer uses the output from the previous layer as input.
• The most common techniques are Convolutional Neural Networks (CNNs),
Recurrent Neural Networks (RNNs), and Reinforcement Learning (RL)[21].
14. Convolutional Neural Network.
• A Convolutional Neural Network is a type of ANN.
• It uses convolutional layers to filter input information.
• Here the neurons in the layers are arranged in three dimensions:
width, height and depth [7].
• A Convolutional Neural Network generally has three types of hidden layers:
1. Convolutional Layer.
2. Pooling Layer.
3. Fully Connected Layer.
15. Convolutional Layer.
Let's understand what happens in a 2D Convolutional Layer.
• The kernel/filter is moved from top-left to bottom-right along the input image.
• At each position of the kernel on the image, the corresponding values of the
image and the kernel are multiplied.
• These products are then summed and stored at the corresponding position of the output matrix.
16. Convolutional Layer.
• When the depth is more than 1, the same calculation is done for each of the
channels; the per-channel results are added together with the bias of the respective
kernel to give the value at the corresponding position of the output matrix.
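As a minimal sketch of the sliding-window calculation described on the two slides above, the following plain-Python functions assume stride 1 and no padding. The names `conv2d_single` and `conv2d_multi` and the toy 3 x 3 image are illustrative assumptions. Like most deep learning libraries, the sketch does not flip the kernel before multiplying.

```python
def conv2d_single(image, kernel):
    """Slide the kernel over one channel; multiply and sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def conv2d_multi(channels, kernels, bias=0.0):
    """Depth > 1: convolve each channel, add the results, then add the bias."""
    per_channel = [conv2d_single(img, k) for img, k in zip(channels, kernels)]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(m[r][c] for m in per_channel) + bias
             for c in range(cols)]
            for r in range(rows)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]

feature_map = conv2d_single(image, kernel)
# feature_map == [[6, 8], [12, 14]]

two_channel = conv2d_multi([image, image], [kernel, kernel], bias=1.0)
# two_channel == [[13.0, 17.0], [25.0, 29.0]]
```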
17. Pooling Layer.
• The main reason for using a pooling layer is that it reduces the size
of the input tensor, and with it the number of parameters in later layers.
• It helps reduce overfitting and extract representative features from the
input tensor.
• It also helps reduce computation and increase efficiency.
• A model that has learned the noise instead of the signal (the features) is
considered overfit because it fits the training dataset but fits new
datasets poorly.
19. Max Pooling.
• In Max Pooling, an n*n kernel is moved across the matrix.
• At each position, the maximum value is taken and assigned to
the corresponding position of the output matrix.
20. Average Pooling.
• In Average Pooling, an n*n kernel is moved across the matrix.
• At each position, the average of all the values is calculated and assigned to
the corresponding position of the output matrix.
This is repeated for all the channels of the input tensor to obtain the output
tensor.
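Both pooling variants can be illustrated with one small function applied to a single channel. This is a sketch under assumptions: the name `pool2d`, its parameters and the 4 x 4 example matrix are hypothetical, and the window is only placed where it fits entirely inside the matrix.

```python
def pool2d(matrix, size, stride, mode="max"):
    """Move a size x size window across the matrix and reduce each window."""
    if mode == "max":
        reduce_window = max
    elif mode == "average":
        reduce_window = lambda values: sum(values) / len(values)
    else:
        raise ValueError("mode must be 'max' or 'average'")

    out_h = (len(matrix) - size) // stride + 1
    out_w = (len(matrix[0]) - size) // stride + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            window = [matrix[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce_window(window))
        output.append(row)
    return output

m = [[1, 3, 2, 4],
     [5, 7, 6, 8],
     [9, 11, 10, 12],
     [13, 15, 14, 16]]

max_pooled = pool2d(m, size=2, stride=2, mode="max")
# max_pooled == [[7, 8], [15, 16]]

avg_pooled = pool2d(m, size=2, stride=2, mode="average")
# avg_pooled == [[4.0, 5.0], [12.0, 13.0]]
```

For a tensor with several channels, the same function would be applied to each channel in turn.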
21. Fully Connected Layer.
• The fully connected layer is a feed-forward neural network consisting of a few layers.
• The output of the final pooling or convolutional layer is flattened and fed
into this layer.
• Flattening converts all the values of the 3D matrix into a single vector.
22. Flattening.
• This vector is then connected to some fully connected layers, which perform the same
operation as an Artificial Neural Network (ANN).
• Once the vector has passed through the fully connected layers, the final layer uses the Softmax
activation function to obtain the probability of the input belonging to each class [1].
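The softmax step can be sketched as follows. The function name `softmax` and the seven example scores are assumptions made for illustration; subtracting the maximum score first is a common numerical-stability measure and does not change the result.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)                          # avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a final layer with seven classes.
scores = [2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 1.5]
probs = softmax(scores)

# The largest score receives the largest probability.
best_class = probs.index(max(probs))
```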
23. Tools and Environment
• Tools
• Python
• OpenCV
• TensorFlow
• Keras
• Jupyter Notebook
• Camera
24. Tools and Environment
• Dependencies
• Kaggle Facial Expression Recognition (fer2013.csv) Dataset
The dataset consists of 48 X 48 pixel grayscale images of faces. The training set consists of
28,709 images and the public test set of 3,589 images, out of a total of 35,887 images [4].
• Haar-cascade Classifier (haarcascade_frontalface_alt.xml)
It is based on Haar-like features rather than on raw pixel intensities. Each feature considers adjacent
rectangular regions: it sums the pixel intensities in each region and then takes the
difference between the sums to categorize the subsections or feature regions of an image.
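To make the idea of a Haar-like feature concrete, the sketch below computes a two-rectangle feature on a tiny grayscale patch. The function names and the patch values are hypothetical. Practical detectors compute the region sums from an integral image for speed, whereas this sketch sums the pixels directly.

```python
def region_sum(img, top, left, height, width):
    """Sum of pixel intensities inside one rectangular region."""
    return sum(img[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))

def two_rect_feature(img, top, left, height, width):
    """Difference between the sums of two horizontally adjacent rectangles."""
    left_sum = region_sum(img, top, left, height, width)
    right_sum = region_sum(img, top, left + width, height, width)
    return left_sum - right_sum

# Dark region on the left, bright region on the right (a vertical edge).
patch = [[10, 10, 200, 200],
         [10, 10, 200, 200]]

feature = two_rect_feature(patch, top=0, left=0, height=2, width=2)
# feature == 40 - 800 == -760
```

A large absolute value indicates a strong intensity contrast between the two regions, which is the kind of cue a cascade uses to accept or reject a candidate window.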
25. Tools and Environment
• Environment
• Google Colaboratory.
Colaboratory is a free Jupyter notebook environment that requires no setup and runs
entirely in the cloud. It also provides us the option to select GPUs to run huge
computations.
• Google Drive.
Google Drive cloud storage is used to store the training dataset.
26. Project Methodology
• The proposed system enables a computer to detect the seven
universal emotions of human beings through classification of
facial expressions.
27. Loading image data.
• The image dataset is first uploaded to Google Drive.
• A Jupyter notebook is created in Google Colab with Python 3.
• Google Colab is authorised to access the data stored in Google
Drive.
The code in the Fig. is run to request the access.
29. Loading image data.
The Fig. shows the code for loading and uncompressing the dataset
file from Google Drive.
30. Training and Testing CNN data model.
The dataset consists of three
columns:
• Emotion - class label, 0 to 6.
• Pixels - pixel values of the 48 X 48 image.
• Usage - Training or PublicTest.
Dataset
31. Training and Testing CNN data model.
The Fig. shows the Usage
column with the label Training.
Dataset
32. Training and Testing CNN data model.
The Fig. shows the Usage
column with the labels Training and
PublicTest.
Dataset
33. Reading and storing the data in csv file.
• All the lines of the fer2013.csv file are read and stored in a NumPy
array ‘lines’.
• The number of instances and the length of an instance are then
calculated.
• Each line is then split, one instance at a time, into emotion class, image
data and usage as shown below.
emotion, img, usage = lines[i].split(",")
34. Reading and storing the data in csv file.
• Then the pixel values are split on the space character
and stored in the NumPy array ‘pixels’.
val = img.split(" ")
pixels = np.array(val, 'float32')
35. Reading and storing the data in csv file.
• For the training dataset, the emotion class is appended to y_train and
the respective pixels are appended to x_train.
y_train.append(emotion)
x_train.append(pixels)
• Similarly, for the testing dataset, emotion is appended to y_test and pixels
to x_test.
y_test.append(emotion)
x_test.append(pixels)
To normalize the inputs to the range [0, 1], the pixel values are divided by 255.
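The reading, splitting and normalization steps from the last three slides can be put together in one runnable sketch. Several things here are assumptions: the helper names `parse_fer_line` and `split_dataset`, the usage value `PublicTest` for test rows, the conversion of the label to an integer, and the shortened four-pixel sample lines (real rows carry 48 x 48 = 2,304 values).

```python
def parse_fer_line(line):
    """Split one CSV line into (emotion, normalized pixels, usage)."""
    emotion, img, usage = line.strip().split(",")
    # Pixel values are space separated; dividing by 255 scales them to [0, 1].
    pixels = [float(v) / 255.0 for v in img.split(" ")]
    return int(emotion), pixels, usage

def split_dataset(lines):
    """Route each parsed instance to the training or the test lists."""
    x_train, y_train, x_test, y_test = [], [], [], []
    for line in lines:
        emotion, pixels, usage = parse_fer_line(line)
        if usage == "Training":
            y_train.append(emotion)
            x_train.append(pixels)
        elif usage == "PublicTest":   # assumed label for test rows
            y_test.append(emotion)
            x_test.append(pixels)
    return x_train, y_train, x_test, y_test

# Hypothetical, shortened sample lines (header line already removed).
sample = ["0,0 255 51 102,Training",
          "3,255 255 0 0,PublicTest"]

x_train, y_train, x_test, y_test = split_dataset(sample)
```

With the real file, the header line would have to be skipped before the loop, and the lists would typically be converted to NumPy arrays afterwards.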
36. Emotion Classes.
Classes Emotion
0 Angry
1 Disgust
2 Fear
3 Happy
4 Sad
5 Surprise
6 Neutral
• So, there are seven emotion classes, 0-6.
• In the model, these seven classes are accessed through an array of length 7 to
create a visualization of the prediction, as shown in the table.
37. Epochs and Batches.
• Epochs - 25
• One epoch means one complete pass of the dataset through the
neural network.
• Batch size - 256
• Each epoch divides the whole dataset into batches, with each batch
processing 256 instances.
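The relation between dataset size, batch size and epochs is simple arithmetic, sketched below for the 28,709 training instances, batch size 256 and 25 epochs mentioned in this presentation. The helper name `batches_per_epoch` is hypothetical, and the figures describe one full pass per epoch, which is not necessarily how the training run was configured.

```python
import math

def batches_per_epoch(num_instances, batch_size):
    """Number of batches needed to pass every instance through once."""
    return math.ceil(num_instances / batch_size)

num_train = 28709
batch_size = 256
epochs = 25

steps = batches_per_epoch(num_train, batch_size)        # 113 batches
last_batch = num_train - (steps - 1) * batch_size       # 37 instances in the final batch
total_updates = steps * epochs                          # 2825 batches over 25 epochs
```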
38. Constructing Convolutional Neural Network
Structure.
• We constructed a four-layered sequential CNN model structure.
• It is sequential because the layers are stacked one after another, each feeding
the next; the first layer must be told what input shape to expect.
• Rectified Linear Unit (ReLU) activation
• Returns 0 if it receives any negative input.
• For any positive value it returns the same value.
• Softmax activation function
• Produces one output probability per class for a single input array.
• This helps build a model which can classify more than 2 classes.
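The ReLU rule stated above fits in one line of Python. The function name `relu` and the sample inputs are illustrative only.

```python
def relu(z):
    """Return 0 for negative input and the input itself otherwise."""
    return max(0.0, z)

inputs = [-2.0, -0.5, 0.0, 1.5]
outputs = [relu(v) for v in inputs]
# outputs == [0.0, 0.0, 0.0, 1.5]
```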
39. 1st Layer.
• The first layer added is a 2D convolutional layer:
• filters/kernels - 64
• kernel size - (5, 5)
• activation function - relu
• input shape - 48 X 48 (grayscale)
• Then we added a max pooling layer:
• pool size - (5, 5)
• stride size - (2, 2).
40. 2nd Layer.
• The second layer added is a 2D convolutional layer:
• filters/kernels - 64
• kernel size - (3, 3)
• activation function - relu
• Then we added an average pooling layer:
• pool size - (3, 3)
• stride size - (2, 2).
41. 3rd Layer.
• The third layer added is a 2D convolutional layer:
• filters/kernels - 128
• kernel size - (3, 3)
• activation function - relu
• Then we added an average pooling layer:
• pool size - (3, 3)
• stride size - (2, 2).
Then the matrix values are flattened into a vector.
42. 4th Layer.
• The fourth layer added is a fully connected layer:
• Dense layer-
• Nodes in first hidden layer - 128
• activation function - relu
• Dropout layer
• Dropout value – 0.2
• Finally a dense layer is added :
• No. of Nodes – 7 (No. of Classes.)
• Activation Function - softmax
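The effect of the four layers on the tensor size can be traced with a little arithmetic. The sketch below assumes no padding and stride 1 for the convolutions, which is the Keras default but is not stated on the slides; the helper names `out_size`, `trace_shapes` and `dense_params` are hypothetical.

```python
def out_size(n, window, stride=1):
    """Output width/height of a convolution or pooling step without padding."""
    return (n - window) // stride + 1

def trace_shapes(size=48):
    """Follow a size x size input through the conv and pooling layers."""
    layers = [
        ("conv 5x5, 64 filters", 5, 1, 64),
        ("max pool 5x5, stride 2", 5, 2, 64),
        ("conv 3x3, 64 filters", 3, 1, 64),
        ("avg pool 3x3, stride 2", 3, 2, 64),
        ("conv 3x3, 128 filters", 3, 1, 128),
        ("avg pool 3x3, stride 2", 3, 2, 128),
    ]
    shapes = []
    for name, window, stride, depth in layers:
        size = out_size(size, window, stride)
        shapes.append((name, size, size, depth))
    return shapes

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

shapes = trace_shapes(48)
# side lengths after each step: 44, 20, 18, 8, 6, 2

_, h, w, d = shapes[-1]
flat = h * w * d                         # 2 * 2 * 128 = 512 values after flattening
hidden_params = dense_params(flat, 128)  # 65664
output_params = dense_params(128, 7)     # 903
```

Under these assumptions the flattened vector has 512 entries, so most of the fully connected parameters sit in the 128-node hidden layer.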
43. Epochs and Batch Processing and Model
Generation.
• The ImageDataGenerator has been initialized, and a data flow has
been defined with x_train, y_train and the batch size and assigned to the
variable 'train_generator'.
gen = ImageDataGenerator()
train_generator = gen.flow(x_train, y_train, batch_size=batch_size)
• The model is then compiled with categorical cross-entropy as the
loss and accuracy as the metric.
44. Epochs and Batch Processing and Model
Generation.
Model training is then started with train_generator, batch size and
epochs as parameters.
model.fit_generator(train_generator, steps_per_epoch=batch_size, epochs=epochs)
The model was then saved in the H5 data format.
45. HDF5 / H5 (Hierarchical Data Format)
• This is a file format designed to organize and store large amounts of
data.
• It contains two major types of objects, datasets and groups:
1. Datasets are multidimensional arrays of a homogeneous type.
2. Groups are container structures capable of storing datasets
and other groups.
46. Training time on GPU and CPU.
GPU CPUs Time ( hh : mm )
0 56 03 : 15
1 56 00 : 04
1 1 00 : 03
When we trained the image classifier, we tried different
configurations as mentioned in the table.
47. Evaluation.
• To evaluate the training and testing loss and accuracy, we run
the following commands.
train_score = model.evaluate(x_train, y_train, verbose=0)
test_score = model.evaluate(x_test, y_test, verbose=0)
• Here test_score is an array of length 2 that stores the testing loss
and accuracy respectively.
48. Confusion Matrix / Error Matrix.
• Confusion matrix helps visualize the performance of an algorithm.
• Each row represents the instances in an actual class.
• Each column represents the instances in a predicted class.
The confusion matrix function is provided by the scikit-learn library.
from sklearn.metrics import classification_report, confusion_matrix
Large counts in the cells where the row and column index are the same (the diagonal) mean that the system
is not confusing the actual and predicted classes.
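How such a matrix is filled can be shown without any library. The sketch below is a plain-Python illustration, not the scikit-learn implementation; the function names and the three-class example labels are hypothetical. Rows are actual classes and columns are predicted classes, as on this slide.

```python
def build_confusion_matrix(actual, predicted, num_classes):
    """Count how often each actual class (row) was predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy_from_matrix(matrix):
    """Share of instances on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

actual = [0, 0, 1, 1, 2, 2]        # hypothetical true labels
predicted = [0, 1, 1, 1, 2, 0]     # hypothetical model predictions

cm = build_confusion_matrix(actual, predicted, num_classes=3)
# cm == [[1, 1, 0],
#        [0, 2, 0],
#        [1, 0, 1]]
acc = accuracy_from_matrix(cm)     # 4 correct out of 6
```

Off-diagonal cells show which pairs of classes are mixed up; here one instance of class 0 was predicted as class 1 and one instance of class 2 as class 0.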
49. Monitoring Testset Results.
• For visualizing and monitoring the test set results, x_test (image
pixels) has been passed through predict function.
predictions = model.predict(x_test)
The pixel values of the 20th instance are retrieved, and the
image is reconstructed and plotted along
with the graph of predicted percentages.
50. Implementation of Emotion detector/
Sentiment Analyzer.
• First, the trained model is imported using the Keras model loader with
the following commands.
from keras.models import load_model
model = load_model('model.h5')
51. Capturing Image from Webcam.
• To capture an image from the built-in webcam, we used a code snippet provided by
Google Colab itself. The provided code is displayed in the Fig.
52. Output of the provided Code is displayed in
the Fig.
53. Face detection from Source Image.
• Haar-Cascade frontal face Detector was used for face detection in
the source image.
• It is based on Haar-like features rather than on pixel intensity.
• First, the pre-trained cascade file ‘haarcascade_frontalface_alt.xml’
is loaded and passed to the OpenCV cascade classifier, which
initializes the cascade classifier.
facedata = "haarcascade_frontalface_alt.xml"
cascade = cv2.CascadeClassifier(facedata)
54. Face detection from Source Image.
• Then the image is passed through the multiscale detector, so that all the
faces in the image are detected, cropped and stored. The cropped images
are then converted to grayscale for emotion detection.
faces = cascade.detectMultiScale(image_file)
55. Defining the emotion analysis function.
• The emotion analysis function is passed the predicted data of the
image.
• An array of emotions is created in which the index of each emotion
corresponds to its emotion class number.
objects = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
56. Defining the emotion analysis function.
Emotion Classes(Predicted) Emotions Emotion index in Array
0 Angry 0
1 Disgust 1
2 Fear 2
3 Happy 3
4 Sad 4
5 Surprise 5
6 Neutral 6
The mapping between emotion class, emotion and emotion index in the array is displayed in
the table.
57. Prediction of emotions.
• Now the real-time or stored image is loaded in grayscale
at 48 X 48 resolution.
• This image is converted to a linear array containing all the pixel
values.
• Each pixel value is divided by 255 for normalization.
58. Prediction of emotions.
• This array is passed through the prediction function.
• The result is stored in an array of length 7.
• This array is sent to the emotion analysis function for plotting as a bar
diagram.
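The preprocessing and the reading of the length-7 result can be sketched without the model itself. In this illustration the prediction vector is a made-up stand-in for the output of the prediction function, and the helper names `preprocess` and `top_emotion` are hypothetical.

```python
EMOTIONS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")

def preprocess(gray_image):
    """Flatten a 2D grayscale image and scale every pixel to [0, 1]."""
    return [value / 255.0 for row in gray_image for value in row]

def top_emotion(probabilities):
    """Return the emotion whose index holds the largest predicted value."""
    if len(probabilities) != len(EMOTIONS):
        raise ValueError("expected one value per emotion class")
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return EMOTIONS[index], probabilities[index]

# Tiny stand-in image (a real input would be 48 x 48).
flat = preprocess([[0, 255],
                   [51, 102]])

# Hypothetical output of the prediction function for one image.
prediction = [0.05, 0.01, 0.04, 0.70, 0.05, 0.10, 0.05]
label, confidence = top_emotion(prediction)   # ("happy", 0.70)
```

The same seven values are what a bar diagram would display, one bar per entry of `EMOTIONS`.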
60. Accuracy when tested with random Images.
• The model was tested with 20 images for each emotion; the average
accuracy for each emotion is given in the table.
Emotions Percentage of Accuracy ( % )
Angry 69.5
Disgust 50.0
Fear 59.8
Happy 76.8
Sad 59.3
Surprise 84.7
Neutral 80.2
TOTAL 68.6
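The TOTAL row is consistent with the unweighted mean of the seven per-emotion values, which can be checked directly. The variable names below are illustrative; the numbers are taken from the table.

```python
per_emotion = {
    "Angry": 69.5,
    "Disgust": 50.0,
    "Fear": 59.8,
    "Happy": 76.8,
    "Sad": 59.3,
    "Surprise": 84.7,
    "Neutral": 80.2,
}

# Unweighted mean over the seven emotions (20 test images each).
overall = sum(per_emotion.values()) / len(per_emotion)
overall_rounded = round(overall, 1)

# Emotions that fall below 60 % accuracy.
below_60 = sorted(name for name, acc in per_emotion.items() if acc < 60.0)
```

Because every emotion was tested with the same number of images, the unweighted mean equals the accuracy over all test images.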
61. Accuracy when tested with random Images.
• The lower accuracy for disgust, fear and sadness is attributed to their similarity with other
expressions.
Emotion Similarity
Disgust Anger, sadness
Fear Surprise
Sadness Happy, fear, surprise
62. Output of test with random image for each
emotion.
Angry Emotion.
63. Output of test with random image for each
emotion.(Cont.)
Disgust.
64. Output of test with random image for each
emotion.(Cont.)
Fear Emotion.
65. Output of test with random image for each
emotion.(Cont.)
Happy Emotion.
66. Output of test with random image for each
emotion.(Cont.)
Sad Emotion.
67. Output of test with random image for each
emotion.(Cont.)
Surprise Emotion.
68. Output of test with random image for each
emotion.(Cont.)
Neutral Emotion.
69. Result and Discussion
• In the system, we used the Kaggle fer2013 dataset consisting of 35,887
images, of which 28,709 instances are for training and 3,589 instances
are for testing.
• Training with a Convolutional Neural Network (CNN) gave us the
image classifier (the trained model).
• Using this model, we calculate the percentage of resemblance to each
of the seven universal emotions for the input image.
• Overall accuracy was 68.6% when tested with 20 images per emotion.
• The emotions disgust, fear and sad produced accuracy of less than 60%
due to their similarity with other expressions.
70. Conclusion and future work
• In the presence of a GPU, using more CPUs increased the training time.
• In the future, the accuracy can be increased further by training with a
dataset containing more instances, especially for the emotions
disgust, fear and sad.
71. Bibliography
1) Arunava. (2018) “An introduction to convolutional neural networks.” [Online]. Available:
https://towardsdatascience.com/convolutional-neural-network-17fb77e76c05
2) A. Awasthi, “Facial emotion recognition using deep learning,” Project Report submitted to Indian Institute of Technology
Kanpur, 2011.
3) D. R. Frischholz. (2018) Face detection algorithms techniques. [Online]. Available: https://facedetection.com/algorithms/
4) I. Goodfellow, D. Erhan, P.-L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee, Y. Zhou, C.
Ramaiah, F. Feng, R. Li, X. Wang, D. Athanasakis, J. Shawe-Taylor, M. Milakov, J. Park, R. Ionescu, M. Popescu, C. Grozea, J.
Bergstra, J. Xie, L. Romaszko, B. Xu, Z. Chuang, and Y. Bengio, “Challenges in representation learning: A report on three machine
learning contests,” 2013. [Online]. Available: http://arxiv.org/abs/1307.0414.
5) G. Hemalatha and C. Sumathi, “A study of techniques for facial detection and expression classification,” International Journal of
Computer Science and Engineering Survey, vol. 5, no. 2, p. 27, 2014.
6) S. K. A. Kamarol, M. H. Jaward, J. Parkkinen, and R. Parthiban, “Spatiotemporal feature extraction for facial expression
recognition,” IET Image Processing, vol. 10, no. 7, pp. 534-541, 2016.
7) E. Melnikov. (2018) Convolutional neural network (cnn). [Online]. Available:
https://developer.nvidia.com/discover/convolutional-neural-network
8) OpenCV team. (2018) OpenCV documentation. [Online]. Available: https://opencv.org/
9) M. Pantic, “Facial expression recognition,” Imperial College London, London, UK; University of Twente, AE Enschede, The
Netherlands, 2008.
10) A. Pascu, B. A. Intelligence, and R. King, “Facial expression recognition system,” University of Manchester, 2015.
72. Bibliography
11) H. Qin, J. Yan, X. Li, and X. Hu, “Joint training of cascaded cnn for face detection,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2016, pp. 3456-3465.
12) M. Rabiei and A. Gasparetto, “System and method for recognizing human emotion state based on analysis of speech and facial
feature extraction; applications to human-robot interaction,” in 2016 4th International Conference on Robotics and Mechatronics
(ICROM). IEEE, 2016, pp. 266-271.
13) A. Savoiu and J. Wong, “Recognizing facial expressions using deep learning,” Stanford University, 2017.
14) C. Shan, S. Gong, and P. W. McOwan, “Facial expression recognition based on local binary patterns: A comprehensive study,” Image
and vision Computing, vol. 27, no. 6, pp. 803-816, 2009.
15) M. H. Siddiqi, R. Ali, M. Idris, A. M. Khan, E. S. Kim, M. C. Whang, and S. Lee, “Human facial expression recognition using curvelet
feature extraction and normalized mutual information feature selection,” Multimedia Tools and Applications, vol. 75, no. 2, pp. 935-
959, 2016.
16) M. team. (2016) Machine learning. [Online]. Available: https://in.mathworks.com/discovery/machine-learning.html
17) T. team. (2015) Artificial neural network - basic concepts. [Online]. Available:
https://www.tutorialspoint.com/artificial_neural_network/artificial_neural_network_basic_concepts.htm
18) W. N. Widanagamaachchi, “Facial emotion recognition with a neural network approach,” University of Colombo, 2009.
19) Wikipedia. (2016) Artificial neural network. [Online]. Available: https://en.wikipedia.org/wiki/Artificial_neural_network
20) Wikipedia. (2016) Machine learning. [Online]. Available: https://en.wikipedia.org/wiki/Machine_learning
21) Wikipedia. (2017) Deep learning. [Online]. Available: https://en.wikipedia.org/wiki/Deep_learning
22) N. Zeng, H. Zhang, B. Song, W. Liu, Y. Li, and A. M. Dobaie, “Facial expression recognition via learning deep sparse autoencoders,”
Neurocomputing, vol. 273, pp. 643-649, 2018.