ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology
Volume 1, Issue 5, July 2012



Speech based Emotion Recognition with Gaussian Mixture Model

Nitin Thapliyal (1), Assistant Professor, UIT, Dehradun, thapliyal.nitin@gmail.com
Gargi Amoli (2), Assistant Professor, DIT, Dehradun, gargi.amoli@gmail.com


Abstract— This paper is concerned with speech based emotion recognition. The main work centres on the Gaussian mixture model (GMM), which allows training on the desired data set drawn from the databases. GMMs are known to capture the distribution of data points in the input feature space; therefore GMMs are suitable for developing emotion recognition models when a large number of feature vectors is available. Given a set of inputs, a GMM refines the weights of each mixture component through the expectation-maximization (EM) algorithm, which finds maximum likelihood estimates of the parameters of probabilistic models. Once a model is generated, conditional probabilities can be computed for test patterns (unknown data points). Linear Predictive (LP) analysis has been chosen for extracting the emotional features because it is one of the most powerful speech analysis techniques for estimating basic speech parameters such as pitch, formants, spectra and vocal tract functions, and for representing speech at low bit rates for transmission and storage. Speakers are made to engage in an emotional conversation with an anchor, who creates different contextual situations through the conversation to elicit different emotions from the subject, without his/her knowledge.

Index Terms— Speech, Gaussian Mixture Model, Vocal, Emotion Recognition, Linear Predictive.

I. INTRODUCTION

Emotional speech recognition is basically identifying the emotional or physical state of a human being from his or her voice [10]. Speech is a complex signal containing information about the message, the speaker, the language and the emotions. Speech is produced by a time-varying vocal tract system excited by a time-varying excitation source. Emotion, on the other hand, is an individual mental state that arises spontaneously rather than through conscious effort. Various kinds of emotions are present in speech, among them ANGER, COMPASSION, DISGUST, FEAR, HAPPY, NEUTRAL, SARCASTIC and SURPRISE. For recognizing emotions from speech, features may be extracted from the excitation source, the vocal tract, or the prosodic point of view to accomplish different speech tasks. Speech features derived from the excitation source signal are known as source features. The excitation source signal is obtained from speech after suppressing the vocal tract (VT) characteristics. This is achieved by:
1. First predicting the VT information using filter coefficients (linear prediction coefficients, LPCs) from the speech signal.
2. Then separating it out by inverse filter formulation; the resulting signal is known as the linear prediction residual and contains mostly the information about the excitation source.

Fig1. Emotion recognition system block

The features derived from the LP residual are referred to as excitation source, sub-segmental, or simply source features. The LP residual signal retains mainly the higher order correlations among its samples, as the first and second order correlations are filtered out during LP analysis. These higher order correlations may be captured, to some extent, by features such as the strength of excitation, characteristics of the glottal volume velocity waveform, the shape of the glottal pulse, the characteristics of the open and closed phases of the glottis, and so on. The excitation source information carries all flavours of speech, such as message, speaker, language and emotion specific information. Pitch information extracted from the LP residual signal has been used successfully for speaker recognition [5]. LP residual energy has been used for vowel and speaker recognition. Cepstral features derived from the LP residual signal are used for capturing speaker specific information [4]. The combination of features derived from the LP residual and the LP residual cepstrum has been used to minimize the equal error rate in speaker recognition. By processing the LP residual signal using the Hilbert envelope and the group delay function, the instants of significant excitation can be accurately determined.

Applications: speech emotion recognition has several applications in day-to-day life. Some of these are:
I. It is useful for enhancing the naturalness of speech based human-machine interaction.
II. Call center conversations may be analyzed to study the behaviour of call attendants with customers, which helps to improve the quality of service of a call attendant.
III. Interactive movie, storytelling and e-tutoring applications would be more practical if they could adapt themselves to listeners' or students' emotional states.
IV. Emotion analysis of telephone conversations between criminals would help crime investigation departments.
V. Conversations with robotic pets and humanoid partners would be more realistic and enjoyable if they were able to understand and express emotions like humans.
VI. In aircraft cockpits, speech recognition systems trained to recognize stressed speech are used for better performance.

II. PROPOSED WORK

In this work, GMMs are used to develop emotion recognition systems using excitation features. GMMs are known to capture the distribution of data points in the input feature space. Therefore, GMMs are suitable for developing emotion recognition models using spectral features, as the decision regarding the emotion category of a feature vector is taken based on the probability of it coming from the feature vectors of a specific model. Gaussian Mixture Models are among the most statistically mature methods for clustering and density estimation. They model the probability density function of the observed data points using a multivariate Gaussian mixture density. Given a set of inputs, a GMM refines the weights of each mixture component through the expectation-maximization algorithm. Once a model is generated, conditional probabilities can be computed for test patterns (unknown data points). Here we have considered only four emotions, namely Happy, Anger, Sad and Neutral.

Fig2. GMM model
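The experiments in the paper were implemented in MATLAB7 and the exact training configuration is not reported, so the following Python fragment is only a minimal sketch of the scheme described above: one GMM per emotion models the feature density p(x) = sum_i w_i N(x; mu_i, Sigma_i), its parameters are refined by EM, and a test pattern is assigned to the emotion whose model gives it the highest likelihood. The number of mixture components, the diagonal covariances and the 13-dimensional dummy features are illustrative assumptions, not values from the paper.

# Illustrative sketch (not the authors' MATLAB7 code): one GMM per emotion,
# trained with EM, and classification by maximum average log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

EMOTIONS = ["anger", "happy", "neutral", "sad"]

def train_emotion_gmms(features_by_emotion, n_components=8):
    """features_by_emotion: dict emotion -> (N, D) array of training feature vectors."""
    models = {}
    for emotion in EMOTIONS:
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              max_iter=200, random_state=0)
        gmm.fit(features_by_emotion[emotion])  # EM refines weights, means and covariances
        models[emotion] = gmm
    return models

def classify_utterance(models, utterance_features):
    """utterance_features: (M, D) array of frame-level feature vectors for one utterance."""
    # Average per-frame log-likelihood under each emotion model; pick the best model.
    scores = {emotion: gmm.score(utterance_features) for emotion, gmm in models.items()}
    return max(scores, key=scores.get)

# Example run with random data standing in for real excitation-source features:
rng = np.random.default_rng(0)
fake_features = {e: rng.normal(loc=i, size=(500, 13)) for i, e in enumerate(EMOTIONS)}
models = train_emotion_gmms(fake_features)
print(classify_utterance(models, rng.normal(loc=2.0, size=(40, 13))))  # likely "neutral"

In the actual system, the dummy arrays above would be replaced by the LP residual based feature vectors described in Section III.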

III. FEATURE EXTRACTION

Speech features for emotion recognition may be extracted from the excitation source, the vocal tract, or the prosodic point of view to accomplish different speech tasks. This work confines its scope to the spectral features used for recognizing emotions. Normally, excitation features are extracted through a block processing approach: the entire speech signal is processed block by block, with a block size of around 20 ms. Blocks are also known as frames, and it is assumed that within a block the speech signal is stationary. The block processing approach suffers from some logical problems: physical blocking of the speech signal may not be suitable for extracting features, as it is difficult to find relationships among neighbouring feature vectors, and the approach blindly processes the entire speech signal, although the redundant information present in regions such as steady vowel portions could be exempted from feature extraction. Generally, most of the languages in the Indian context are syllabic in nature. The performance of the emotion recognition systems is evaluated using a semi-natural database collected from Hindi movies and a simulated database.

Fig3. Emotion feature Extraction process

Feature extraction basically involves the following stages:
1. Preprocessing: the sampled speech, which is in digitized form, is first normalized by its maximum amplitude and the D.C. component is removed. The signal is then divided into 20 ms frames. In order to use the emotion specific information in speech, one needs to extract features at different levels (sub-segmental, segmental and supra-segmental); here sub-segmental (excitation source) features are used for analyzing the emotions present in the speech.
2. Linear predictive analysis: LP analysis is one of the most powerful speech analysis techniques for estimating the basic speech parameters. In general, for the analysis and processing of a speech signal, the vocal tract system is modeled as a time-varying filter, and the presence or absence of excitation causes voiced or unvoiced speech. The LP residual obtained by inverse filtering is given by

e(n) = s(n) - Σ_{k=1}^{p} a_k s(n-k)

where s(n) is the current speech sample, p is the order of prediction, a_k are the filter coefficients and s(n-k) is the (n-k)th sample of speech. The excitation source signal may contain emotion specific information in the form of unique features such as higher order relations among the linear prediction (LP) residual samples. The LP residual signal is obtained by first estimating the vocal tract information from the speech signal and then suppressing it by inverse filter formulation; the resulting signal is termed the LP residual and contains mostly information about the excitation source. All the calculations for the LP analysis were developed using MATLAB7 functions, where the input is a segment of the speech signal and the output is the set of LPC coefficients, which are later used to determine the final parameters for the results (a minimal sketch of these two stages is given below).
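The paper states only that the LP analysis was implemented with MATLAB7 functions, so the following Python fragment is an illustrative equivalent of the two stages listed above: preprocessing (D.C. removal, amplitude normalization, 20 ms framing) and LP analysis by the autocorrelation method, followed by inverse filtering to obtain the residual e(n) = s(n) - Σ a_k s(n-k). The prediction order p = 10 and the absence of windowing or pre-emphasis are assumptions, not choices reported in the paper.

# Illustrative preprocessing and LP residual extraction (not the authors' MATLAB7 code).
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def preprocess(signal):
    signal = signal - np.mean(signal)                  # remove the D.C. component
    return signal / (np.max(np.abs(signal)) + 1e-12)   # normalize by the maximum amplitude

def frame_signal(signal, fs=16000, frame_ms=20):
    size = int(fs * frame_ms / 1000)                   # 20 ms -> 320 samples at 16 kHz
    n_frames = len(signal) // size
    return signal[:n_frames * size].reshape(n_frames, size)

def lpc_coefficients(frame, p=10):
    """Autocorrelation-method LPC: solve the normal equations R a = r for a_1..a_p."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + p]
    return solve_toeplitz(r[:p], r[1:p + 1])

def lp_residual(frame, a):
    """e(n) = s(n) - sum_k a_k s(n-k): inverse filter A(z) = 1 - sum_k a_k z^(-k)."""
    return lfilter(np.concatenate(([1.0], -a)), [1.0], frame)

# Example on synthetic samples standing in for a real 16 kHz utterance:
rng = np.random.default_rng(0)
speech = preprocess(rng.normal(size=16000))
frames = frame_signal(speech)
residuals = np.array([lp_residual(f, lpc_coefficients(f)) for f in frames])
print(frames.shape, residuals.shape)                   # (50, 320) (50, 320)

Features derived from these residual frames would then be used to train the per-emotion GMMs of Section II.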


IV. DATABASES

It is very important that a collection of various emotional voices gives a clear representation of the acoustic correlates of each emotion; for this, the utterances should be expressed skilfully for the intended emotion. Good recordings of spontaneously produced emotional utterances are difficult to collect, and spontaneously produced speech materials have a number of drawbacks; in particular, the recordings are usually not free of background noise. One of the most important requirements for work in speech recognition is a database with appropriate material for training and testing the system under development. The size of the database is crucial to achieving the intended results, so collecting and processing data to build a useful database is not trivial. The evaluation of a speech emotion recognition system depends on the level of naturalness of the database used as its input; if an inferior database is used, incorrect conclusions may be drawn. The database given as input to the speech emotion recognition system may contain real-world emotions or acted ones, and it is more practical to use a database collected from real-life situations. The database proposed in this paper is collected from Hindi movies by analyzing the emotions in the dialogues delivered by the film actors/actresses. The database is considered a semi-natural one, as the simulation of the appropriate emotions is close to real and practical situations; in movies, the emotions expressed are more realistic, and hence it becomes easier for an observer to categorize them based on context and by listening to the dialogues spoken by the speaker. Male and female dialogues are separately extracted from the movies of popular actors/actresses to collect the desired emotions. While collecting the database, the audio is first extracted from the video with the help of Adobe Audition, choosing a sampling rate of 16 kHz, a mono channel and 16 bit resolution. Speech utterances without background music are carefully extracted to become part of the database, based on the contextual emotion present.
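The clips are stored at 16 kHz, mono, 16 bit, so before feature extraction each utterance would be loaded and scaled; the short fragment below is a hypothetical example of such a step in Python, and the file name is a placeholder rather than a name from the paper's database.

# Hypothetical loader for one 16 kHz, mono, 16-bit clip of the database.
import numpy as np
from scipy.io import wavfile

fs, samples = wavfile.read("anger_male_001.wav")       # placeholder file name
assert fs == 16000 and samples.ndim == 1, "expected 16 kHz mono audio"
speech = samples.astype(np.float64) / 32768.0          # scale 16-bit PCM to [-1, 1)
# 'speech' can now be passed to the preprocessing and framing steps of Section III.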
                                                                    and 94% for male, female and male+female speakers
                                                                    respectively.




Table 1: Details of the multi-speaker database.

Fig4. LP residual for ANGER, HAPPY, SAD, NEUTRAL.

V. RESULTS

After training and testing the emotion recognition models, the results are obtained in the form of a 4x4 matrix. Each row of the matrix represents the test data recognized by the different models as a percentage, and each column represents a trained model. In this matrix, the diagonal elements show the correct classification and the other elements in a row indicate the misclassification percentages. From the table below it can be seen that 54% of anger utterances are classified as anger; similarly 67%, 52% and 61% of the happy, neutral and sad emotions are recognized correctly for a single male speaker, so the overall result of this matrix is about 59%.

Table 2: Emotion classification performance (in percentage).

Later, the same process is used for female and male+female speakers (male and female utterances used simultaneously for training and testing). Table 3 compares the emotion recognition performance for the four emotions for the closed and open sets, where "closed set" means the same utterances are used for training and testing, whereas "open set" means different sets of utterances are used for training and testing. The emotion recognition performance for closed set utterances has been observed to be 92.5%, 84.75% and 94% for the male, female and male+female speakers respectively.

Table 3: Comparison of emotion recognition performance for closed and open sets.

The emotion recognition performance for open set utterances has been observed to be 59.25%, 61.25% and 58.5% for the male, female and male+female speakers respectively. Better results are obtained for closed set utterances than for open set utterances because, in the closed set case, the same utterances are used both for training and for testing the emotion recognition models, while in the open set case different utterances are used for training and testing. Another possible reason for the lower recognition rate with open set utterances is the use of relatively few speech utterances for training and testing the emotion recognition models.
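The individual entries of the confusion matrix are given in Table 2 and are not reproduced in this text; only the diagonal values 54%, 67%, 52% and 61% are quoted above. The fragment below merely shows how the overall figure of about 59% follows from the diagonal of such a matrix; the off-diagonal values used here are placeholders, not the paper's numbers.

# Per-class and overall recognition rates from a 4x4 confusion matrix (percentages).
# Only the diagonal (54, 67, 52, 61) is taken from the paper; the off-diagonal
# split is a made-up placeholder for illustration.
import numpy as np

emotions = ["anger", "happy", "neutral", "sad"]
confusion = np.array([        # rows: true emotion, columns: recognized emotion
    [54, 20, 15, 11],
    [11, 67, 12, 10],
    [16, 14, 52, 18],
    [13, 10, 16, 61],
], dtype=float)

per_class = np.diag(confusion)     # correct-classification percentages
overall = per_class.mean()         # unweighted average over the four emotions
for emotion, rate in zip(emotions, per_class):
    print(f"{emotion:8s}: {rate:.0f}%")
print(f"overall : {overall:.1f}%")  # (54 + 67 + 52 + 61) / 4 = 58.5, i.e. about 59%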


VI. CONCLUSION

In this work, excitation source features are used for characterizing the emotions present in speech. The emotion recognition performance using the LP residual alone is not sufficient for developing a sophisticated emotion recognition system. The performance may be improved by combining different features, such as spectral and prosodic features. The overall emotion recognition performance is observed to be about 52-60%.

REFERENCES

[1] S. Prasanna, C. Gupta, and B. Yegnanarayana, "Extraction of speaker-specific information from linear prediction residual of speech," Speech Communication, vol. 48, pp. 1243-1261, Oct. 2006.
[2] B. Atal, "Automatic speaker recognition based on pitch contours," J. Acoust. Soc. Amer., vol. 52, pp. 1687-1697, March 1972.
[3] H. Wakita, "Residual energy of linear prediction applied to vowel and speaker recognition," IEEE Trans. Acoust., Speech, Signal Process., vol. 24, pp. 270-271, April 1976.
[4] B. Yegnanarayana, S. R. M. Prasanna, and K. S. Rao, "Speech enhancement using excitation source information," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, vol. 1, pp. 541-544, May 2002.
[5] A. Bajpai and B. Yegnanarayana, "Combining evidence from subsegmental and segmental features for audio clip classification," TENCON IEEE Region 10 Conference, pp. 1-5, Nov. 2008.
[6] J. Benesty, M. M. Sondhi, and Y. Huang, Springer Handbook of Speech Processing, Springer, 2008.
[7] C. M. Lee and S. S. Narayanan, "Toward detecting emotions in spoken dialogs," IEEE Trans. Speech and Audio Processing, vol. 13, pp. 293-303, Mar. 2005.
[8] J. Nicholson, K. Takahashi, and R. Nakatsu, "Emotion recognition in speech using neural networks," Neural Computing and Applications, vol. 9, pp. 290-296, Dec. 2000.
[9] J. Nicholson, K. Takahashi, and R. Nakatsu, "Emotion recognition in speech using neural networks," Neural Computing and Applications, vol. 9, pp. 290-296, Dec. 2000.
[10] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, New Jersey: Prentice-Hall, 1993.
[11] T. V. Ananthapadmanabha and B. Yegnanarayana, "Epoch extraction from linear prediction residual for identification of closed glottis interval," IEEE Trans. Acoustics, Speech and Signal Processing, vol. 27, pp. 309-319, 1979.
[12] F. Charles, D. Pizzi, M. Cavazza, T. Vogt, and E. André, "EmoEmma: Emotional speech input for interactive storytelling," in Proc. 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), Budapest, Hungary, pp. 1381-1382, May 2009.

