SlideShare a Scribd company logo
1 of 4
Download to read offline
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 146
ISOLATED WORDS RECOGNITION USING MFCC, LPC AND
NEURAL NETWORK
Mayur R Gamit1
, Kinnal Dhameliya2
1
M.Tech student, E&C Department, C.G.P.I.T, Bardoli, India
2
Assistant Professor, E&C Department, C.G.P.I.T, Bardoli, India
Abstract
Automatic speech recognition is an important topic of speech processing. This paper presents the use of an Artificial Neural
Network (ANN) for isolated word recognition. The Pre-processing is done and voiced speech is detected based on energy and zero
crossing rates (ZCR). The proposed approach used in speech recognition is Mel Frequency Cepstral Coefficients (MFCC) and
combine features of both MFCC and Linear Predictive Coding (LPC). The back-propagation is used as a classifier. The
recognition accuracy is increased when combine features of both LPC and MFCC are used as compared to only MFCC approach
using Neural Network as a classifier..
Keywords: Pre-processing, Mel frequency Cepstral Coefficient (MFCC), Linear Predictive Coding (LPC), Artificial
Neural Network (ANN).
--------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
Automatic speech recognition (ASR) has been the most
investigated topic in speech processing since early 1960s [1].
Speech recognition is a popular and active area of research,
used to translate words spoken by humans so as to make
them computer recognizable. It usually involves extraction of
features from speech signal and representing them using an
appropriate data model. ASR system involves two phases.
Training phase and Testing phase. In training phase, known
speech is recorded and parametric representation of the
speech is extracted and stored in the speech database. In the
testing phase, for the given input speech signal the features
are extracted and ASR system compares it with the reference
templates to recognize the utterance.
This paper proposes an approach to recognize isolated words
using Mel frequency Cepstral coefficient (MFCC) and Linear
Predictive Coding (LPC) as a feature extraction techniques
and Neural Network as a classification technique. This paper
evaluates the recognition accuracy by using only MFCC and
combination features of MFCC and LPC. The MFCC gives
higher recognition accuracy in speech recognition systems as
compared to other techniques. The main advantage of using
MFCC techniques is because of less complexity and accurate
results while LPC is suitable for speaker recognition.
The Back-propagation neural network is used as a classifier.
The human voice is recorded and digitized. Then it is input to
the pre-processing block. The main task of this block is to
separate out the unvoiced speech samples from the voiced
speech samples. The detection of voiced and unvoiced
speech can be done on the basis of calculating energy and
zero crossing rates frame by frame basis. Framing is
necessary to analyze the speech. Speech is quasi-periodic
signal, so analysis can be done by fragmenting the speech
into small segments called chunks or frames.
The frame work of speech recognition is shown below:
Fig. 1 Block diagram of speech recognition system.
This paper proposes a novel and efficient way to recognize
isolated words by using advanced pre-processing strategy.
Section 2, introduces a new pre-processing concept. Sections
3, describes the feature extraction techniques MFCC and
combine features of both MFCC and LPC. Section 4,
describes the neural network and section 5, describes the
experimental results.
2. PRE-PROCESSING CONCEPT
The recorded speech is feed to the preprocessing block. The
speech is segmented into small chunks or frames. The
determination of voiced and unvoiced speech samples can be
done on the basis of Energy and Zero crossing rates (ZCR)
[2].
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 147
2.1 Energy
The speech signal is fragmented into small frames. Each
frame is of ฯ‰ samples, where ฯ‰ ห‚ n as n is total number of
samples. The energy of speech is calculated frame by frames.
The square of each sample is done and finally summation of
all squared samples is done.
The equation to calculate energy is given below:
Energy = ๐‘ฅ๐‘–
2๐œ”
๐‘–=1 (1)
2.2 Zero Crossing Rate (ZCR)
The zero crossing rates means the number of times the
transition of speech signal from positive to negative or vice-
versa. The zero crossing rates are calculated frame by
frames. If the zcr of speech samples having more zero
crossing rates then it is a unvoiced otherwise voiced.
According to the survey done by rabiner the ZCR of fricative
speech samples are more than threshold then consider it as a
voiced speech. Fricative has more zero crossing rates as
compared to unvoiced speech. The equation to calculate the
ZCR is as follows:
ZCR=
๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ ๐‘– โˆ’๐‘ ๐‘–๐‘”๐‘› (๐‘ฅ ๐‘–โˆ’1)
2
๐œ”
๐‘–=1 (2)
2.3 Start Point and End Point Detection
The calculation of the energy and zero crossing rates can be
done on the basis of frames. The start point and end point can
be accurately found out on the basis of both energy and zero
crossing rates. First the start and end points can be
determined based on energy only. Based on zcr of the start
and end points are computed. If the zcr of the frame having
start point is more than the start point is shifted towards left
side and same as end point shifted towards right.
๏‚ท If ZCR > (threshold) then start point is shifted
towards left otherwise remains same.
๏‚ท If ZCR > (threshold) then end point is shifted towards
right otherwise remains same.
2.4 Removal of Unvoiced Parts between Start Point
and End Point
The recognition accuracy can be increased when the
unvoiced part between the start and end points can be
removed. There are certain samples of speech in between
start and end points which have minimum energy. They do
not contain any information, so by removing them makes
recognition accuracy better. This advance concept has been
used in this speech recognition system.
3. FEATURE EXTRACTION
In feature extraction the Mel frequency Cepstral coefficient
(MFCC) and combine features of both MFCC and LPC are
used. The both techniques are described below:
3.1 Mel-Frequency Cepstrum Coefficients (MFCC)
MFCC stands for Mel Frequency Cepstral Coefficient.
MFCC is one of the most commonly used feature extraction
method in speech recognition [3], [10].
Step1: Pre-emphasis
The signal is passed through a filter which emphasis a high
frequencies [6]. This process increases the energy of signal at
high frequency. High frequency also contains information.
The equation used to denotes the pre-emphasis is shown
below:
๐‘† ๐‘› = ๐‘‹ ๐‘› โˆ’ ๐‘Ž โˆ— ๐‘‹ ๐‘› โˆ’ 1 (3)
Where s(n) denotes the output sample, x(n) is present
sample, x(n-1) is past sample and value of a is between 0.95
to 1.
Step 2: Framing and Overlapping
The speech signal is split into several frames such that each
frame can be examined in the short time instead of the entire
signal. The frame size is of the range 20-40 ms. Then
overlapping is applied to frames, hamming window is
applied. The equation of hamming window is as follows:
๐‘† ๐‘› = ๐‘‹ ๐‘› โˆ— ๐‘Š ๐‘› 4
๐‘Š ๐‘› = 0.54 โˆ’ 0.46 cos
2๐œ‹๐‘›
๐‘ โˆ’ 1
0 โ‰ค ๐‘› โ‰ค ๐‘ โˆ’ 1 (5)
The overall process is shown in figure below:
Fig. 2 Block diagram of MFCC.
Step 3: Fast Fourier Transform
The Fast Fourier Transform (FFT) converts the frames from
time domain to frequency domain. The conversion is done
from time to frequency domain because the information is
more in frequency domain. Therefore, FFT is executed to
obtain the magnitude frequency response of each frame and
to prepare the signal for the next stage.
๐‘† ๐œ” = ๐‘“๐‘“๐‘ก ๐‘‹ ๐‘› (6)
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 148
Step 4: Mel Filter bank
Human ear perception of frequency contents of sounds for
speech signal does not follow a linear scale. Therefore, for
each tone with an actual frequency f, measured in Hz, a
subjective pitch is measured on a scale called the โ€œmel
scaleโ€ [9]. The Mel frequency scale is linear frequency
spacing below 1000 Hz and a logarithmic spacing above
1000Hz [5]. To compute the Mel for a given frequency f in
Hz, a following formula is used.
๐น ๐‘š๐‘’๐‘™ = 2595 โˆ— ๐‘™๐‘œ๐‘”10 1 +
๐‘“
700
(7)
Each filterโ€™s magnitude frequency response is triangular in
shape and equal to unity at the centre frequency and decrease
linearly to zero at centre frequency of two adjacent filters.
Step 5: Log and IDCT
The output of Mel filter bank is given to log. This is the
process to convert the log Mel spectrum into time domain
using Inverse Discrete Cosine Transform (IDCT).
Step 6: Energy
The energy of all frames after IDCT is calculated. The result
of the conversion is called Mel Frequency Cepstrum
Coefficient. The set of coefficient is called acoustic vectors.
Therefore, each input utterance is transformed into a
sequence of acoustic vector.
3.2 Combine Features of MFCC and Linear
Predictive Coding (LPC)
The basic idea behind the Linear Predictive Coding (LPC)
analysis is that a speech sample can be approximated as
linear combination of past speech samples. The LPC
provides a robust, reliable and accurate method for
estimating the parameters that represent the vocal tract
system. The autocorrelation analysis is done. Levinson
Durbinโ€™s algorithm is used to analyze the LP model [11]. The
LPC gives the best results for the speaker recognition rather
than the speech recognition. Here, 10th
order LPC is taken
means 11 coefficients of each frames can be obtained. The
12 coefficients of MFCC and 11 coefficients of LPC are
combined frame by frame basis to yields 23 coefficients of
each frames.
4. ARTIFICIAL NEURAL NETWORK
Classifier used in this speech recognition is Back
Propagation Neural Network (BPNN) [8]. The back-
propagation architecture takes input nodes as features based
on the coefficients of MFCC and combine of both features
MFCC and LPC. The hidden layer consists of 299 hidden
nodes. The output layer consists of 280 nodes. The output
nodes from 1-28 belongs to โ€œzeroโ€, 29-56 belongs to โ€œoneโ€,
57-84 belongs to โ€œtwoโ€, 85-112 belongs to โ€œthreeโ€, 113-140
belongs to โ€œfourโ€, 141-168 belongs to โ€œfiveโ€, 169-196
belongs to โ€œsixโ€, 197-224 belongs to โ€œsevenโ€, 225-252
belongs to โ€œeightโ€, 253-280 belongs to โ€œnineโ€.
5. PERFORMANCE AND RESULTS
The performance of the speech recognition system is often
described in terms of accuracy. Table 1 indicates the
recognition accuracy by implementation of the proposed
algorithm on a dataset consists of 28 speakers of which 14
are males and 14 are females.
The obtained accuracy is compared with [4]. The both
features extraction techniques which are described above are
used and back propagation is used as a classifiers. The results
are compared with the neural network used in [4]. The
recording of speech has been done in a controlled
environment of laboratory using a noise removing
headphone. The measuring of recognition accuracy (RA) is
done based on the equation
๐‘…๐ด =
๐‘๐‘œ. ๐‘œ๐‘“ ๐‘ก๐‘–๐‘š๐‘’๐‘  ๐‘ค๐‘œ๐‘Ÿ๐‘‘ ๐‘Ÿ๐‘’๐‘๐‘œ๐‘”๐‘›๐‘–๐‘ง๐‘’๐‘‘
๐‘‡๐‘œ๐‘ก๐‘Ž๐‘™ ๐‘›๐‘œ. ๐‘œ๐‘“ ๐‘ก๐‘Ÿ๐‘–๐‘Ž๐‘™๐‘ 
โˆ— 100 (8)
The table 1 shows the recognition accuracy obtained during
each session. The Mean square error (MSE) is 0.002482 at
10,000 epochs as compared to MSE of [4] is 0.005 at 1097
epochs for convergence.
Table 1: Recognition accuracy using MFCC and Neural
Network
Number of sessions Recognition Accuracy
1 88%
2 82%
3 77%
Average 82.33%
The table below shows the recognition accuracy by using
combine features of MFCC and LPC and back-propagation
neural network as classifiers. The 12 coefficients of MFCC
and 11 coefficients of LPC are combine frames by frames.
The accuracy at each session is shown below:
Table 2: Recognition accuracy using MFCC+LPC and
Neural Network
Number of sessions Recognition Accuracy
1 97%
2 78%
3 85%
Average 86.66%
The table below shows the comparative results of the
obtained recognition accuracy with the [4]. The proposed
approach gives the more recognition accuracy than described
in [4].
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 149
Table 3: Comparative results [4]
Classifi
er
Only
MFC
C
MFCC+L
PC
+STE+ZC
R
Only
MFCC
(Propos
ed)
MFCC+LPC
+
STE+ZCR
(Proposed)
Neural
Networ
k
51.25
%
85% 82.33% 86.66%
6. CONCLUSION
The experimental results shows that by using the proposed
MFCC and combination of both MFCC and LPC feature
extraction techniques the results are higher as compared to
[4]. The recognition accuracy is higher by using Back-
propagation instead of using MLP as shown in [4]. The
recognition accuracy may differ by using only MFCC, LPC
and combination of both MFCC and LPC techniques as well
as other classification techniques.
REFERENCES
[1] Xian Tang, โ€œHybrid Hidden Markov Model and
Artificial Neural Network for Automatic Speech
Recognition, โ€œPacific-Asia Conference on Circuits,
Communication and System, IEEE Computer
society, 2009.
[2] Nidhi Desai, Prof. Kinnal Dhameliya, โ€œFeature
Extraction and Classification Techniques for Speech
Recognition: A Review,โ€ International Journal of
Emerging Technology and Advanced Engineering,
Vol. 3, Issue 12, December 2013.
[3] Shanthi Therese S, Chelpa Lingam, โ€œReview of
Feature Extraction Techniques in Automatic Speech
Recognition,โ€ International Journal of Scientific
Engineering and Technology, Vol. 2, Issue 6, pp.479-
484, June 2013.
[4] Bishnu Prasad Das, Ranjan Parekh, โ€œRecognition of
Isolated Words using Features based on LPC, MFCC,
ZCR and STE with Neural Network Classifiers,โ€
International Journal of Modern Engineering
Research, Vol. 2, Issue 3, June 2012.
[5] Revathi, Y. Venkataramani, โ€œSpeaker Independent
Continuous Speech and Isolated Digit Recognition
using VQ and HMM,โ€ IEEE conference, pp. 198-
202, 2011.
[6] P. G. N. Priyadarshani, N. G. J. Dias, Amal
Punchihewa, โ€œDynamic Time Warping Based Speech
Recognition for Isolated Sinhala Words,โ€ IEEE
Journal, pp. 892-895, 2012.
[7] Santosh K. Gaikwad, Bharti W. Gawali, Pravin
Yannawar, โ€œA Review on Speech Recognition
Technique,โ€ International Journal of Computer
Applications, Vol. 10, Issue 3, Nov. 2010.
[8] David Kriesel, โ€œNeural Networks,โ€ research article.
[9] S. Karpagavali, P.V. Sabitha, โ€œIsolated Tamil Words
Speech Recognition using Linear Predictive Coding
and Networks,โ€ International Journal of Computer
Science and Management Research, Vol.1, Issue 5,
December 2012.
[10] Ahmad A. M. Abushariah, Teddy S. Gunawan,
Mohammad A. M. Abushariah, โ€œEnglish Digit
Speech Recognition System Based on Hidden
Markov Model,โ€ International Conference on
Computer and Communication Engineering, IEEE,
May 2010.
[11] Utpal Bhattacharjee, โ€œA Comparative Study of LPCC
and MFCC Features for the Recognition of Assamese
Phonemes,โ€ International Journal of Engineering
Research and Technology (IJERT), Vol.2, Issue 1,
January 2013.

More Related Content

What's hot

Speaker identification using mel frequency
Speaker identification using mel frequency Speaker identification using mel frequency
Speaker identification using mel frequency
Phan Duy
ย 
SPEAKER VERIFICATION
SPEAKER VERIFICATIONSPEAKER VERIFICATION
SPEAKER VERIFICATION
niranjan kumar
ย 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
phyuhsan
ย 
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
gt_ebuddy
ย 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signal
Vinodhini
ย 
A Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemA Survey on Speaker Recognition System
A Survey on Speaker Recognition System
Vani011
ย 

What's hot (20)

Speaker identification using mel frequency
Speaker identification using mel frequency Speaker identification using mel frequency
Speaker identification using mel frequency
ย 
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLABA GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
ย 
SPEAKER VERIFICATION
SPEAKER VERIFICATIONSPEAKER VERIFICATION
SPEAKER VERIFICATION
ย 
COLEA : A MATLAB Tool for Speech Analysis
COLEA : A MATLAB Tool for Speech AnalysisCOLEA : A MATLAB Tool for Speech Analysis
COLEA : A MATLAB Tool for Speech Analysis
ย 
Text-Independent Speaker Verification
Text-Independent Speaker VerificationText-Independent Speaker Verification
Text-Independent Speaker Verification
ย 
Speech Signal Processing
Speech Signal ProcessingSpeech Signal Processing
Speech Signal Processing
ย 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
ย 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlab
ย 
Speech based password authentication system on FPGA
Speech based password authentication system on FPGASpeech based password authentication system on FPGA
Speech based password authentication system on FPGA
ย 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
ย 
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
ย 
LPC for Speech Recognition
LPC for Speech RecognitionLPC for Speech Recognition
LPC for Speech Recognition
ย 
speech enhancement
speech enhancementspeech enhancement
speech enhancement
ย 
Voice Identification And Recognition System, Matlab
Voice Identification And Recognition System, MatlabVoice Identification And Recognition System, Matlab
Voice Identification And Recognition System, Matlab
ย 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signal
ย 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based features
ย 
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
ย 
Speech Signal Analysis
Speech Signal AnalysisSpeech Signal Analysis
Speech Signal Analysis
ย 
Automatic Speaker Recognition system using MFCC and VQ approach
Automatic Speaker Recognition system using MFCC and VQ approachAutomatic Speaker Recognition system using MFCC and VQ approach
Automatic Speaker Recognition system using MFCC and VQ approach
ย 
A Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemA Survey on Speaker Recognition System
A Survey on Speaker Recognition System
ย 

Similar to Isolated words recognition using mfcc, lpc and neural network

Similar to Isolated words recognition using mfcc, lpc and neural network (20)

Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...
ย 
Text independent speaker recognition using combined lpc and mfc coefficients
Text independent speaker recognition using combined lpc and mfc coefficientsText independent speaker recognition using combined lpc and mfc coefficients
Text independent speaker recognition using combined lpc and mfc coefficients
ย 
Speaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home ApplicationsSpeaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home Applications
ย 
D04812125
D04812125D04812125
D04812125
ย 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
ย 
Biomedical Signals Classification With Transformer Based Model.pptx
Biomedical Signals Classification With Transformer Based Model.pptxBiomedical Signals Classification With Transformer Based Model.pptx
Biomedical Signals Classification With Transformer Based Model.pptx
ย 
F EATURE S ELECTION USING F ISHER โ€™ S R ATIO T ECHNIQUE FOR A UTOMATIC ...
F EATURE  S ELECTION USING  F ISHER โ€™ S  R ATIO  T ECHNIQUE FOR  A UTOMATIC  ...F EATURE  S ELECTION USING  F ISHER โ€™ S  R ATIO  T ECHNIQUE FOR  A UTOMATIC  ...
F EATURE S ELECTION USING F ISHER โ€™ S R ATIO T ECHNIQUE FOR A UTOMATIC ...
ย 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A Review
ย 
A Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak TransformA Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak Transform
ย 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
ย 
Data detection with a progressive parallel ici canceller in mimo ofdm
Data detection with a progressive parallel ici canceller in mimo ofdmData detection with a progressive parallel ici canceller in mimo ofdm
Data detection with a progressive parallel ici canceller in mimo ofdm
ย 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
ย 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
ย 
Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...
Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...
Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...
ย 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
ย 
Performance evaluation on the basis of bit error rate for different order of ...
Performance evaluation on the basis of bit error rate for different order of ...Performance evaluation on the basis of bit error rate for different order of ...
Performance evaluation on the basis of bit error rate for different order of ...
ย 
Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...
ย 
Speech Compression Using Wavelets
Speech Compression Using Wavelets Speech Compression Using Wavelets
Speech Compression Using Wavelets
ย 
Peak detection using wavelet transform
Peak detection using wavelet transformPeak detection using wavelet transform
Peak detection using wavelet transform
ย 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
ย 

More from eSAT Journals

Material management in construction โ€“ a case study
Material management in construction โ€“ a case studyMaterial management in construction โ€“ a case study
Material management in construction โ€“ a case study
eSAT Journals
ย 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialsLaboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
eSAT Journals
ย 
Geographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementGeographical information system (gis) for water resources management
Geographical information system (gis) for water resources management
eSAT Journals
ย 
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodEstimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn method
eSAT Journals
ย 
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...
eSAT Journals
ย 

More from eSAT Journals (20)

Mechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementsMechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavements
ย 
Material management in construction โ€“ a case study
Material management in construction โ€“ a case studyMaterial management in construction โ€“ a case study
Material management in construction โ€“ a case study
ย 
Managing drought short term strategies in semi arid regions a case study
Managing drought    short term strategies in semi arid regions  a case studyManaging drought    short term strategies in semi arid regions  a case study
Managing drought short term strategies in semi arid regions a case study
ย 
Life cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreLife cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangalore
ย 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialsLaboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
ย 
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...
ย 
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...
ย 
Influence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizerInfluence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizer
ย 
Geographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementGeographical information system (gis) for water resources management
Geographical information system (gis) for water resources management
ย 
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
ย 
Factors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteFactors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concrete
ย 
Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...
ย 
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
ย 
Evaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabsEvaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabs
ย 
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaEvaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in india
ย 
Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...
ย 
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodEstimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn method
ย 
Estimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniquesEstimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniques
ย 
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...
ย 
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...
ย 

Recently uploaded

Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
ย 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
ย 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Dr.Costas Sachpazis
ย 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
ย 

Recently uploaded (20)

Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
ย 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
ย 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
ย 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
ย 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
ย 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
ย 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
ย 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
ย 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
ย 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
ย 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
ย 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
ย 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
ย 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
ย 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
ย 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
ย 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
ย 
Top Rated Pune Call Girls Budhwar Peth โŸŸ 6297143586 โŸŸ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth โŸŸ 6297143586 โŸŸ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth โŸŸ 6297143586 โŸŸ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth โŸŸ 6297143586 โŸŸ Call Me For Genuine Se...
ย 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
ย 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
ย 

Isolated words recognition using mfcc, lpc and neural network

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 146 ISOLATED WORDS RECOGNITION USING MFCC, LPC AND NEURAL NETWORK Mayur R Gamit1 , Kinnal Dhameliya2 1 M.Tech student, E&C Department, C.G.P.I.T, Bardoli, India 2 Assistant Professor, E&C Department, C.G.P.I.T, Bardoli, India Abstract Automatic speech recognition is an important topic of speech processing. This paper presents the use of an Artificial Neural Network (ANN) for isolated word recognition. The Pre-processing is done and voiced speech is detected based on energy and zero crossing rates (ZCR). The proposed approach used in speech recognition is Mel Frequency Cepstral Coefficients (MFCC) and combine features of both MFCC and Linear Predictive Coding (LPC). The back-propagation is used as a classifier. The recognition accuracy is increased when combine features of both LPC and MFCC are used as compared to only MFCC approach using Neural Network as a classifier.. Keywords: Pre-processing, Mel frequency Cepstral Coefficient (MFCC), Linear Predictive Coding (LPC), Artificial Neural Network (ANN). --------------------------------------------------------------------***---------------------------------------------------------------------- 1. INTRODUCTION Automatic speech recognition (ASR) has been the most investigated topic in speech processing since early 1960s [1]. Speech recognition is a popular and active area of research, used to translate words spoken by humans so as to make them computer recognizable. It usually involves extraction of features from speech signal and representing them using an appropriate data model. ASR system involves two phases. Training phase and Testing phase. In training phase, known speech is recorded and parametric representation of the speech is extracted and stored in the speech database. In the testing phase, for the given input speech signal the features are extracted and ASR system compares it with the reference templates to recognize the utterance. This paper proposes an approach to recognize isolated words using Mel frequency Cepstral coefficient (MFCC) and Linear Predictive Coding (LPC) as a feature extraction techniques and Neural Network as a classification technique. This paper evaluates the recognition accuracy by using only MFCC and combination features of MFCC and LPC. The MFCC gives higher recognition accuracy in speech recognition systems as compared to other techniques. The main advantage of using MFCC techniques is because of less complexity and accurate results while LPC is suitable for speaker recognition. The Back-propagation neural network is used as a classifier. The human voice is recorded and digitized. Then it is input to the pre-processing block. The main task of this block is to separate out the unvoiced speech samples from the voiced speech samples. The detection of voiced and unvoiced speech can be done on the basis of calculating energy and zero crossing rates frame by frame basis. Framing is necessary to analyze the speech. Speech is quasi-periodic signal, so analysis can be done by fragmenting the speech into small segments called chunks or frames. The frame work of speech recognition is shown below: Fig. 1 Block diagram of speech recognition system. This paper proposes a novel and efficient way to recognize isolated words by using advanced pre-processing strategy. Section 2, introduces a new pre-processing concept. Sections 3, describes the feature extraction techniques MFCC and combine features of both MFCC and LPC. Section 4, describes the neural network and section 5, describes the experimental results. 2. PRE-PROCESSING CONCEPT The recorded speech is feed to the preprocessing block. The speech is segmented into small chunks or frames. The determination of voiced and unvoiced speech samples can be done on the basis of Energy and Zero crossing rates (ZCR) [2].
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 147 2.1 Energy The speech signal is fragmented into small frames. Each frame is of ฯ‰ samples, where ฯ‰ ห‚ n as n is total number of samples. The energy of speech is calculated frame by frames. The square of each sample is done and finally summation of all squared samples is done. The equation to calculate energy is given below: Energy = ๐‘ฅ๐‘– 2๐œ” ๐‘–=1 (1) 2.2 Zero Crossing Rate (ZCR) The zero crossing rates means the number of times the transition of speech signal from positive to negative or vice- versa. The zero crossing rates are calculated frame by frames. If the zcr of speech samples having more zero crossing rates then it is a unvoiced otherwise voiced. According to the survey done by rabiner the ZCR of fricative speech samples are more than threshold then consider it as a voiced speech. Fricative has more zero crossing rates as compared to unvoiced speech. The equation to calculate the ZCR is as follows: ZCR= ๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ ๐‘– โˆ’๐‘ ๐‘–๐‘”๐‘› (๐‘ฅ ๐‘–โˆ’1) 2 ๐œ” ๐‘–=1 (2) 2.3 Start Point and End Point Detection The calculation of the energy and zero crossing rates can be done on the basis of frames. The start point and end point can be accurately found out on the basis of both energy and zero crossing rates. First the start and end points can be determined based on energy only. Based on zcr of the start and end points are computed. If the zcr of the frame having start point is more than the start point is shifted towards left side and same as end point shifted towards right. ๏‚ท If ZCR > (threshold) then start point is shifted towards left otherwise remains same. ๏‚ท If ZCR > (threshold) then end point is shifted towards right otherwise remains same. 2.4 Removal of Unvoiced Parts between Start Point and End Point The recognition accuracy can be increased when the unvoiced part between the start and end points can be removed. There are certain samples of speech in between start and end points which have minimum energy. They do not contain any information, so by removing them makes recognition accuracy better. This advance concept has been used in this speech recognition system. 3. FEATURE EXTRACTION In feature extraction the Mel frequency Cepstral coefficient (MFCC) and combine features of both MFCC and LPC are used. The both techniques are described below: 3.1 Mel-Frequency Cepstrum Coefficients (MFCC) MFCC stands for Mel Frequency Cepstral Coefficient. MFCC is one of the most commonly used feature extraction method in speech recognition [3], [10]. Step1: Pre-emphasis The signal is passed through a filter which emphasis a high frequencies [6]. This process increases the energy of signal at high frequency. High frequency also contains information. The equation used to denotes the pre-emphasis is shown below: ๐‘† ๐‘› = ๐‘‹ ๐‘› โˆ’ ๐‘Ž โˆ— ๐‘‹ ๐‘› โˆ’ 1 (3) Where s(n) denotes the output sample, x(n) is present sample, x(n-1) is past sample and value of a is between 0.95 to 1. Step 2: Framing and Overlapping The speech signal is split into several frames such that each frame can be examined in the short time instead of the entire signal. The frame size is of the range 20-40 ms. Then overlapping is applied to frames, hamming window is applied. The equation of hamming window is as follows: ๐‘† ๐‘› = ๐‘‹ ๐‘› โˆ— ๐‘Š ๐‘› 4 ๐‘Š ๐‘› = 0.54 โˆ’ 0.46 cos 2๐œ‹๐‘› ๐‘ โˆ’ 1 0 โ‰ค ๐‘› โ‰ค ๐‘ โˆ’ 1 (5) The overall process is shown in figure below: Fig. 2 Block diagram of MFCC. Step 3: Fast Fourier Transform The Fast Fourier Transform (FFT) converts the frames from time domain to frequency domain. The conversion is done from time to frequency domain because the information is more in frequency domain. Therefore, FFT is executed to obtain the magnitude frequency response of each frame and to prepare the signal for the next stage. ๐‘† ๐œ” = ๐‘“๐‘“๐‘ก ๐‘‹ ๐‘› (6)
  • 3. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 148 Step 4: Mel Filter bank Human ear perception of frequency contents of sounds for speech signal does not follow a linear scale. Therefore, for each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the โ€œmel scaleโ€ [9]. The Mel frequency scale is linear frequency spacing below 1000 Hz and a logarithmic spacing above 1000Hz [5]. To compute the Mel for a given frequency f in Hz, a following formula is used. ๐น ๐‘š๐‘’๐‘™ = 2595 โˆ— ๐‘™๐‘œ๐‘”10 1 + ๐‘“ 700 (7) Each filterโ€™s magnitude frequency response is triangular in shape and equal to unity at the centre frequency and decrease linearly to zero at centre frequency of two adjacent filters. Step 5: Log and IDCT The output of Mel filter bank is given to log. This is the process to convert the log Mel spectrum into time domain using Inverse Discrete Cosine Transform (IDCT). Step 6: Energy The energy of all frames after IDCT is calculated. The result of the conversion is called Mel Frequency Cepstrum Coefficient. The set of coefficient is called acoustic vectors. Therefore, each input utterance is transformed into a sequence of acoustic vector. 3.2 Combine Features of MFCC and Linear Predictive Coding (LPC) The basic idea behind the Linear Predictive Coding (LPC) analysis is that a speech sample can be approximated as linear combination of past speech samples. The LPC provides a robust, reliable and accurate method for estimating the parameters that represent the vocal tract system. The autocorrelation analysis is done. Levinson Durbinโ€™s algorithm is used to analyze the LP model [11]. The LPC gives the best results for the speaker recognition rather than the speech recognition. Here, 10th order LPC is taken means 11 coefficients of each frames can be obtained. The 12 coefficients of MFCC and 11 coefficients of LPC are combined frame by frame basis to yields 23 coefficients of each frames. 4. ARTIFICIAL NEURAL NETWORK Classifier used in this speech recognition is Back Propagation Neural Network (BPNN) [8]. The back- propagation architecture takes input nodes as features based on the coefficients of MFCC and combine of both features MFCC and LPC. The hidden layer consists of 299 hidden nodes. The output layer consists of 280 nodes. The output nodes from 1-28 belongs to โ€œzeroโ€, 29-56 belongs to โ€œoneโ€, 57-84 belongs to โ€œtwoโ€, 85-112 belongs to โ€œthreeโ€, 113-140 belongs to โ€œfourโ€, 141-168 belongs to โ€œfiveโ€, 169-196 belongs to โ€œsixโ€, 197-224 belongs to โ€œsevenโ€, 225-252 belongs to โ€œeightโ€, 253-280 belongs to โ€œnineโ€. 5. PERFORMANCE AND RESULTS The performance of the speech recognition system is often described in terms of accuracy. Table 1 indicates the recognition accuracy by implementation of the proposed algorithm on a dataset consists of 28 speakers of which 14 are males and 14 are females. The obtained accuracy is compared with [4]. The both features extraction techniques which are described above are used and back propagation is used as a classifiers. The results are compared with the neural network used in [4]. The recording of speech has been done in a controlled environment of laboratory using a noise removing headphone. The measuring of recognition accuracy (RA) is done based on the equation ๐‘…๐ด = ๐‘๐‘œ. ๐‘œ๐‘“ ๐‘ก๐‘–๐‘š๐‘’๐‘  ๐‘ค๐‘œ๐‘Ÿ๐‘‘ ๐‘Ÿ๐‘’๐‘๐‘œ๐‘”๐‘›๐‘–๐‘ง๐‘’๐‘‘ ๐‘‡๐‘œ๐‘ก๐‘Ž๐‘™ ๐‘›๐‘œ. ๐‘œ๐‘“ ๐‘ก๐‘Ÿ๐‘–๐‘Ž๐‘™๐‘  โˆ— 100 (8) The table 1 shows the recognition accuracy obtained during each session. The Mean square error (MSE) is 0.002482 at 10,000 epochs as compared to MSE of [4] is 0.005 at 1097 epochs for convergence. Table 1: Recognition accuracy using MFCC and Neural Network Number of sessions Recognition Accuracy 1 88% 2 82% 3 77% Average 82.33% The table below shows the recognition accuracy by using combine features of MFCC and LPC and back-propagation neural network as classifiers. The 12 coefficients of MFCC and 11 coefficients of LPC are combine frames by frames. The accuracy at each session is shown below: Table 2: Recognition accuracy using MFCC+LPC and Neural Network Number of sessions Recognition Accuracy 1 97% 2 78% 3 85% Average 86.66% The table below shows the comparative results of the obtained recognition accuracy with the [4]. The proposed approach gives the more recognition accuracy than described in [4].
  • 4. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 149 Table 3: Comparative results [4] Classifi er Only MFC C MFCC+L PC +STE+ZC R Only MFCC (Propos ed) MFCC+LPC + STE+ZCR (Proposed) Neural Networ k 51.25 % 85% 82.33% 86.66% 6. CONCLUSION The experimental results shows that by using the proposed MFCC and combination of both MFCC and LPC feature extraction techniques the results are higher as compared to [4]. The recognition accuracy is higher by using Back- propagation instead of using MLP as shown in [4]. The recognition accuracy may differ by using only MFCC, LPC and combination of both MFCC and LPC techniques as well as other classification techniques. REFERENCES [1] Xian Tang, โ€œHybrid Hidden Markov Model and Artificial Neural Network for Automatic Speech Recognition, โ€œPacific-Asia Conference on Circuits, Communication and System, IEEE Computer society, 2009. [2] Nidhi Desai, Prof. Kinnal Dhameliya, โ€œFeature Extraction and Classification Techniques for Speech Recognition: A Review,โ€ International Journal of Emerging Technology and Advanced Engineering, Vol. 3, Issue 12, December 2013. [3] Shanthi Therese S, Chelpa Lingam, โ€œReview of Feature Extraction Techniques in Automatic Speech Recognition,โ€ International Journal of Scientific Engineering and Technology, Vol. 2, Issue 6, pp.479- 484, June 2013. [4] Bishnu Prasad Das, Ranjan Parekh, โ€œRecognition of Isolated Words using Features based on LPC, MFCC, ZCR and STE with Neural Network Classifiers,โ€ International Journal of Modern Engineering Research, Vol. 2, Issue 3, June 2012. [5] Revathi, Y. Venkataramani, โ€œSpeaker Independent Continuous Speech and Isolated Digit Recognition using VQ and HMM,โ€ IEEE conference, pp. 198- 202, 2011. [6] P. G. N. Priyadarshani, N. G. J. Dias, Amal Punchihewa, โ€œDynamic Time Warping Based Speech Recognition for Isolated Sinhala Words,โ€ IEEE Journal, pp. 892-895, 2012. [7] Santosh K. Gaikwad, Bharti W. Gawali, Pravin Yannawar, โ€œA Review on Speech Recognition Technique,โ€ International Journal of Computer Applications, Vol. 10, Issue 3, Nov. 2010. [8] David Kriesel, โ€œNeural Networks,โ€ research article. [9] S. Karpagavali, P.V. Sabitha, โ€œIsolated Tamil Words Speech Recognition using Linear Predictive Coding and Networks,โ€ International Journal of Computer Science and Management Research, Vol.1, Issue 5, December 2012. [10] Ahmad A. M. Abushariah, Teddy S. Gunawan, Mohammad A. M. Abushariah, โ€œEnglish Digit Speech Recognition System Based on Hidden Markov Model,โ€ International Conference on Computer and Communication Engineering, IEEE, May 2010. [11] Utpal Bhattacharjee, โ€œA Comparative Study of LPCC and MFCC Features for the Recognition of Assamese Phonemes,โ€ International Journal of Engineering Research and Technology (IJERT), Vol.2, Issue 1, January 2013.