Anand Mantri et al., Int. Journal of Engineering Research and Applications
ISSN: 2248-9622, Vol. 4, Issue 2 (Version 1), February 2014, pp. 715-719

RESEARCH ARTICLE                                                        OPEN ACCESS
Performance Evaluation of Human Voice Recognition System
based on MFCC feature and HMM classifier
Mr. Anand Mantri*, Mr. Mukesh Tiwari**, Mr. Jaikaran Singh***
*PG Student (Department of Electronics and Communication, SSSIST, Sehore, India)
** (Department of Electronics and Communication, SSSIST, Sehore, India)
*** (Department of Electronics and Communication, SSSIST, Sehore, India)

ABSTRACT
Speech processing is a very exciting field of research, and human voice recognition is one of its applications. Various techniques for voice recognition have been evaluated by researchers and are available in daily use. In this paper we design and develop a human recognition system based on MFCC (Mel-Frequency Cepstral Coefficients) features and an HMM classifier, which identifies a person from his or her voice input. The performance of the recognition system is evaluated in terms of recognition rate, which is reported in this paper for different data sets of human voice.
Keywords - Voice Recognition, MFCC (Mel-Frequency Cepstral Coefficients), HMM (Hidden Markov Model), VAD (Voice Activity Detection), LPC.

I. INTRODUCTION

Voice is the sound produced by humans and other vertebrates using the lungs and the vocal folds in the larynx, or voice box. When a human speaks, his or her voice changes in power and tone over the course of the utterance. A graph of the voice signal shows these changes in the height of the wave over very small intervals of time.

Fig 1: Voice Signal

Automatic recognition of speech by machine has been a goal of research for more than four decades [1, 2]. Generally speaking, there are three approaches to voice recognition: the acoustic-phonetic approach, the pattern recognition approach and the artificial intelligence approach. One well-known and widely used pattern-recognition approach to voice recognition is the Hidden Markov Model (HMM) approach, which is a statistical method of characterizing the spectral properties of the frames of a pattern. This provides a natural and highly reliable way of recognizing speech for a wide range of applications.

The earliest attempts to devise systems for automatic voice recognition by machine were made in the 1950s, when various researchers tried to exploit the fundamental ideas of acoustic-phonetics. In 1952, at Bell Laboratories, Davis built a system for isolated digit recognition for a single speaker [12]. The system relied heavily on measuring spectral resonances during the vowel region of each digit. In 1959 another attempt was made at MIT Lincoln Laboratories, where ten vowels embedded in a /b/-vowel-/t/ format were recognized in a speaker-independent manner [10]. Voice (speaker) recognition proper began as early as the 1960s with exploration into voiceprint analysis, where characteristics of an individual's voice were thought to be able to characterize the uniqueness of an individual much like a fingerprint. The early systems had many flaws, and research ensued to derive a more reliable method of predicting the correlation between two sets of voice utterances. Speaker identification research continues today within the field of digital signal processing, where many advances have taken place in recent years. The performance of voice recognition systems is given in terms of a word error rate (%), measured for a specified technology, for a given task, with specified task syntax, in a specified mode, and for a specified word vocabulary.
In the 1970s voice recognition research achieved a number of significant milestones. First, the area of isolated word or discrete utterance recognition became a viable and usable technology based on fundamental studies in Russia [9], Japan [10] and the United States [5]. The Russian studies helped advance the use of pattern recognition ideas in speech recognition; the Japanese research showed
how dynamic programming methods could be successfully applied; and United States research showed how the ideas of linear predictive coding (LPC) could be applied. Researchers at AT&T Bell Labs began a series of experiments aimed at making speech recognition systems that were truly speaker independent [3]. They used a wide range of sophisticated clustering algorithms to determine the number of distinct patterns required to represent all variations of different words across a wide user population. In the 1980s there was a shift in technology from template-based approaches to statistical modeling methods, especially the hidden Markov model (HMM) approach [1].
The main goal of this work is to create a device that can recognize a person's voice as a unique biometric signal and compare it against a database, in order to establish the person's identity or reject an unregistered person, while being as standalone as possible. A human can easily recognize a familiar voice; however, getting a computer to distinguish a particular voice among others is a more difficult task. Several problems arise immediately when trying to write a voice recognition algorithm. The majority of these difficulties are due to the fact that it is almost impossible to say a word exactly the same way on two different occasions. Factors that continuously change in human speech include how fast the word is spoken, which parts of the word are emphasized, and so on. In order to analyze two sound files in the time domain, the recordings would have to be aligned just right so that both recordings begin at precisely the same moment.

II. VOICE RECOGNITION

Voice recognition is "the technology by which sounds, words or phrases spoken by humans are converted into electrical signals, and these signals are transformed into coding patterns to which meaning has been assigned". We focus here on the human voice because we most often and most naturally use our voices to communicate our ideas to others in our immediate surroundings, so voice recognition is aimed at identifying the person who is speaking. Voice recognition works by analyzing the features of speech that differ between individuals. Everyone has a unique voice pattern stemming from their anatomy (the size and shape of the mouth and throat) and behavioral patterns (the pitch of their voice, their speaking style, accent, and so on). The applications of voice (speaker) recognition are markedly different from those of speech recognition. Most commonly, voice recognition technology is used to verify a person's identity or to determine the identity of an unknown speaker. Speaker verification and speaker identification are both common types of voice recognition. The most common approaches to voice recognition can be divided into two classes: "template matching" and "feature analysis".
Template matching is the simplest technique and has the highest accuracy when used properly, but it also suffers from the most limitations. As with any approach to voice recognition, the first step is for the user to speak a word or phrase into a microphone. The electrical signal from the microphone is digitized by an analog-to-digital (A/D) converter and stored in memory. To determine the "meaning" of this voice input, the computer attempts to match the input with a digitized voice sample, or template, that has a known meaning. The program contains the input template and attempts to match it against the actual input using a simple conditional statement. Since each person's voice is different, the program cannot possibly contain a template for every potential user, so it must first be "trained" with a new user's voice input before that user's voice can be recognized. A more general form of voice recognition is available through feature analysis (MFCC), and this technique usually leads to "speaker-independent" voice recognition. Recognition accuracy for speaker-independent systems is usually between 90 and 95 percent.
One more method used for voice recognition is based on the HMM (Hidden Markov Model) algorithm.
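To make the template-matching idea above concrete, the following minimal sketch (an illustration only, not the system developed in this paper) compares a digitized input against stored templates using a simple Euclidean distance; the fixed-length signals and helper names are our own assumptions.

```python
import numpy as np

def match_template(input_signal, templates):
    """Return the label of the stored template closest to the input.

    Assumes all signals are already trimmed to the same length and
    sampled at the same rate (a strong simplification of real systems).
    """
    best_label, best_dist = None, np.inf
    for label, template in templates.items():
        dist = np.linalg.norm(input_signal - template)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy usage: two stored "word" templates and a noisy repetition of one of them.
rng = np.random.default_rng(0)
templates = {"yes": rng.standard_normal(8000), "no": rng.standard_normal(8000)}
query = templates["yes"] + 0.1 * rng.standard_normal(8000)
print(match_template(query, templates))  # -> "yes"
```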

Fig 2: Voice recognition process using feature extraction.

Voice recognition technologies based on the Hidden Markov Model (HMM) have developed considerably and can provide high recognition accuracy. The HMM is a statistical modeling approach and is defined by three sets of probabilities: the initial state probabilities, the state transition probability matrix, and the output probability matrix. The computational cost of a typical HMM-based speech recognition algorithm is very high; it depends on the number of states for each word and across all words, the number of Gaussian mixtures, the number of speech frames, the number of features for each speech frame, and the size of the vocabulary.
In this paper, an HMM classifier is used that accepts the MFCC features of the voice as input and passes them through the different models to generate the output. The method is described in detail in the remainder of this paper.
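The three probability sets that define an HMM can be illustrated with a small discrete-output example. The sketch below, with made-up parameter values, evaluates the likelihood of an observation sequence using the standard forward algorithm; it is illustrative only and not the configuration used in this work.

```python
import numpy as np

pi = np.array([0.6, 0.4])              # initial state probabilities
A = np.array([[0.7, 0.3],              # state transition probability matrix
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],         # output (emission) probability matrix:
              [0.1, 0.3, 0.6]])        # rows = states, columns = observation symbols

def forward_likelihood(obs, pi, A, B):
    """P(obs | model) computed with the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward_likelihood([0, 1, 2, 1], pi, A, B))
```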

III. SYSTEM DESCRIPTION
The Hidden Markov Model filter receives two inputs: one is the sample voice database and the other is the real-time input voice. To create the database of HMM models, a human voice sample is first taken, and then Voice Activity Detection (VAD) separates the actual speech data from the samples. MFCC is based on short-term analysis, and thus an MFCC vector is computed from each frame. To extract the coefficients, the speech sample is taken as the input and a Hamming window is applied to minimize the discontinuities of the signal. A DFT is then used to generate the Mel filter bank. According to Mel frequency warping, the width of the triangular filters varies, and so the log total energy in a critical band around the center frequency is included. After warping, the required number of coefficients is obtained. Finally, the inverse discrete Fourier transform is used to calculate the cepstral coefficients. From these extracted MFCCs an HMM is generated; this is also the training phase of the filter, which models the given problem as a "doubly stochastic process" in which the observed data are thought to be the result of having passed through the "true" (hidden) process, and that is how the database model is created. The same MFCC extraction process is performed for the real-time input voice. The HMM filter then compares the input against the stored models and sorts out the best match between the input voice and the database, and thus the speaker is recognized.
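A minimal sketch of this flow (train one HMM per speaker from MFCC features, then identify a test utterance by picking the model with the highest likelihood) is given below. It assumes the third-party librosa and hmmlearn Python packages and made-up file names; the system reported in this paper was not necessarily built with these libraries.

```python
import numpy as np
import librosa                      # assumed available for MFCC extraction
from hmmlearn import hmm            # assumed available for HMM training/scoring

def mfcc_frames(wav_path, n_mfcc=13):
    """Load a recording and return its MFCC vectors, one row per frame."""
    signal, sr = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

def enroll(speaker_files, n_states=5):
    """Train one Gaussian HMM per speaker from that speaker's recordings."""
    models = {}
    for speaker, paths in speaker_files.items():
        feats = [mfcc_frames(p) for p in paths]
        X = np.vstack(feats)
        lengths = [f.shape[0] for f in feats]
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[speaker] = model
    return models

def identify(wav_path, models):
    """Score the test utterance under every model and return the best match."""
    feats = mfcc_frames(wav_path)
    return max(models, key=lambda spk: models[spk].score(feats))

# Hypothetical usage with made-up file names:
# models = enroll({"alice": ["alice_1.wav", "alice_2.wav"], "bob": ["bob_1.wav"]})
# print(identify("unknown.wav", models))
```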

Fig 3: Training and Testing of HMM

3.1 VOICE ACTIVITY DETECTION
Voice Activity Detection (VAD) determines which parts of a voice signal are actual speech and which are silence. The VAD algorithm used here utilizes the short-time energy and the zero-crossing rate to decide whether there is voice activity. Mel-Frequency Cepstral Coefficients (MFCC) were then used to extract characteristic information from the speech vectors.
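A minimal sketch of such an energy and zero-crossing VAD is shown below; the frame length and thresholds are illustrative assumptions, not the values used in this work.

```python
import numpy as np

def vad(signal, frame_len=256, energy_thresh=0.01, zcr_thresh=0.25):
    """Mark frames as voice (True) or silence (False) using
    short-time energy and zero-crossing rate."""
    n_frames = len(signal) // frame_len
    flags = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = np.mean(frame ** 2)                        # short-time energy
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2  # zero-crossing rate
        # High energy with moderate ZCR is treated as voiced activity here.
        flags.append(energy > energy_thresh and zcr < zcr_thresh)
    return np.array(flags)

def keep_active(signal, frame_len=256, **kw):
    """Concatenate only the frames flagged as voice activity."""
    flags = vad(signal, frame_len, **kw)
    frames = [signal[i * frame_len:(i + 1) * frame_len] for i, f in enumerate(flags) if f]
    return np.concatenate(frames) if frames else np.array([])
```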
3.2 MEL-FREQUENCY CEPSTRAL COEFFICIENTS
The MFCC is the most prominent example of a feature set that is extensively used in voice recognition. Because the frequency bands are positioned logarithmically in MFCC, it approximates the human auditory response more closely than other feature sets. The technique of computing MFCC is based on short-term analysis, and thus an MFCC vector is computed from each frame. To extract the coefficients, the speech sample is taken as the input and a Hamming window is applied to minimize the discontinuities of the signal. A DFT is then used to generate the Mel filter bank. According to Mel frequency warping, the width of the triangular filters varies, and so the log total energy in a critical band around the center frequency is included. After warping, the required number of coefficients is obtained. Finally, the inverse discrete Fourier transform is used to calculate the cepstral coefficients [8][9]; it transforms the log filter-bank energies into the cepstral (quefrency) domain, where N is the length of the DFT. The Mel scale is given by Mel(f) = 2595*log10(1 + f/700). The following figure shows the steps involved in MFCC feature extraction.

Figure 4: Steps involved in MFCC feature extraction
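These steps can be sketched directly in code. The following simplified single-frame MFCC computation uses our own illustrative parameter choices and a cosine transform of the log filter-bank energies in place of the inverse DFT mentioned above; practical systems usually add pre-emphasis, liftering and delta features.

```python
import numpy as np

def hz_to_mel(f):
    # Mel(f) = 2595 * log10(1 + f / 700), the warping used above
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc_frame(frame, sr, n_filters=26, n_ceps=13):
    """One frame: Hamming window -> DFT -> mel filter bank -> log -> cepstrum."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    fb = mel_filterbank(n_filters, len(frame), sr)
    log_energies = np.log(fb @ power + 1e-10)
    # A type-II DCT of the log energies gives the cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_energies

# Example: MFCC vector for one 256-sample frame of synthetic audio at 8 kHz.
rng = np.random.default_rng(0)
print(mfcc_frame(rng.standard_normal(256), sr=8000))
```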

3.3 HIDDEN MARKOV MODEL (HMM)
The HMM is a stochastic approach which models the given problem as a "doubly stochastic process" in which the observed data are thought to be the result of passing the "true" (hidden) process through a second process. Both processes are to be characterized using only the one that can be observed. The difficulty with this approach is that one does not know anything about the Markov chains that generate the speech: the number of states in the model is unknown, their probabilistic functions are unknown, and one cannot tell from which state an observation was produced. These properties are hidden, hence the name hidden Markov model. In simpler Markov models (like a Markov chain), the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.
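To make the distinction between hidden states and visible outputs concrete, the small sketch below samples a state path and an observation sequence from a toy discrete HMM with made-up parameters; only the observations would be available to a recognizer.

```python
import numpy as np

rng = np.random.default_rng(1)

pi = np.array([0.6, 0.4])                  # initial state distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])     # state transition probabilities
B = np.array([[0.5, 0.4, 0.1],             # per-state distribution over output tokens
              [0.1, 0.3, 0.6]])

def sample_hmm(T):
    """Generate T steps: the state path is hidden, the tokens are observed."""
    states, tokens = [], []
    s = rng.choice(2, p=pi)
    for _ in range(T):
        states.append(s)
        tokens.append(rng.choice(3, p=B[s]))
        s = rng.choice(2, p=A[s])
    return states, tokens

states, tokens = sample_hmm(10)
print("hidden states :", states)   # not available to the recognizer
print("observations  :", tokens)   # the only evidence the model sees
```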
IV. IMPLEMENTATION & RESULTS
Fig 5: Recorded voice
Fig 6: (a) Input voice samples, (b) MFCC samples
Fig 7: HMM coefficients after training
Fig 8: (a) and (b) Final output with voice number

Table 1. Recognition rate with different numbers of input voice samples

S. No.   No. of training voices   Recognition rate
1        8                        0.875
2        16                       0.937
3        24                       0.958
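For reference, the recognition rate reported in Table 1 is simply the fraction of test utterances whose predicted speaker matches the true speaker; a minimal sketch with made-up labels:

```python
true_speakers      = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
predicted_speakers = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s1"]  # one error

rate = sum(t == p for t, p in zip(true_speakers, predicted_speakers)) / len(true_speakers)
print(rate)  # 0.875 for this hypothetical 8-utterance test set
```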

V. CONCLUSION
Voice recognition technology relates to taking the human voice and converting it into words, commands, or a variety of interactive applications. In addition, voice recognition takes this one step further by using the voice to verify identity and to understand basic commands. These technologies will play a greater role in the future and even threaten to make the keyboard obsolete. Initially, the plan for the experiment was to have a text-dependent system, or a speaker verification system, or something that could actually determine what word was being spoken to the system by the user. It became clear that this would be a very difficult task to accomplish and would require much more time and effort. Our system is relatively successful: it identified speakers at a rate of roughly 80-95%, a very good recognition rate for a basic system.

REFERENCES
[1] M. Yuan, T. Lee, P. C. Ching, and Y. Zhu, "Speech recognition on DSP: Issues on computational efficiency and performance analysis," in Proc. IEEE ICCCAS, 2005.
[2] B. Burchard, R. Roemer, and O. Fox, "A single chip phoneme based HMM speech recognition system for consumer applications," IEEE Trans. Consumer Electron., vol. 46, no. 3, Aug. 2000.
[3] U. C. Pazhayaveetil, "Hardware implementation of a low power speech recognition system," Ph.D. dissertation, Dept. Elect. Eng., North Carolina State Univ., Raleigh, NC, 2007.
[4] Chadawan I., Siwat S., and Thaweesak Y., "Speech Recognition using MFCC," International Conference on Computer Graphics, Simulation and Modeling (ICGSM'2012), July 28-29, 2012, Pattaya, Thailand.
[5] W. Han, K. Hon, Ch. Chan, T. Lee, Ch. Choy, K. Pun, and P. C. Ching, "An HMM-based speech recognition IC," in Proc. IEEE ISCAS, 2003.
[6] P. Li and H. Tang, "Design a co-processor for output probability calculation in speech recognition," in Proc. IEEE ISCAS, 2009.
[7] Jia-Ching Wang, Jhing-Fa Wang, and Yu-Sheng Weng, "Chip design of MFCC extraction for speech recognition," INTEGRATION, the VLSI Journal, vol. 32, 2002.
[8] Corneliu Octavian Dumitru and Inge Gavat, "A Comparative Study of Feature Extraction Methods Applied to Continuous Speech Recognition in Romanian Language," 48th International Symposium ELMAR-2006, 07-09 June 2006, Zadar, Croatia.
[9] Douglas O'Shaughnessy, "Interacting With Computers by Voice: Automatic Speech Recognition and Synthesis," Proceedings of the IEEE, vol. 91, no. 9, September 2003.
[10] Mahdi Shaneh and Azizollah Taheri, "Voice Command Recognition System Based on MFCC and VQ Algorithms," World Academy of Science, Engineering and Technology.
[11] Lindasalwa Muda, Mumtaj Begam, and I. Elamvazuthi, "Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques," Journal of Computing, vol. 2, issue 3, March 2010, ISSN 2151-9617.
[12] Kashyap Patel and R. K. Prasad, "Speech Recognition and Verification Using MFCC & VQ," International Journal of Emerging Science and Engineering (IJESE), ISSN: 2319-6378, vol. 1, issue 7, May 2013.
