ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011




        A Novel Method for Speaker Independent
      Recognition Based on Hidden Markov Model
                                                        Feng-Long Huang
                           Computer Science and Information Engineering, National United University
                                         No. 1, Lienda, Miaoli, Taiwan, 36003
                                                     flhuang@nuu.edu.tw



Abstract: In this paper, we address speaker independent recognition of Chinese
number speeches 0~9 based on the hidden Markov model (HMM). Our former results
for inside and outside testing were 92.5% and 76.79%, respectively. To further
improve the performance, two important features of speech, the MFCC number and
the cluster number of vector quantification (VQ), are combined and evaluated
over various values. The best performance, 96.2% and 83.1%, is achieved with
MFCC number = 20 and VQ clustering number = 64.

Keywords: Speech Recognition, Hidden Markov Model, LBG Algorithm,
Mel-frequency cepstral coefficients, Viterbi Algorithm.

                     I. INTRODUCTION

    In speech processing, automatic speech recognition (ASR) automatically
converts input human speech into text output over various vocabularies. ASR
can be applied in a wide range of applications, such as human interface
design, speech information retrieval (SIR) [11,12], language translation, and
so on. In the real world there are several commercial ASR systems, for
example IBM's Via Voice, the Mandarin dictation system Golden Mandarin (III)
of NTU in Taiwan, Voice Portal on Internet, and the 104 on-line speech query
systems. Modern ASR technologies merge signal processing, pattern
recognition, networking and telecommunication into a unified framework. Such
an architecture can be expanded into broad domains of services, such as
e-commerce and wireless speech systems over WiMAX.
    The approaches adopted for ASR can be categorized as: 1) Hidden Markov
Model (HMM) [1,2,3,4], 2) neural networks [5,6,7], and 3) wavelet-based and
spectrum coefficients of speech [15,16]; a further method combines the first
two approaches [8,9]. The Hidden Markov Model results from the attempt to
model speech generation statistically, and thus belongs to the first category
above. During the past several years it has become the most successful speech
model used in ASR. The main reason for this success is its powerful ability
to characterize the speech signal in a mathematically tractable way.
    In a typical ASR system based on HMM, the HMM stage is preceded by
parameter extraction. Thus the input to the HMM is a discrete time sequence
of parameter vectors.
    The remaining sections are organized as follows: the processing of speech
is introduced in Section II and the acoustic model of recognition is
described in Section III. The initial results of the former approaches are
presented in Section IV. The improvement methods are described in Section V.

                     II. PROCESSES OF SPEECH

    In this section, we describe all the pre-processing procedures.

A. Processing Speech
    The analog voice signals are recorded through a microphone and must then
be digitized and quantized. The digital signal process can be described as
follows:

    x_p(t) = x_a(t) \, p(t)                                              (1)

where x_p(t) and x_a(t) denote the processed and the analog signal,
respectively, and p(t) is the impulse signal.
    Each signal should be segmented into several short frames of speech, each
containing a time series signal. The features of each frame are extracted for
further processing.

B. Pre-emphasis
    Basically, the purpose of pre-emphasis is to increase the magnitude of
some (usually higher) frequencies with respect to the magnitude of other
(usually lower) frequencies in order to improve the overall signal-to-noise
ratio (SNR) by minimizing the adverse effects of such phenomena as
attenuation distortion.

C. Frame Blocking
    While analyzing audio signals, we usually adopt the method of short-term
analysis because most audio signals are relatively stable within a short
period of time.
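    As an illustration of the pre-emphasis and frame blocking steps, the
following is a minimal sketch in Python, not the implementation used in this
paper. The first-order filter y(n) = x(n) - c x(n-1), the coefficient
c = 0.97, the 20 ms frame length and the 10 ms hop are all assumptions chosen
for illustration, and the function names pre_emphasize and frame_signal are
hypothetical. The sketch uses fixed-size frames only.

```python
import math
from typing import List


def pre_emphasize(samples: List[float], coeff: float = 0.97) -> List[float]:
    """Boost higher frequencies with y[n] = x[n] - coeff * x[n-1].

    coeff = 0.97 is a commonly used value and an assumption here;
    the paper does not state the coefficient it used.
    """
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor
    for n in range(1, len(samples)):
        out.append(samples[n] - coeff * samples[n - 1])
    return out


def frame_signal(samples: List[float], sample_rate: int,
                 frame_ms: float = 20.0, hop_ms: float = 10.0) -> List[List[float]]:
    """Split a signal into fixed-length, possibly overlapping frames.

    frame_ms and hop_ms are illustrative assumptions. Trailing samples
    that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must each cover at least one sample")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


if __name__ == "__main__":
    rate = 44100  # sampling rate in Hz
    # 0.1 s of a 440 Hz tone as a stand-in for recorded speech
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
    frames = frame_signal(pre_emphasize(tone), rate)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

    Overlapping frames (hop shorter than frame) are a common design choice so
that events near a frame edge are not lost; whether the experiments here used
overlap is not stated.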
© 2011 ACEEE
DOI: 01.IJSIP.02.01.218



    The signal is usually segmented into time frames of, say, 15 ~ 30 ms.

D. Hamming Window
    In signal processing, a window function is a function that is zero-valued
outside of some chosen interval. The Hamming window is a weighted moving
average transformation used to smooth the periodogram values.
    Suppose that the original signal s(n) is as follows:

    s(n), \quad n = 0, \ldots, N-1                                       (2)

    Multiplying the original signal s(n) by the Hamming window w(n) gives
s(n) w(n), where w(n) is defined as follows:

    w(n) = (1 - \alpha) - \alpha \cos\left(\frac{2\pi n}{N-1}\right),
           \quad 0 \le n \le N-1                                         (3)

where N denotes the number of samples in a window and \alpha is the window
coefficient (0.46 for the standard Hamming window).

E. Mel-frequency cepstral coefficients
    The Mel Frequency Cepstral Coefficient (MFCC) is one of the most
effective feature parameters in speech recognition. For speech
representation, it is well known that MFCC parameters appear to be more
effective than power spectrum based features. MFCCs are based on the
non-linear frequency characteristic of the human ear and achieve a high
recognition rate in practical applications.
    o At lower frequencies, human hearing is more acute.
    o At higher frequencies, human hearing is less acute.
    As shown in Fig. 7, the mel frequency is computed as:

    \mathrm{mel}(f) = 1125 \, \ln\left(1 + \frac{f}{700}\right)          (4)

          III. ACOUSTIC MODEL OF RECOGNITION

A. Vector Quantification
    The foundational vector quantification (VQ) method was proposed by
Y. Linde, A. Buzo, and R. Gray in 1980 and is known as the LBG algorithm. LBG
is based on k-means clustering [2,5]: given a codebook of size G, the
training vectors are categorized into G groups. The centroid C_i of each
group G_i serves as the representative codeword for the vectors in that
group. In principle, the categorization has a tree-based structure.

B. Hidden Markov Model
    A Hidden Markov Model (HMM) is a statistical model in which the system
being modeled is assumed to be a Markov process with unknown parameters. The
challenge is to find all the appropriate hidden parameters from the
observable states. An HMM can be considered the simplest dynamic Bayesian
network.
    In a regular Markov model, the state is directly visible to the observer,
and therefore the state transition probabilities are the only parameters.
However, in a hidden Markov model, the state is not directly visible (hence
"hidden"), while the variables influenced by the state are visible. Each
state has a probability distribution over the output. Therefore, the sequence
of tokens generated by an HMM gives some information about the sequence of
states.
    A complete HMM can be defined as follows:

    \lambda = (\pi, A, B)                                                (5)

    The components (\pi, A, B) are defined as:
    1. \pi (initial state probability):

       \pi = \{\pi_i = \mathrm{prob}(q_1 = S_i)\}, \quad 1 \le i \le N   (6)

    2. A (state transition probability):

       A = \{a_{ij} = \mathrm{prob}(q_{t+1} = S_j \mid q_t = S_i)\},
           \quad 1 \le i, j \le N                                        (7)

    3. B (observation symbol probability):

       B = \{b_j(O_t) = \mathrm{prob}(O_t \mid q_t = S_j)\},
           \quad 1 \le j \le N                                           (8)

where O = \{O_1, O_2, \ldots, O_T\} is the observation sequence,
S = \{S_1, S_2, \ldots, S_N\} is the set of state symbols,
q = \{q_1, q_2, \ldots, q_T\} is the state sequence, T denotes the length of
the observation sequence, and N is the number of states.

C. System Models
    The recognition system is composed of two main functions: 1) extracting
the speech features, including frame blocking, VQ, and so on, and 2)
constructing the model and performing recognition based on the HMM, VQ and
the Viterbi algorithm.
    Short speech signals vary sharply and rapidly, whereas longer signals
vary slowly. Therefore, we use dynamic frame blocking rather than fixed
frames for different experiments.

                     IV. INITIAL EXPERIMENTS

A. Recognition System Based on HMM
    In this paper, we focus on speaker independent speech recognition of
Chinese number speeches 0~9. All samples are recorded at 44100 Hz / 16 bits
by three native male adults. A total of 560 samples are divided into two
parts, 280 for training and 280 for testing. The samples then go through the
pre-processing steps, such as pre-emphasis, frame blocking and VQ.

B. Comparison for Fixed and Dynamic Frame Size
    According to our empirical results comparing the fixed and dynamic frame
sizes, the outside-testing recognition rate of the fixed frame size reaches
76.79%, which is superior to that of the dynamic frame size at 75.71%, as
shown in Table 1.

     Table 1: Comparison of the frame size (SymbolNum = 64)
  Frame size  Test  Wave Num  MFCC time  VQ time  HMM training  Symbol Num  Rate (%)
  fixed       I     280       32.9       5.77     3.44          64          90.36
  fixed       O     280                                                     76.79*
  dynamic     I     280       32.0       3.31     2.42          64          92.50*
  dynamic     O     280                                                     75.71
  PS. I and O denote inside and outside testing, respectively. The time,
  training and Symbol Num values are shared by the I and O rows of each
  frame size.

                     V. FURTHER IMPROVEMENT

A. Improving the Samples of Speech
    According to our empirical results, the recognition rate is best when the
cluster number = 64. Inside and outside testing are 92.5% and 76.79%,
respectively.
    To improve the performance, we analyze all the speech waveforms. Many
samples are affected by boost noise derived from human speaking or the
environment, as shown in Fig. 1. In such a situation, the end points of the
boosted speech usually cannot be detected correctly, which degrades the
performance of the system.
    Usually, end points are detected based on the zero-crossing rate (ZCR)
and the energy of speech, as shown in Fig. 1. However, extra features are
needed for detection in noisy situations. Based on experimental results and
observation, the improvement rules are summarized as follows:
    Input: X(n), n = 1 to j
    Output: Y(m), 1 <= m <= j
    1. Segment the speech X(n): framedY = framed(X(n)).
    2. Calculate the ZCR and energy for each frame.
    3. Smooth the curves for both ZCR and energy.
    4. Calculate the average of the first 10 frames and multiply it by 1.2.
       The resulting value is used as the threshold for the detecting
       process.
    5. ZCR is valid only if framedY is larger than 100, as shown in Fig. 2.
    6. The speech is effective only if its size is larger than 3 ms.
    7. The starting energy of speech should be larger than the threshold.
    8. The energy of 5 continuous frames of speech should increase
       progressively.
    With this improvement, the speech of number 8 (ㄅㄚ) with boost noise can
be detected, as shown in Fig. 2. The improved detection leads to better
results in the following recognition process.

          Fig. 1: Before improvement, Chinese number 8 (ㄅㄚ).

          Fig. 2: After improvement, Chinese number 8 (ㄅㄚ).

B. Better Combination of Various Features
    To further improve the performance, two spectrum features of speech, the
MFCC degree and the cluster number, are combined and evaluated. The MFCC
degree varies from 8 to 36 with an interval of 4, and the cluster number
varies from 32 to 256 with an interval of 32. We evaluated all combinations
of these two features. The processing times needed for computation are shown
in Table 2. The best results are achieved with MFCC number = 20 and VQ
clustering number = 64. The inside and outside testing of recognition achieve
96.2% and 83.1%, as shown in Fig. 3, and the net improvements for inside and
outside testing are 3.7% and 6.3%, respectively. We list only the results
with VQ = 64 in this paper.

              Table 2: Processing time with VQ = 64
  MFCC degree    8     12     16     20     24     28     32     36
  MFCC         15.8   16.9   18.6   23.5   25.3   27.2   28.5   29.9
  VQ            1.0    2.6    3.3    3.4    3.8    4.9    5.3    6.6
  HMM           1.7    1.7    1.8    1.8    1.8    1.8    1.9    1.9
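    To make the end-point rules of Section V.A more concrete, the following
is a rough Python sketch of the energy-based part only (rules 2, 3, 4, 7 and
8). It is an assumption-laden illustration, not the implementation used in
this paper: the moving-average smoother and its width, the use of sum of
squares as frame energy, and the function names are hypothetical. The ZCR
rule 5 and the minimum-length rule 6 are left out because the text does not
specify how they enter the decision.

```python
from typing import List, Optional, Tuple


def frame_energy(frame: List[float]) -> float:
    """Short-time energy of one frame (sum of squared samples, assumed)."""
    return sum(s * s for s in frame)


def smooth(values: List[float], width: int = 3) -> List[float]:
    """Centered moving average; the width is an assumption (rule 3)."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def detect_endpoints(frames: List[List[float]], noise_frames: int = 10,
                     scale: float = 1.2,
                     rise_frames: int = 5) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the detected speech, or None."""
    energy = smooth([frame_energy(f) for f in frames])
    if len(energy) <= noise_frames:
        return None
    # Rule 4: threshold = 1.2 x average of the first 10 frames.
    threshold = scale * sum(energy[:noise_frames]) / noise_frames
    start = None
    for i in range(noise_frames, len(energy) - rise_frames + 1):
        run = energy[i:i + rise_frames]
        rising = all(b > a for a, b in zip(run, run[1:]))  # rule 8
        if run[0] > threshold and rising:                  # rule 7
            start = i
            break
    if start is None:
        return None
    # Extend the segment while the smoothed energy stays above the threshold.
    end = start
    while end + 1 < len(energy) and energy[end + 1] > threshold:
        end += 1
    return start, end
```

    Because the energy curve is smoothed before thresholding, the detected
segment can extend about one frame beyond the raw onset and offset; a wider
smoother would enlarge that margin.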



[Figure: line chart of performance (%) against MFCC degree (8 to 36), with
two curves, Inside Test (%) and Outside Test (%).]

 Fig. 3: Performance with VQ = 64, MFCC degrees varied between 8 and 36.

                     VI. CONCLUSION

    In this paper, we address speaker independent speech recognition of
Chinese number speeches based on HMM. The algorithm for our novel approach is
proposed for the speech recognition. 480 speech samples are recorded and
pre-processed. The preliminary result of outside testing is 76.79%.
    To further improve the performance, two features of speech, the MFCC
number and the VQ cluster number, are evaluated. We then find the combination
of the two spectrum features that achieves the best results. The best
performance is achieved with MFCC number = 20 and VQ clustering number = 64.
The final inside and outside testing of recognition achieve 96.2% and 83.1%.
This indicates that the proposed approach can be employed to recognize
speaker independent speech.
    Future work will address the following:
  1) Employing other effective methods and merging them with the novel
     method to enhance the performance.
  2) Applying the method to isolated Chinese speech recognition.
  3) Improving the precision rates.

                     ACKNOWLEDGEMENT

    The paper is supported under the Project of Lein-Ho Foundation, Taiwan.

                     REFERENCES

 [1] Keng-Yu Lin, 2006, Extended Discrete Hidden Markov Model and Its
     Application to Chinese Syllable Recognition, Master thesis of NCHU,
     Taiwan.
 [2] Keng-Yu Lin, 2006, Extended Discrete Hidden Markov Model and Its
     Application to Chinese Syllable Recognition, Master thesis of NCHU.
 [3] X. Li, M. Parizeau and R. Plamondon, April 2000, Training Hidden Markov
     Models with Multiple Observations--A Combinatorial Method, IEEE
     Transactions on Pattern Analysis and Machine Intelligence (PAMI),
     Vol. 22, No. 4.
 [4] A. Sperduti and A. Starita, May 1997, Supervised Neural Networks for
     Classification of Structures, IEEE Transactions on Neural Networks,
     8(3): pp. 714-735.
 [6] E. Behrman, L. Nash, J. Steck, V. Chandrashekar, and S. Skinner,
     October 2000, Simulations of Quantum Neural Networks, Information
     Sciences, 128(3-4): pp. 257-269.
 [7] Hsien-Leing Tsai, 2004, Automatic Construction Algorithms for
     Supervised Neural Networks and Applications, PhD thesis of NSYSU,
     Taiwan.
 [8] Li-Yi Lu, 2003, The Research of Neural Network and Hidden Markov Model
     Applied on Personal Digital Assistant, Master thesis of CYU, Taiwan.
 [10] Rabiner, L. R., 1989, A Tutorial on Hidden Markov Models and Selected
     Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77,
     No. 2, pp. 257-286.
 [11] Manfred R. Schroeder, H. Quast, H.W. Strube, Computer Speech:
     Recognition, Compression, Synthesis, Springer, 2004.
 [12] Wald, M., 2006, Learning Through Multimedia: Automatic Speech
     Recognition Enabling Accessibility and Interaction, Proceedings of
     ED-MEDIA 2006: World Conference on Educational Multimedia, Hypermedia
     & Telecommunications, pp. 2965-2976.
 [13] A. Revathi, R. Ganapathy and Y. Venkataramani, Nov. 2009, Text
     Independent Speaker Recognition and Speaker Independent Speech
     Recognition Using Iterative Clustering Approach, International Journal
     of Computer science & Information Technology (IJCSIT), Vol. 1, No 2,
     pp. 30-42.
 [14] Haamid M. Gazi, Omar Farooq, Yusuf U. Khan, Sekharjit Datta, 2008,
     Wavelet-based, speaker-independent isolated Hindi digit recognition,
     International Journal of Information and Communication Technology,
     Vol. 1, Issue 2, pp. 185-198.
 [15] Chakraborty P., et al., 2008, An Automatic Speaker Recognition System,
     Neural Information Processing, Lecture Notes in Computer Science
     (LNCS), Springer Berlin / Heidelberg, pp. 517-526.
 [16] Kun-Ching Wang, 2009, Wavelet-Based Speech Enhancement Using
     Time-Frequency Adaptation, EURASIP Journal on Advances in Signal
     Processing, Volume 2009 (2009), Article ID 924135.

Mais conteúdo relacionado

Mais procurados

Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
 
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...IDES Editor
 
20080502 software verification_sharygina_lecture03
20080502 software verification_sharygina_lecture0320080502 software verification_sharygina_lecture03
20080502 software verification_sharygina_lecture03Computer Science Club
 
Speaker recognition systems
Speaker recognition systemsSpeaker recognition systems
Speaker recognition systemsNamratha Dcruz
 
Text-Independent Speaker Verification
Text-Independent Speaker VerificationText-Independent Speaker Verification
Text-Independent Speaker VerificationCody Ray
 
Speaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home ApplicationsSpeaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home ApplicationsRoger Gomes
 
Ber performance analysis of mimo systems using equalization
Ber performance analysis of mimo systems using equalizationBer performance analysis of mimo systems using equalization
Ber performance analysis of mimo systems using equalizationAlexander Decker
 
Performance analysis of image compression using fuzzy logic algorithm
Performance analysis of image compression using fuzzy logic algorithmPerformance analysis of image compression using fuzzy logic algorithm
Performance analysis of image compression using fuzzy logic algorithmsipij
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...ijcsit
 
HGS-Assisted Detection Algorithm for 4G and Beyond Wireless Mobile Communicat...
HGS-Assisted Detection Algorithm for 4G and Beyond Wireless Mobile Communicat...HGS-Assisted Detection Algorithm for 4G and Beyond Wireless Mobile Communicat...
HGS-Assisted Detection Algorithm for 4G and Beyond Wireless Mobile Communicat...Rosdiadee Nordin
 
Environmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniqueEnvironmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniquePankaj Kumar
 
Text independent speaker recognition system
Text independent speaker recognition systemText independent speaker recognition system
Text independent speaker recognition systemDeepesh Lekhak
 

Mais procurados (17)

A017410108
A017410108A017410108
A017410108
 
speech enhancement
speech enhancementspeech enhancement
speech enhancement
 
Fb24958960
Fb24958960Fb24958960
Fb24958960
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
 
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
 
Speaker recognition.
Speaker recognition.Speaker recognition.
Speaker recognition.
 
20080502 software verification_sharygina_lecture03
20080502 software verification_sharygina_lecture0320080502 software verification_sharygina_lecture03
20080502 software verification_sharygina_lecture03
 
Speaker recognition systems
Speaker recognition systemsSpeaker recognition systems
Speaker recognition systems
 
Text-Independent Speaker Verification
Text-Independent Speaker VerificationText-Independent Speaker Verification
Text-Independent Speaker Verification
 
Speaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home ApplicationsSpeaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home Applications
 
Ber performance analysis of mimo systems using equalization
Ber performance analysis of mimo systems using equalizationBer performance analysis of mimo systems using equalization
Ber performance analysis of mimo systems using equalization
 
Performance analysis of image compression using fuzzy logic algorithm
Performance analysis of image compression using fuzzy logic algorithmPerformance analysis of image compression using fuzzy logic algorithm
Performance analysis of image compression using fuzzy logic algorithm
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
HGS-Assisted Detection Algorithm for 4G and Beyond Wireless Mobile Communicat...
HGS-Assisted Detection Algorithm for 4G and Beyond Wireless Mobile Communicat...HGS-Assisted Detection Algorithm for 4G and Beyond Wireless Mobile Communicat...
HGS-Assisted Detection Algorithm for 4G and Beyond Wireless Mobile Communicat...
 
Kf2517971799
Kf2517971799Kf2517971799
Kf2517971799
 
Environmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniqueEnvironmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC technique
 
Text independent speaker recognition system
Text independent speaker recognition systemText independent speaker recognition system
Text independent speaker recognition system
 

Destaque

A Dynamic MAC Protocol for WCDMA Wireless Multimedia Networks
A Dynamic MAC Protocol for WCDMA Wireless Multimedia NetworksA Dynamic MAC Protocol for WCDMA Wireless Multimedia Networks
A Dynamic MAC Protocol for WCDMA Wireless Multimedia NetworksIDES Editor
 
A Robust & Fast Face Detection System
A Robust & Fast Face Detection SystemA Robust & Fast Face Detection System
A Robust & Fast Face Detection SystemIDES Editor
 
A Quality of Service Strategy to Optimize Bandwidth Utilization in Mobile Net...
A Quality of Service Strategy to Optimize Bandwidth Utilization in Mobile Net...A Quality of Service Strategy to Optimize Bandwidth Utilization in Mobile Net...
A Quality of Service Strategy to Optimize Bandwidth Utilization in Mobile Net...IDES Editor
 
Towards a Software Framework for Automatic Business Process Redesign
Towards a Software Framework for Automatic Business Process RedesignTowards a Software Framework for Automatic Business Process Redesign
Towards a Software Framework for Automatic Business Process RedesignIDES Editor
 
Different Attacks on Selective Encryption in RSA based Singular Cubic Curve w...
Different Attacks on Selective Encryption in RSA based Singular Cubic Curve w...Different Attacks on Selective Encryption in RSA based Singular Cubic Curve w...
Different Attacks on Selective Encryption in RSA based Singular Cubic Curve w...IDES Editor
 
Detection of Carotid Artery from Pre-Processed Magnetic Resonance Angiogram
Detection of Carotid Artery from Pre-Processed Magnetic Resonance AngiogramDetection of Carotid Artery from Pre-Processed Magnetic Resonance Angiogram
Detection of Carotid Artery from Pre-Processed Magnetic Resonance AngiogramIDES Editor
 
Using PageRank Algorithm to Improve Coupling Metrics
Using PageRank Algorithm to Improve Coupling MetricsUsing PageRank Algorithm to Improve Coupling Metrics
Using PageRank Algorithm to Improve Coupling MetricsIDES Editor
 
Modified Epc Global Network Architecture of Internet of Things for High Load ...
Modified Epc Global Network Architecture of Internet of Things for High Load ...Modified Epc Global Network Architecture of Internet of Things for High Load ...
Modified Epc Global Network Architecture of Internet of Things for High Load ...IDES Editor
 
Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A ReviewIDES Editor
 

A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model

  • 1. ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011
    © 2011 ACEEE, DOI: 01.IJSIP.02.01.218

    A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model

    Feng-Long Huang
    Computer Science and Information Engineering, National United University
    No. 1, Lienda, Miaoli, Taiwan, 36003
    flhuang@nuu.edu.tw

    Abstract: In this paper, we address speaker-independent recognition of the spoken Chinese digits 0~9 based on HMM. Our former results for inside and outside testing achieved 92.5% and 76.79% respectively. To further improve the performance, two important features of speech, the number of MFCCs and the cluster number of vector quantization, are combined and evaluated over various values. The best performance achieves 96.2% and 83.1% with MFCC number = 20 and VQ clustering number = 64.

    Keywords: Speech Recognition, Hidden Markov Model, LBG Algorithm, Mel-frequency cepstral coefficients, Viterbi Algorithm.

    I. INTRODUCTION

    In speech processing, automatic speech recognition (ASR) automatically interprets human speech input and produces text output over various vocabularies. ASR can be applied in a wide range of applications, such as human interface design, speech information retrieval (SIR) [11,12], language translation, and so on. In the real world there are several commercial ASR systems, for example IBM's Via Voice, the Mandarin dictation system Golden Mandarin (III) of NTU in Taiwan, Voice Portal on the Internet, and the 104 on-line speech query systems. Modern ASR technologies merge signal processing, pattern recognition, networking and telecommunication into a unified framework. Such an architecture can be expanded into broad domains of services, such as e-commerce and wireless speech systems over WiMAX.

    The approaches adopted for ASR can be categorized as: 1) Hidden Markov Model (HMM) [1,2,3,4], 2) neural networks [5,6,7], and 3) wavelet-based and spectrum coefficients of speech [15,16]; a further method combines the first two approaches [8,9]. The Hidden Markov Model results from the attempt to model speech generation statistically, and thus belongs to the first category above. During the past several years it has become the most successful speech model used in ASR. The main reason for this success is its powerful ability to characterize the speech signal in a mathematically tractable way.

    In a typical ASR system based on HMM, the HMM stage is preceded by parameter extraction. Thus the input to the HMM is a discrete-time sequence of parameter vectors.

    The remaining sections are organized as follows: the processing of speech is introduced in Section 2, and the acoustic model for recognition is described in Section 3. The initial results of the former approaches are presented in Section 4. The improvement methods are then described in Section 5.

    II. PROCESSES OF SPEECH

    In this section, we describe all the pre-processing procedures.

    A. Processing Speech

    The analog voice signals are recorded through a microphone and must then be digitized and quantized. The digital signal process can be described as follows:

    $x_p(t) = x_a(t)\,p(t)$    (1)

    where $x_p(t)$ and $x_a(t)$ denote the processed and analog signals, and $p(t)$ is the impulse signal.

    Each signal should be segmented into several short frames of speech, each containing a time series signal. The features of each frame are extracted for further processing.

    B. Pre-emphasis

    Basically, the purpose of pre-emphasis is to increase the magnitude of some (usually higher) frequencies with respect to the magnitude of other (usually lower) frequencies, in order to improve the overall signal-to-noise ratio (SNR) by minimizing the adverse effects of such phenomena as attenuation distortion.

    C. Frame Blocking

    While analyzing audio signals, we usually adopt the method of short-term analysis because most audio signals are relatively stable within a short period of time. Usually,
  • 2. ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011 the signal will be segmented into time frame, say 15 ~ 30 In a regular Markov model, the state is directly visible ms. to the observer, and therefore the state transition D. Hamming Window probabilities are the only parameters. However, in a In signal processing, the window function is hidden Markov model, the state is not directly visible (so- a function that is zero-valued outside of some called hidden), while the variables influenced by the state chosen interval. The Hamming window is a weighted are visible. Each state has a probability distribution over moving average transformation used to smooth the the output. Therefore, the sequence of tokens generated by periodogram values. Supposed that original signal s(n) is as follows: an HMM gives some information about the sequence of s(n), n = 0,…N-1 (2) states. The original signal s(n) is multiplied by hamming A complete HMM can be defined as follows: window w(n), we will obtain s(n)* w(n), w(n) can be λ = ( π , A, B) (5) defined as follows: HMM model can be defined as ( π , A, B) : 1. Π (Initial state probability): w(n) = (1 - α) – α*cos(2πn/(N-1)), 0≦ n≦ N-1 (3) π = { π i = prob(q = S i )} 1≤ i ≤ N (6) 1 where N denotes the sample number in a window. 2. A (State transition probability): E. Mel-frequency cepstral coefficients A = {a ij = prob(q t+1 = S j |q t = S i )} (7) Mel Frequency Cepstral Coefficient (MFCC) is one of 1 ≤ i ≤ N the most effective feature parameter in speech recognition. 3. B (Observation symbol probability): B = {b j (O t ) = prob(Ot | q t = S j )} 1 ≤ i ≤ N (8) For speech representation, it is well known that MFCC parameters appear to be more effective than power where O = {O 1 , O 2 ,.... , O T } is the observation. spectrum based features. MFCCs are based on the human S = {S1 , S 2 , S 3 ,..... , S N } is state symbols and ears' non-linear frequency characteristic and perform a q = {q 1 , q 2 , q 3 ,..... 
, q T } is observation states and high recognition rate in practical application. T denote the length of observation, N is the number of o lower frequency, human hear more acute. states. o higher frequency, human hear less acute. C. System Models As shown in Fig. 7, MFCC are presented as: The recognition system is composed of two main mel(f)=1125*ln(1+f/700) (4) functions: 1) extracting the speech features, including frame blocking, VQ, and so on, 2) constructing the model III. ACOUSTIC MODEL OF RECOGNITION and recognition based on the HMM, VQ and Viterbi Algorithm. A. Vector Quantification It is apparent that short speech signal varied sharply Foundational vector quantifications (VQ) were and rapidly, whereas longer signal varied slowly. proposed by Y. Linde, A. Buzo, and R. Gray in 1980, So- Therefore, we use the dynamic frame blocking rather than called LBG algorithm. LBG is based on k-means fixed frame for different experiments. clustering [2,5], referring to the size of codebook G, training vectors will be categorized into G groups. The IV. INITIAL EXPERIMENTS centroid Ci of each Gi will be the representative for such A. Recognition System Based on HMM vector of codeword. In principal, the category is tree In the paper, we focus on speaker independent based structure. speech recognition of Chinese number speeches 0~9. All B. Hidden Markov Model the samples with 44100 Hz/16 bits are recorded by three native male adults. Total 560 samples are divided into two A Hidden Markov Model (HMM) is a statistical model parts, 280 for training and 280 for testing. After complete in which is assumed to be a Markov process with the pre-process, such as preemphasis, frame boloking, VQ. unknown parameters. The challenge is to find all the appropriate hidden parameters from the observable states. B. Comparison for fixed and Dynamic Frame Size HMM can be considered as the simplest dynamic According to our empirical results, comparing the Bayesian network. 
fixed and dynamic frame size, recognition rate of fixed 28 © 2011 ACEEE DOI: 01.IJSIP.02.01.218
  • 3. ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011 frame size achieves 76.79%, and superior to the other B. Better Combination of Various Features with75.71%, as shown in Table 1. To improve furthermore the performance, two spectrum Table 1: comparing the frame size, (SymbolNum=64) features, MFCC and cluster number, of speeches are wave Mfcc VQ HMM Symbol rate(%) unified and evaluated. MFCC degree varied from 8 to 36 Num time time training Num with interval 4 and cluster number varied on 32 to 256 I 280 90.36 with interval 32. We evaluated all the combination for fixed 32.9 5.77 3.44 64 these two features with various numbers. The process O 280 76.79* times needed for computation are shown in Table 2. The I 280 92.50* best results can achieve on MFCC Number= 20 and VQ dynamic 32.0 3.31 2.42 64 O 280 75.71 clustering number = 64. The inside and outside testing of PS. I and O denote the inside and outside testing, respectively recognition achieve 96.2% and 83.1% shown in Fig. 3 and net results for inside and outside testing are 3.7% and V. FURTHER IMPROVEMENT 6.3% respectively. We just list the results with VQ = 64 in A. Improving the Samples of Speech the paper. According to our empirical results, recognition rate Table 2: processed time with VQ = 64. achieve better results while cluster number=64. Inside and MFCC outside testing are 92.5% and 76.79%, respectively. degree 8 12 16 20 24 28 32 36 To improve the performance, we analyze all the MFCC 15.8 16.9 18.6 23.5 25.3 27.2 28.5 29.9 speech wavelet. There are many samples affected by boost noise derived from human speaking or environment, as VQ 1.0 2.6 3.3 3.4 3.8 4.9 5.3 6.6 shown in Fig. 1. In such a situation, the end points of HMM 1.7 1.7 1.8 1.8 1.8 1.8 1.9 1.9 boosted speech cannot be usually detected correctly. It will lead to degrade the performance of system. Usually, detecting end points judged on ZCR and energy of speech, as shown in Fig. 1. 
However, it is significant that we need extra features to detect for noise situation. Based on experimental results and observation, the improvement rules are summarized as follows: Input: X(n) , n = 1 to j Output: Y(m),1 <= m <= j 1. segment the speech X(n): framedY = framed (X(n)) 2. calculate the ZCR and energy for each frame. 3. smooth the curves for both ZCR and energy 4. calculate the average of first 10 frames, and Fig. 1: before improvement, Chinese number 8 (ㄅㄚ) multiplying 1.2. The average value will be used as the threshold for detecting process. 5. ZCR is valid only if framedY is larger than 100, as shown in Fig. 2. 6. the speech will be effective only if the size is larger than 3ms. 7. the starting energy of speech should be larger than threshold. 8. the energy for continuous 5 frames of speech . should be increased progressively. Referring to the improvement, the speeches number 8 (ㄅㄚ) with boost noise can be detected, as shown in Fig. 2. The improvement of detection will leads to better Fig. 2: after improvement, Chinese number 8 (ㄅㄚ). results for following recognition process. 29 © 2011 ACEEE DOI: 01.IJSIP.02.01.218
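The end-point detection rules of Section V.A can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the 20 ms frame with 10 ms hop, the 5-frame moving average, and the reading of rule 5 as "count zero crossings only in frames whose peak amplitude exceeds 100" are all assumptions made for illustration. The sketch covers the energy-based rules 1 to 4, 6, and 7; rule 8 is omitted, and the ZCR helper is shown separately because the paper does not state how ZCR enters the final decision.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Rule 1: split x into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def smooth(v, width=5):
    """Rule 3: moving average with edge padding (width is an assumption)."""
    padded = np.pad(v, width // 2, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def zero_crossings(frames, amp_floor=100):
    """Rules 2 and 5 (assumed reading): per-frame sign changes, kept only
    for frames whose peak amplitude exceeds amp_floor, otherwise 0."""
    crossings = np.sum(frames[:, 1:] * frames[:, :-1] < 0, axis=1)
    loud = np.max(np.abs(frames), axis=1) > amp_floor
    return np.where(loud, crossings, 0)

def detect_endpoints(x, sr, frame_ms=20, hop_ms=10, min_ms=3):
    """Return (start, end) sample indices of the speech, or None."""
    x = np.asarray(x, dtype=np.float64)       # avoid int16 overflow
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)  # rule 1
    energy = smooth(np.sum(frames ** 2, axis=1))  # rules 2 and 3
    threshold = 1.2 * np.mean(energy[:10])    # rule 4
    active = energy > threshold               # rule 7
    if not active.any():
        return None
    first = int(np.argmax(active))
    last = len(active) - 1 - int(np.argmax(active[::-1]))
    start, end = first * hop, last * hop + frame_len
    if (end - start) * 1000 / sr <= min_ms:   # rule 6
        return None
    return start, end
```

Because the threshold is derived from the first 10 frames, the sketch assumes that each recording begins with background noise only; a recording that starts with speech would need a different way of estimating the noise floor.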
[Plot: performance (%) versus MFCC degree (8 to 36); series: Inside Test (%), Outside Test (%).]
Fig. 3: performance with VQ = 64, MFCC degrees varied between 8 and 36.

VI. CONCLUSION

In this paper, we address speaker-independent recognition of spoken Chinese digits based on HMM. An algorithm for our novel approach to speech recognition is proposed. 480 speech samples are recorded and pre-processed. The preliminary results of outside testing achieve 76.79%.
To further improve the performance, two features of speech, MFCC and VQ cluster number, are evaluated. We then find the combination of the two spectrum features that achieves the best results. The best performance is achieved with MFCC number = 20 and VQ clustering number = 64. The final inside and outside testing of recognition achieve 96.2% and 83.1%. This indicates that the proposed approach can be employed to recognize speaker-independent speech.
Future work will address the following:
1) Merging other effective methods with the novel method to enhance the performance.
2) Applying the method to isolated Chinese speech recognition.
3) Improving the precision rates.

ACKNOWLEDGEMENT

The paper is supported under the Project of Lein-Ho Foundation, Taiwan.

REFERENCES

[1] Keng-Yu Lin, 2006, Extended Discrete Hidden Markov Model and Its Application to Chinese Syllable Recognition, Master thesis of NCHU, Taiwan.
[2] Keng-Yu Lin, 2006, Extended Discrete Hidden Markov Model and Its Application to Chinese Syllable Recognition, Master thesis of NCHU.
[3] X. Li, M. Parizeau and R. Plamondon, April 2000, Training Hidden Markov Models with Multiple Observations--A Combinatorial Method, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 22, No. 4.
[4] A. Sperduti and A. Starita, May 1997, Supervised Neural Networks for Classification of Structures, IEEE Transactions on Neural Networks, 8(3): pp. 714-735.
[6] E. Behrman, L. Nash, J. Steck, V. Chandrashekar, and S. Skinner, October 2000, Simulations of Quantum Neural Networks, Information Sciences, 128(3-4): pp. 257-269.
[7] Hsien-Leing Tsai, 2004, Automatic Construction Algorithms for Supervised Neural Networks and Applications, PhD thesis of NSYSU, Taiwan.
[8] Li-Yi Lu, 2003, The Research of Neural Network and Hidden Markov Model Applied on Personal Digital Assistant, Master thesis of CYU, Taiwan.
[10] Rabiner, L. R., 1989, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286.
[11] Manfred R. Schroeder, H. Quast, H.W. Strube, Computer Speech: Recognition, Compression, Synthesis, Springer, 2004.
[12] Wald, M., 2006, Learning Through Multimedia: Automatic Speech Recognition Enabling Accessibility and Interaction, Proceedings of ED-MEDIA 2006: World Conference on Educational Multimedia, Hypermedia & Telecommunications, pp. 2965-2976.
[13] A. Revathi, R. Ganapathy and Y. Venkataramani, Nov. 2009, Text Independent Speaker Recognition and Speaker Independent Speech Recognition Using Iterative Clustering Approach, International Journal of Computer science & Information Technology (IJCSIT), Vol. 1, No 2, pp. 30-42.
[14] Haamid M. Gazi, Omar Farooq, Yusuf U. Khan, Sekharjit Datta, 2008, Wavelet-based, speaker-independent isolated Hindi digit recognition, International Journal of Information and Communication Technology, Vol. 1, Issue 2, pp. 185-198.
[15] Chakraborty P., et al., 2008, An Automatic Speaker Recognition System, Neural Information Processing, Lecture Notes in Computer Science (LNCS), Springer Berlin / Heidelberg, pp. 517-526.
[16] Kun-Ching Wang, 2009, Wavelet-Based Speech Enhancement Using Time-Frequency Adaptation, EURASIP Journal on Advances in Signal Processing, Volume 2009 (2009), Article ID 924135.