ISSN: 2277 – 9043
International Journal of Advanced Research in Computer Science and Electronics Engineering
Volume 1, Issue 4, June 2012

SPEAKER RECOGNITION IN NOISY ENVIRONMENT

Mr. Mohammed Imdad N 1, Dr. Shameem Akhtar N 1, Prof. Mohammad Imran Akhtar 2

1 Computer Science and Engineering Department, KBN College of Engineering, Gulbarga, India
2 Electronics and Communication Department, AITM, Bhatkal

Abstract— This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by noise. It describes a method that combines multi-condition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Such a technique is very useful since it avoids the problems of recognizing a degraded voice, and since the user is not required to remember a login password, there is no risk of the password being stolen.

Index Terms— Cepstrum, Missing-feature method, Multi-condition model training, Vector quantization

I. INTRODUCTION

Spoken language is the most natural way for humans to communicate information. The speech signal conveys several types of information. From the speech production point of view, the speech signal conveys linguistic information (e.g., message and language) and speaker information (e.g., emotional, regional, and physiological characteristics). From the speech perception point of view, it also conveys information about the environment in which the speech was produced and transmitted. Even though this wide range of information is encoded in a complex form in the speech signal, humans can easily decode most of it. Speech technology has found wide application in areas such as automatic dictation, voice command control, and audio archive indexing and retrieval.

Speaker recognition covers two fields: Speaker Identification (SI) and Speaker Verification (SV). In speaker identification, the goal is to determine which one of a group of known voices best matches the input voice sample. There are two tasks: text-dependent and text-independent speaker identification. In text-dependent identification, the spoken phrase is known to the system, whereas in the text-independent case the spoken phrase is unknown. Success in both identification tasks depends on extracting and modeling the speaker-dependent characteristics of the speech signal, which can effectively distinguish between talkers.

The speech signal conveys several levels of information. Primarily, it conveys the words or message being spoken, but on a secondary level the signal also conveys information about the identity of the speaker. The area of speaker recognition is concerned with extracting the identity of the person speaking an utterance. As speech interaction with computers becomes more pervasive in activities such as telephone transactions and information retrieval from speech databases, the utility of automatically recognizing a speaker based on his vocal characteristics increases.

II. WORKING OF A SPEAKER RECOGNITION SYSTEM

Like most pattern recognition problems, a speaker recognition system can be partitioned into two modules: feature extraction and classification. The classification module has two components: pattern matching and decision. The feature extraction module estimates a set of features from the speech signal that represent speaker-specific information. This speaker-specific information is the result of complex transformations occurring at different levels of speech production: semantic, phonologic, phonetic, and acoustic.

Figure 1: Generic speaker recognition system

The pattern matching module is responsible for comparing the estimated features to the speaker models. There
                                                   All Rights Reserved © 2012 IJARCSEE
are many types of pattern matching methods and corresponding models used in speaker recognition [13]. Some of the methods include hidden Markov models (HMM), dynamic time warping (DTW), and vector quantization (VQ).

III. SPEAKER RECOGNITION PRINCIPLES

Depending on the application, the general area of speaker recognition can be divided into three specific tasks: identification, detection/verification, and segmentation and clustering. The goal of the speaker identification task is to determine which speaker out of a group of known speakers produced the input voice sample. There are two modes of operation related to the set of known voices: closed-set mode and open-set mode.

In the closed-set mode, the system assumes that the to-be-determined voice must come from the set of known voices; otherwise, the system is in open-set mode. Closed-set speaker identification can be considered a multiple-class classification problem. In open-set mode, speakers that do not belong to the set of known voices are referred to as impostors. This task can be used for forensic applications; e.g., speech evidence can be used to recognize the perpetrator's identity among several known suspects.

In speaker verification, the goal is to determine whether a person is who he or she claims to be according to his or her voice sample. This task is also known as voice verification or authentication, speaker authentication, talker verification or authentication, and speaker detection. Speaker segmentation and clustering techniques are also used in multiple-speaker recognition scenarios. In many speech recognition applications, it is often assumed that the speech from a particular individual is available for processing. When this is not the case, and the speech from the desired speaker is intermixed with that of other speakers, it is desirable to segregate the speech into segments from the individuals before the recognition process commences. The goal of this task is therefore to divide the input audio into homogeneous segments and then label them by speaker identity. Recently, this task has received more attention due to the increased presence of multiple-speaker audio, such as recorded news shows or meetings, in commonly used web searches and consumer electronic devices. Speaker segmentation and clustering is one way to index audio archives so as to make retrieval easier.

According to the constraints placed on the speech used to train and test the system, automatic speaker recognition can be further classified into text-dependent or text-independent tasks.

IV. SPEECH FEATURE EXTRACTION

The purpose of this module is to convert the speech waveform to some type of parametric representation (at a considerably lower information rate) for further analysis and processing. This is often referred to as the signal-processing front end. The speech signal is a slowly time-varying signal (it is called quasi-stationary). An example of a speech signal is shown in Figure 2. When examined over a sufficiently short period of time (between 5 and 100 msec), its characteristics are fairly stationary. However, over longer periods of time (on the order of 1/5 second or more) the signal characteristics change to reflect the different speech sounds being spoken. Therefore, short-time spectral analysis is the most common way to characterize the speech signal.

A wide range of possibilities exists for parametrically representing the speech signal for the speaker recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and others. MFCC is perhaps the best known and most popular, and it is used in this project.

Figure 2: An example of speech signal.

The technique used for speech feature extraction makes use of MFCCs, which are based on the known variation of the human ear's critical bandwidths with frequency: filters spaced linearly at low frequencies and logarithmically at high frequencies are used to capture the phonetically important characteristics of speech. This is expressed in the mel-frequency scale, which has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The process of computing MFCCs is described in more detail next.

V. Mel-Frequency Cepstrum Coefficients Processor

A block diagram of the structure of an MFCC processor is given in Figure 3. The speech input is typically recorded at a sampling rate above 16000 Hz. This sampling frequency was chosen to minimize the effects of aliasing in the analog-to-digital conversion.

Figure 3: MFCC Processor.
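The mel warping and MFCC pipeline of Sections IV and V can be illustrated with a minimal NumPy sketch. The frame length, hop size, FFT size, and filter count below are illustrative choices, not values prescribed by this paper:

```python
import numpy as np

def hz_to_mel(f):
    # Mel scale: roughly linear below 1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """MFCCs of a 1-D signal: frame -> window -> |FFT|^2 ->
    mel filterbank -> log -> DCT, keeping the first n_ceps coefficients."""
    # 1. Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # 3. Triangular mel-spaced filterbank between 0 Hz and fs/2
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    log_energy = np.log(power @ fbank.T + 1e-10)

    # 4. DCT-II decorrelates the log energies; keep the low-order terms
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filters)))
    return log_energy @ dct.T

# Example: MFCCs of one second of a synthetic 440 Hz tone
t = np.arange(16000) / 16000.0
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 13): one 13-coefficient vector per frame
```

Each row of the result is one frame's feature vector; these are the acoustic vectors that the vector quantization stage below clusters into speaker codebooks.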
VI. Vector Quantization

Vector quantization (VQ) is a feature matching technique used in speaker recognition. The VQ approach is used here due to its ease of implementation and high accuracy. VQ is the process of mapping vectors from a large vector space to a finite number of regions in that space. Each region is called a cluster and can be represented by its center, called a codeword. The collection of all codewords is called a codebook. Figure 4 shows a conceptual diagram illustrating this recognition process. In the figure, only two speakers and two dimensions of the acoustic space are shown.

Figure 4: Conceptual diagram illustrating vector quantization codebook formation.

One speaker can be discriminated from another based on the location of the centroids. In the training phase, a speaker-specific VQ codebook is generated for each known speaker by clustering his/her training acoustic vectors. The resulting codewords (centroids) are shown in Figure 4 by black circles and black triangles for speaker 1 and speaker 2, respectively. The distance from a vector to the closest codeword of a codebook is called the VQ distortion. In the recognition phase, an input utterance from an unknown voice is "vector-quantized" using each trained codebook, and the total VQ distortion is computed. The speaker corresponding to the VQ codebook with the smallest total distortion is identified.

After the enrolment session, the acoustic vectors extracted from the input speech of a speaker provide a set of training vectors. As described above, the next important step is to build a speaker-specific VQ codebook for this speaker using those training vectors. There is a well-known algorithm, namely the LBG algorithm [Linde, Buzo and Gray, 1980], for clustering a set of L training vectors into a set of M codebook vectors.

VII. SPEAKER MODELLING

Speaker modelling deals with designing speaker models for voice recognition. It mainly consists of two phases, training and testing, and both phases depend on feature extraction and parameter matching.

Let ø0 denote the training data set, containing clean speech data, for speaker S, and let p(X | S, ø0) represent the likelihood function of frame feature vector X associated with speaker S trained on data set ø0. In this paper, we assume that each frame vector X consists of N subband features: X = (x1, x2, …, xN), where xn represents the feature for the nth subband. We obtain these by dividing the whole speech frequency band into N subbands, and then calculating the feature coefficients for each subband independently of the other subbands. The subband feature framework has been used in speech recognition for isolating local frequency-band corruption from spreading into the features of the other bands.

The proposed approach for modeling noise includes two steps. The first step is to generate multiple copies of training set ø0 by introducing corruption of different characteristics into ø0. For example, we could add white noise at various signal-to-noise ratios (SNRs) to the clean training data to simulate the corruption. Assume that this leads to augmented training sets ø0, ø1, …, øL, where øl denotes the lth training set derived from ø0 with the inclusion of a certain noise condition. Then a new likelihood function for the test frame vector can be formed by combining the likelihood functions trained on the individual training sets:

p(X | S) = Σ_{l=0..L} p(X | S, øl) P(øl | S)      ……(1)

where p(X | S, øl) is the likelihood function of frame vector X trained on set øl, and P(øl | S) is the prior probability of the occurrence of noise condition øl for speaker S. Equation (1) is a multicondition model. A recognition system based on (1) should have improved robustness to the noise conditions seen in the training sets øl, as compared to a system based on p(X | S, ø0).

The second step of the new approach is to make (1) robust to noise conditions not fully matched by the training sets øl, without assuming extra noise information. One way to do this is to ignore the heavily mismatched subbands and base the score only on the matching subbands. Let X = (x1, x2, …, xN) be a test frame vector and Xl ⊂ X be the subset containing all the subband features corrupted at noise condition øl. Then, using Xl in place of X as the test vector for each training noise condition, (1) can be redefined as

p(X | S) = Σ_{l=0..L} p(Xl | S, øl) P(øl | S)      ……(2)

where p(Xl | S, øl) is the marginal likelihood of the matching feature subset Xl, derived from p(X | S, øl) with the mismatched subband features ignored, to improve robustness to the mismatch between the test frame X and the training noise condition øl.

VIII. SPEAKER VERIFICATION

Speaker verification is the process of automatically verifying who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control
access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.

Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker produced a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. At the highest level, all speaker recognition systems contain two main modules: feature extraction and feature pattern matching.

Feature extraction is the process that extracts a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching involves the actual procedure of identifying the unknown speaker by comparing features extracted from his/her voice input with those from a set of known speakers.

All speaker recognition systems operate in two distinct phases. The first is referred to as the enrollment sessions or training phase, while the second is referred to as the operation sessions or testing phase. In the training phase, each registered speaker has to provide samples of their speech so that the system can build or train a reference model for that speaker. In the case of speaker verification systems, a speaker-specific threshold is additionally computed from the training samples.

IX. RESULTS

The experiment was conducted using three voice signals from each person, recorded in environments with different noise levels. After the input speech was captured through the microphone, the input voice was transformed into feature vectors for testing and training. Snapshots of the experiment running, and of the decision making for speaker identification and verification, are shown below.

Snapshot 1. The main window has four push buttons, named Add, Remove, Recognize, and Exit. The Add push button adds a speaker to the database; similarly, the Remove push button removes a speaker from the database.

Snapshot 2. An example of adding a voice named IMRAN1: click the Add push button to add the voice sample of the respective user, then click the Record File push button.

Snapshot 3. A Record Voice Signal prompt is displayed, asking permission to record the voice of the concerned user. Click the Yes push button to record the voice.

Snapshot 4. After recording the voice, a prompt for playing the voice signal is displayed. Click the Yes push button to play back the recorded voice.
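The enrollment walked through in Snapshots 1–4 culminates in training a speaker-specific codebook. A minimal sketch of the LBG clustering and VQ-distortion matching of Section VI follows; the split factor, codebook size, and the two artificial speakers are illustrative assumptions, not details from the experiment:

```python
import numpy as np

def lbg_codebook(vectors, m=8, eps=0.01, tol=1e-4):
    """Train a VQ codebook with the LBG binary-splitting algorithm:
    start from the global centroid, split each codeword into a perturbed
    pair, then refine assignments and centroids k-means style."""
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < m:
        codebook = np.vstack([codebook + eps, codebook - eps])  # split step
        prev = np.inf
        while True:
            # Assign each training vector to its nearest codeword
            d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            nearest, distortion = d.argmin(1), d.min(1).mean()
            # Move each codeword to the centroid of its cluster
            for k in range(len(codebook)):
                members = vectors[nearest == k]
                if len(members):
                    codebook[k] = members.mean(0)
            if prev - distortion < tol * distortion:
                break
            prev = distortion
    return codebook

def vq_distortion(features, codebook):
    """Total VQ distortion: mean distance to the nearest codeword."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d.min(axis=1)).mean()

# Toy enrollment and recognition with two artificial speakers
rng = np.random.default_rng(0)
books = {name: lbg_codebook(rng.normal(mu, 0.5, (200, 13)))
         for name, mu in [("speaker1", 0.0), ("speaker2", 4.0)]}
test_utt = rng.normal(4.0, 0.5, (60, 13))   # resembles speaker2
who = min(books, key=lambda s: vq_distortion(test_utt, books[s]))
print(who)  # speaker2
```

Identification picks the codebook with the smallest total distortion, exactly as in the recognition phase described in Section VI.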
Snapshot 5. The time graph of the voice signal and the spectrum of the noise signal then appear as two separate figures, one showing the speech signal varying with time and the other the noise added to it.

Snapshot 6. Click the Recognize push button to recognize the speaker and compare the frequency templates of the speakers in the database with the present input speech signal.

Snapshot 7. A Record Voice Signal prompt containing a Speak Now push button will appear. Click the Yes push button to record your voice for further comparison.

Snapshot 8. A Playing Voice Signal prompt will appear. Click the Yes push button to play back your recorded voice.

Snapshot 9. Two separate figures then appear: the time graph of the voice signal and its spectrum.

Snapshot 10. A figure appears showing the match between the computed MFCC codebook and the best-matching stored codebook.
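Snapshot 10 shows the codebook match; the multicondition and missing-feature scoring of equations (1) and (2) in Section VII can be sketched as follows. The diagonal-Gaussian subband models, the equal priors, and the hand-picked sets of matching subbands are toy stand-ins; a real system estimates the mismatched subbands automatically rather than specifying them by hand:

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    """Per-subband log-likelihood under a diagonal Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def multicondition_score(x, models, priors, keep):
    """Equations (1)-(2): sum over noise conditions l of
    p(X_l | S, phi_l) P(phi_l | S), where X_l keeps only the subband
    features judged to match condition l. Marginalising out the
    mismatched subbands amounts to dropping them from the product."""
    total = 0.0
    for l, (mean, var) in enumerate(models):
        ll = gaussian_loglik(x, mean, var)      # log-lik of every subband
        total += priors[l] * np.exp(ll[keep[l]].sum())
    return total

# Toy speaker model: N = 4 subbands, L + 1 = 2 training conditions
# (clean set phi_0 and one white-noise copy phi_1), equal priors.
models = [(np.zeros(4), np.ones(4)),         # trained on the clean set
          (np.zeros(4), 4.0 * np.ones(4))]   # trained on the noisy set
priors = [0.5, 0.5]
x = np.array([0.1, -0.2, 3.0, 0.0])          # subband 2 looks corrupted
# Per condition, keep the subbands assumed to match it: drop the outlier
# under the clean model, keep everything under the noisy one.
keep = [np.array([0, 1, 3]), np.arange(4)]
score = multicondition_score(x, models, priors, keep)
print(score > 0)  # True
```

The score can then be compared across enrolled speakers (identification) or against a speaker-specific threshold (verification), as described in Section VIII.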
Snapshot 11. Shows whom the voice matches and the time taken, in seconds, to match the two voices.

Snapshot 12. Shows the decision that the input voice is not present in the database and hence the speaker is not recognized.

X. CONCLUSION

Speaker recognition can be used to verify one's identity when the interface favors the use of a telephone or microphone. With proper expectations, planning, and education, speaker verification has already proven to be a natural yet very secure solution for verifying one's identity. Voice analysis technology has been around for years; applying it used to be extremely difficult. Now its benefits can be obtained without the complexity and overhead of managing gigabytes of voice reference data, dealing with advanced speech technology, and worrying about the legal issues involved.

1. This technique has been used for speaker recognition and to identify the user by his or her voice.
2. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, and security control for confidential information areas.

REFERENCES

[1] J. Picone, "Fundamentals of speech recognition: a short course," Institute for Signal and Information Processing, Department of Electrical and Computer Engineering, Mississippi State University.

[2] M. H. Hayes, Digital Signal Processing, Schaum's Outline.

[3] D. A. Reynolds, "Experimental evaluation of features for robust speaker identification," IEEE Trans. Speech Audio Processing, vol. 2, pp. 639-643, Oct. 1994.

[4] R. Mammone, X. Zhang, and R. P. Ramachandran, "Robust speaker recognition - a feature-based approach," IEEE Signal Processing Magazine, pp. 58-71, Sep. 1996.

[5] H. A. Murthy, F. Beaufays, L. P. Heck, and M. Weintraub, "Robust text-independent speaker identification over telephone channels," IEEE Trans. Speech Audio Processing, vol. 7, pp. 554-568, Sep. 1999.

[6] L. F. Lamel and J. L. Gauvain, "Speaker verification over the telephone," Speech Commun., vol. 31, pp. 141-154, 2000.

[7] G. R. Doddington et al., "The NIST speaker recognition evaluation - overview, methodology, systems, results, perspective," Speech Commun., vol. 31, pp. 225-254, 2000.

[8] Y. Kao, P. Rajashekaran, and J. Baras, "Free-text speaker identification over long distance telephone channel using phonetic segmentation," in Proc. IEEE ICASSP, 1992, pp. II-177 – II-180.

Mohammed Imdad N received a B.E. in Electronics and Communication from VTU, Belgaum. He is presently pursuing an M.Tech in Computer Science and Engineering from VTU, Belgaum, and is working on a project to develop speaker recognition in a noisy environment for his PG thesis under the guidance of Dr. Shameem Akhtar N.

Dr. Shameem Akhtar N received a B.E. in Computer Science and Engineering from Gulbarga University, an M.Tech in Computer Science and Engineering from VTU, Belgaum, and a Ph.D. in digital image processing from Gitam University. She has more than 10 years of experience in teaching and research. She is a life member of the Indian Society for Technical Education and an Assistant Professor in the Department of Computer Science and Engineering at KBN College of Engineering.

Mohammad Imran Akhtar received a B.E. in Information Technology from MG University, Kottayam, and completed an M.Tech in Digital Communication and Networking at UBDT College of Engineering. He is an Assistant Professor in the Electronics and Communication Department of AITM, Bhatkal. His main research interests include speech processing and image processing.

Abstract— This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by noise. It describes a method that combines multi-condition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Such a technique is useful because it addresses the problem of recognizing a voice in noise; moreover, since the user is not required to remember a login password, there is no chance of the password being stolen.

Index Terms— Cepstrum, Missing-feature method, Multi-condition model training, Vector quantization.

I. INTRODUCTION

Spoken language is the most natural way for humans to communicate information. The speech signal conveys several types of information. From the speech production point of view, it conveys linguistic information (e.g., message and language) and speaker information (e.g., emotional, regional, and physiological characteristics). From the speech perception point of view, it also conveys information about the environment in which the speech was produced and transmitted. Even though this wide range of information is encoded in a complex form in the speech signal, humans can easily decode most of it. Speech technology has found wide application in areas such as automatic dictation, voice command control, and audio archive indexing and retrieval.

The speech signal conveys several levels of information. Primarily, it conveys the words or message being spoken; on a secondary level, it also conveys information about the identity of the speaker. The area of speaker recognition is concerned with extracting the identity of the person speaking an utterance. As speech interaction with computers becomes more pervasive in activities such as telephone transactions and information retrieval from speech databases, the utility of automatically recognizing a speaker from his vocal characteristics increases.

II. WORKING OF A SPEAKER RECOGNITION SYSTEM

Like most pattern recognition problems, a speaker recognition system can be partitioned into two modules: feature extraction and classification. The classification module has two components: pattern matching and decision. The feature extraction module estimates a set of features from the speech signal that represent speaker-specific information. This speaker-specific information is the result of complex transformations occurring at different levels of speech production: semantic, phonologic, phonetic, and acoustic.

Speaker recognition comprises two fields: Speaker Identification (SI) and Speaker Verification (SV). In speaker identification, the goal is to determine which one of a group of known voices best matches the input voice sample. There are two tasks: text-dependent and text-independent speaker identification. In text-dependent identification, the spoken phrase is known to the system, whereas in the text-independent case the spoken phrase is unknown. Success in both identification tasks depends on extracting and modeling the speaker-dependent characteristics of the speech signal, which can effectively distinguish between talkers.

Figure 1: Generic speaker recognition system.

The pattern matching module is responsible for comparing the estimated features to the speaker models. There
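The two-module decomposition described above (feature extraction, then pattern matching and decision) can be sketched as a minimal pipeline. The function names and the lowest-score decision rule below are illustrative assumptions, not an implementation given in the paper:

```python
import numpy as np

def recognize(signal, speaker_models, extract, match):
    """Generic speaker recognition pipeline: extract features from the
    signal, score every enrolled speaker model, then decide.

    `extract` is the feature-extraction module; `match` returns a
    dissimilarity score (lower = better match), so the decision step
    picks the speaker with the minimum score.
    """
    features = extract(signal)
    scores = {spk: match(features, model)
              for spk, model in speaker_models.items()}
    return min(scores, key=scores.get)
```

Any concrete front end (e.g., MFCCs) and matcher (e.g., VQ distortion) can be plugged into the `extract` and `match` slots.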
There are many types of pattern matching methods and corresponding models used in speaker recognition [13]. Some of the methods include hidden Markov models (HMM), dynamic time warping (DTW), and vector quantization (VQ).

III. SPEAKER RECOGNITION PRINCIPLES

Depending on the application, the general area of speaker recognition can be divided into three specific tasks: identification, detection/verification, and segmentation and clustering. The goal of the speaker identification task is to determine which speaker, out of a group of known speakers, produced the input voice sample. There are two modes of operation related to the set of known voices: closed-set mode and open-set mode. In closed-set mode, the system assumes that the voice to be identified must come from the set of known voices; otherwise, the system operates in open-set mode. Closed-set speaker identification can be considered a multiple-class classification problem. In open-set mode, speakers who do not belong to the set of known voices are referred to as impostors. This task can be used for forensic applications; e.g., speech evidence can be used to recognize the perpetrator's identity among several known suspects.

In speaker verification, the goal is to determine whether a person is who he or she claims to be according to his or her voice sample. This task is also known as voice verification or authentication, speaker authentication, talker verification or authentication, and speaker detection.

Speaker segmentation and clustering techniques are used in multiple-speaker recognition scenarios. In many speech recognition applications, it is often assumed that the speech from a particular individual is available for processing. When this is not the case, and the speech from the desired speaker is intermixed with that of other speakers, the speech must be segregated into segments from the individual speakers before the recognition process commences. The goal of this task is therefore to divide the input audio into homogeneous segments and then label them by speaker identity. Recently, this task has received more attention due to the increased inclusion of multiple-speaker audio, such as recorded news shows or meetings, in commonly used web searches and consumer electronic devices. Speaker segmentation and clustering is one way to index audio archives so as to make retrieval easier.

According to the constraints placed on the speech used to train and test the system, automatic speaker recognition can be further classified into text-dependent or text-independent tasks.

IV. SPEECH FEATURE EXTRACTION

The purpose of this module is to convert the speech waveform to some type of parametric representation (at a considerably lower information rate) for further analysis and processing. This is often referred to as the signal-processing front end. The speech signal is a slowly time-varying signal (it is called quasi-stationary). An example of a speech signal is shown in Figure 2. When examined over a sufficiently short period of time (between 5 and 100 msec), its characteristics are fairly stationary. However, over longer periods of time (on the order of 1/5 second or more) the signal characteristics change to reflect the different speech sounds being spoken. Therefore, short-time spectral analysis is the most common way to characterize the speech signal.

Figure 2: An example of a speech signal.

A wide range of possibilities exists for parametrically representing the speech signal for the speaker recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and others. MFCC is perhaps the best known and most popular, and it is used in this project.

The technique used here for speech feature extraction makes use of MFCCs, which are based on the known variation of the human ear's critical bandwidths with frequency: filters spaced linearly at low frequencies and logarithmically at high frequencies are used to capture the phonetically important characteristics of speech. This is expressed in the mel-frequency scale, which has a linear frequency spacing below 1000 Hz and a logarithmic spacing above 1000 Hz. The process of computing MFCCs is described in more detail next.

V. Mel-Frequency Cepstrum Coefficients Processor

A block diagram of the structure of an MFCC processor is given in Figure 3. The speech input is typically recorded at a sampling rate above 16000 Hz. This sampling frequency was chosen to minimize the effects of aliasing in the analog-to-digital conversion.

Figure 3: MFCC Processor.

All Rights Reserved © 2012 IJARCSEE
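As a rough illustration of the processing chain in Figure 3 (framing the quasi-stationary signal, windowing, short-time FFT, mel filterbank, log compression, and a final DCT), here is a minimal NumPy sketch. The frame size, hop, filter count, and number of coefficients are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def hz_to_mel(f):
    # Mel scale: roughly linear below 1000 Hz, logarithmic above.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    """Compute MFCCs for a 1-D signal (parameter values are illustrative)."""
    # 1. Frame the signal into short overlapping windows (25 ms / 10 ms here).
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)            # 2. window each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # 3. short-time power spectrum

    # 4. Triangular mel-spaced filterbank between 0 Hz and fs/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, power.shape[1]))
    for j in range(n_filters):
        lo, c, hi = bins[j], bins[j + 1], bins[j + 2]
        for k in range(lo, c):
            fbank[j, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fbank[j, k] = (hi - k) / max(hi - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)       # 5. log mel energies

    # 6. DCT-II decorrelates the log energies into cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energy @ dct.T                          # shape: (n_frames, n_ceps)
```

One second of 16 kHz audio yields 98 frames of 13 coefficients with these settings; keeping only the low-order DCT coefficients is what discards fine spectral detail while retaining the smooth envelope that characterizes a speaker.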
VI. Vector Quantization

Vector quantization (VQ) is a feature matching technique used in speaker recognition. Here, the VQ approach is used due to its ease of implementation and high accuracy. VQ is a process of mapping vectors from a large vector space to a finite number of regions in that space. Each region is called a cluster and can be represented by its center, called a codeword. The collection of all codewords is called a codebook. Figure 4 shows a conceptual diagram illustrating this recognition process. In the figure, only two speakers and two dimensions of the acoustic space are shown.

Figure 4: Conceptual diagram illustrating vector quantization codebook formation.

One speaker can be discriminated from another based on the location of the centroids. In the training phase, a speaker-specific VQ codebook is generated for each known speaker by clustering his or her training acoustic vectors. The resulting codewords (centroids) are shown in the figure by black circles and black triangles for speakers 1 and 2, respectively. The distance from a vector to the closest codeword of a codebook is called the VQ-distortion. In the recognition phase, an input utterance of an unknown voice is "vector-quantized" using each trained codebook and the total VQ distortion is computed. The speaker corresponding to the VQ codebook with the smallest total distortion is identified.

After the enrolment session, the acoustic vectors extracted from the input speech of a speaker provide a set of training vectors. As described above, the next important step is to build a speaker-specific VQ codebook for this speaker using those training vectors. There is a well-known algorithm, namely the LBG algorithm [Linde, Buzo and Gray, 1980], for clustering a set of L training vectors into a set of M codebook vectors.

VII. SPEAKER MODELLING

Speaker modelling deals with designing the speaker models used for voice recognition. It mainly consists of two phases, a training phase and a testing phase, and both phases depend mainly on feature extraction and parameter matching.

Let ø0 denote the training data set, containing clean speech data, for speaker S, and let p(X | S, ø0) represent the likelihood function of frame feature vector X associated with speaker S trained on data set ø0. In this paper, we assume that each frame vector X consists of N subband features: X = (x1, x2, ..., xN), where xn represents the feature for the nth subband. These are obtained by dividing the whole speech frequency band into N subbands and then calculating the feature coefficients for each subband independently of the other subbands. The subband feature framework has been used in speech recognition to prevent local frequency-band corruption from spreading into the features of the other bands.

The proposed approach for modeling noise includes two steps. The first step is to generate multiple copies of the training set ø0 by introducing corruption of different characteristics into ø0. For example, we could add white noise at various signal-to-noise ratios (SNRs) to the clean training data to simulate the corruption. Assume that this leads to augmented training sets ø0, ø1, ..., øL, where øl denotes the lth training set derived from ø0 with the inclusion of a certain noise condition. Then, a new likelihood function for the test frame vector can be formed by combining the likelihood functions trained on the individual training sets:

p(X | S) = Σ (l = 0 to L) p(X | S, øl) P(øl | S)        (1)

where p(X | S, øl) is the likelihood function of frame vector X trained on set øl, and P(øl | S) is the prior probability of the occurrence of the noise condition øl for speaker S. Equation (1) is a multicondition model. A recognition system based on (1) should have improved robustness to the noise conditions seen in the training sets øl, as compared to a system based on p(X | S, ø0) alone.

The second step of the new approach is to make (1) robust to noise conditions not fully matched by the training sets øl, without assuming extra noise information. One way to do this is to ignore the heavily mismatched subbands and focus the score only on the matching subbands. Let X = (x1, x2, ..., xN) be a test frame vector and let Xl ⊂ X be the subset containing all the subband features corrupted in the way represented by noise condition øl, i.e., the subbands matching that training condition. Then, using Xl in place of X as the test vector for each training noise condition, (1) can be redefined as

p(X | S) = Σ (l = 0 to L) p(Xl | S, øl) P(øl | S)        (2)

where p(Xl | S, øl) is the marginal likelihood of the matching feature subset Xl, derived from p(X | S, øl) with the mismatched subband features ignored, to improve robustness against mismatch between the test frame X and the training noise condition øl.
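The codebook training and minimum-distortion matching of Section VI can be sketched as follows. This is a minimal LBG-style implementation (binary splitting with an additive perturbation, a common variant, followed by k-means refinement); the split offset, iteration count, and codebook size are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lbg_codebook(vectors, m=8, eps=0.01, n_iter=20):
    """Grow a codebook to m codewords by repeated splitting and refinement,
    in the spirit of the LBG algorithm [Linde, Buzo and Gray, 1980]."""
    codebook = vectors.mean(axis=0, keepdims=True)  # start from the global centroid
    while len(codebook) < m:
        # Split every codeword into a slightly perturbed pair, then refine.
        codebook = np.vstack([codebook + eps, codebook - eps])
        for _ in range(n_iter):
            # Assign each training vector to its nearest codeword ...
            d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            # ... and move each codeword to the centroid of its cluster.
            for j in range(len(codebook)):
                members = vectors[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
    return codebook

def total_distortion(vectors, codebook):
    """Sum of distances from each vector to its closest codeword (VQ-distortion)."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).sum()

def identify(test_vectors, codebooks):
    """Pick the speaker whose codebook gives the smallest total VQ-distortion."""
    return min(codebooks, key=lambda s: total_distortion(test_vectors, codebooks[s]))
```

In use, `lbg_codebook` is run once per enrolled speaker on that speaker's training feature vectors, and `identify` quantizes an unknown utterance against every stored codebook, implementing the smallest-total-distortion decision rule described above.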
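The scoring of equations (1) and (2) can be sketched numerically. The sketch below assumes, purely for illustration, that each noise condition øl is modeled by a diagonal Gaussian per subband and that a boolean "reliability mask" per condition marks which subbands match that condition; how such masks are estimated is a separate problem not addressed here. With all-true masks the function reduces to the plain multicondition model of equation (1).

```python
import numpy as np

def gauss_loglik(x, mean, var):
    # Per-subband log-likelihoods under a diagonal Gaussian model.
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def score_frame(x, conditions, priors, masks):
    """Combined likelihood of equation (2):
    p(X | S) = sum_l p(Xl | S, cond_l) P(cond_l | S),
    where Xl keeps only the subbands whose mask marks them as matching
    condition l (the missing-feature marginalization)."""
    total = 0.0
    for l, (mean, var) in enumerate(conditions):
        ll = gauss_loglik(x, mean, var)         # log p(x_n | S, cond_l) per subband
        marginal = ll[masks[l]].sum()           # drop mismatched subbands
        total += np.exp(marginal) * priors[l]   # prior-weighted sum over conditions
    return total
```

Here `conditions` holds one (mean, variance) pair per augmented training set ø0, ..., øL for a given speaker S, and `priors` plays the role of P(øl | S); frame scores would be accumulated over an utterance before the final decision.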
VIII. SPEAKER VERIFICATION

Speaker verification is the process of automatically verifying who is speaking on the basis of individual information included in the speech waves. This technique makes it possible to use the speaker's voice to verify his or her identity and to control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.

Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker provided a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. At the highest level, all speaker recognition systems contain two main modules: feature extraction and feature pattern matching. Feature extraction is the process that extracts a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching involves the actual procedure of identifying the unknown speaker by comparing the features extracted from his or her voice input with those from a set of known speakers.

All speaker recognition systems have to serve two distinct phases. The first is referred to as the enrollment session or training phase, while the second is referred to as the operation session or testing phase. In the training phase, each registered speaker has to provide samples of their speech so that the system can build or train a reference model for that speaker. In the case of speaker verification systems, a speaker-specific threshold is also computed from the training samples.

IX. RESULTS

The experiment was conducted using three voice signals from each person under different levels of environmental noise. After the input speech is captured through the microphone, feature-vector transformation of the input voice takes place for the purposes of testing and training. Snapshots of the experiment running, and of the decision making for speaker identification and verification, are shown below.

Snapshot 1. There are four push buttons, named Add, Remove, Recognize, and Exit. The Add push button adds a speaker to the database; similarly, the Remove push button removes a speaker from the database.

Snapshot 2. An example of adding a voice named IMRAN1 via the top push button, to add the voice sample of the respective user. After this, click the Record File push button.

Snapshot 3. A prompt for recording the voice signal is displayed, asking permission to record the voice of the concerned user. Click the Yes push button to record the voice.

Snapshot 4. After recording the voice, a prompt for playing the voice signal is displayed. Click the Yes push button to play back the recorded voice.
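The accept/reject decision of Section VIII, with the speaker-specific threshold computed at enrollment, can be sketched as a VQ-distortion threshold test. The helper names and the threshold value below are hypothetical; they only illustrate the decision rule, not the system's actual settings.

```python
import numpy as np

def avg_distortion(test_vectors, codebook):
    # Mean distance from each test vector to its nearest codeword.
    d = np.linalg.norm(test_vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def verify(test_vectors, claimed_codebook, threshold):
    """Accept the identity claim only if the average VQ-distortion against the
    claimed speaker's codebook stays below the speaker-specific threshold
    (which, per Section VIII, is computed from the training samples)."""
    return avg_distortion(test_vectors, claimed_codebook) < threshold
```

Unlike identification, which compares against every codebook and picks the minimum, verification scores only the claimed speaker's codebook, so the threshold directly trades off false acceptances of impostors against false rejections of genuine users.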
Snapshot 5. The time graph of the voice signal and the spectrum of the noise signal appear separately as two different figures, one showing the speech signal varying with time and the other the noise added to it.

Snapshot 6. Click the Recognize push button to recognize the speaker, comparing the frequency templates of the speakers in the database with the present input speech signal.

Snapshot 7. A Record Voice Signal prompt containing a Speak Now push button appears. Click the Yes push button to record your voice for further comparison.

Snapshot 8. A prompt containing Playing Voice Signal appears. Click the Yes push button to play back your recorded voice.

Snapshot 9. Two separate figures appear: the time graph of the voice signal and its spectrum.

Snapshot 10. A figure showing the match between the computed codebook and the best-matching stored codebook (MFCC) appears.
Snapshot 11. Shows which speaker the voice matches and the time taken, in seconds, to match the two voices.

Snapshot 12. Shows the decision that the input voice is not present in the database, so the speaker is not recognized.

X. CONCLUSION

Speaker recognition can be used to verify one's identity when the interface favors the use of a telephone or microphone. With proper expectations, planning, and education, speaker verification has already proven to be a natural yet very secure solution to verifying one's identity. Voice analysis technology has been around for years. Applying it used to be tougher than rocket science. Now all the benefits of the advanced technology can be obtained without the complexity and overhead of managing gigabytes of voice reference data, dealing with advanced speech technology, and worrying about the legal issues involved.

1. This technique has been used for speaker recognition, identifying the user from his or her speech.
2. This technique makes it possible to use the speaker's voice to verify his or her identity and to control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, and security control for confidential information areas.

REFERENCES

[1] J. Picone, Fundamentals of Speech Recognition, a short course, Institute for Signal and Information Processing, Department of Electrical and Computer Engineering, Mississippi State University.
[2] M. H. Hayes, Digital Signal Processing, Schaum's Outline Series.
[3] D. A. Reynolds, "Experimental evaluation of features for robust speaker identification," IEEE Trans. Speech Audio Processing, vol. 2, pp. 639-643, Oct. 1994.
[4] R. Mammone, X. Zhang and R. P. Ramachandran, "Robust speaker recognition - a feature-based approach," IEEE Signal Processing Magazine, pp. 58-71, Sep. 1996.
[5] H. A. Murthy, F. Beaufays, L. P. Heck and M. Weintraub, "Robust text-independent speaker identification over telephone channels," IEEE Trans. Speech Audio Processing, vol. 7, pp. 554-568, Sep. 1999.
[6] L. F. Lamel and J. L. Gauvain, "Speaker verification over the telephone," Speech Commun., vol. 31, pp. 141-154, 2000.
[7] G. R. Doddington et al., "The NIST speaker recognition evaluation - overview, methodology, systems, results, perspective," Speech Commun., vol. 31, pp. 225-254, 2000.
[8] Y. Kao, P. Rajashekaran and J. Baras, "Free-text speaker identification over long distance telephone channel using phonetic segmentation," in Proc. IEEE ICASSP, 1992, pp. II-177 - II-180.

Mohammed Imdad N received the B.E. in Electronics and Communication from VTU, Belgaum. He is presently pursuing the M.Tech in Computer Science and Engineering from VTU, Belgaum, and is working on a project to develop speaker recognition in noisy environments for his PG thesis under the guidance of Dr. Shameem Akhtar N.

Dr. Shameem Akhtar N received the B.E. in Computer Science and Engineering from Gulbarga University, the M.Tech in Computer Science and Engineering from VTU, Belgaum, and the Ph.D. degree in digital image processing from Gitam University. She has more than 10 years of experience in teaching and research. She is a life member of the Indian Society for Technical Education. She is an Assistant Professor in the Department of Computer Science and Engineering at KBN College of Engineering.

Mohammad Imran Akhtar received the B.E. in Information Technology from MG University, Kottayam. He completed the M.Tech in Digital Communication and Networking at UBDT College of Engineering. He is an Assistant Professor in the Electronics and Communication Department of AITM, Bhatkal. His main research interests include speech processing and image processing.