DEVELOPMENT OF SPEAKER VERIFICATION UNDER LIMITED DATA AND CONDITION
1. Under the guidance of
Dr. G. Pradhan
NIT PATNA (ECE dept.)
NAME: PAMMI KUMARI
M.TECH 2nd yr (ECE dept.)
ROLL NO.: 1329005
2. • Introduction
• Summary of Literature review
• Issues in existing speaker verification systems
• Motivation for the present work
• Baseline speaker verification system
• Experimental results
• Proposal for future work
3. To develop a voice-password-based speaker verification system
To study the impact of text mismatch on the performance of a
voice-password-based speaker verification system
To develop a voice-password-based speaker verification system in
text-independent mode
To explore methods to model speaker information under limited-data
conditions
Most applications use speech signals of short duration, around
3-5 s, but speaker verification systems perform poorly on
short-duration speech signals
This degradation in performance is due to phonetic variability
between the training and testing speech data
4. SPEAKER VERIFICATION: Speaker verification is the
process of verifying the identity of the claimant. It performs a
one-to-one comparison between a newly input voiceprint and the
voiceprint for the claimed identity that is stored in the database.
Fig: Block diagram of speaker verification system (blocks: input
speech, feature extraction, similarity, reference model (Speaker #M),
speaker ID (#M), threshold, decision, verification result)
5. Fig: Voice password speaker verification system (training path:
training speech, pre-processing, feature extraction, model building,
reference model; testing path: testing speech with identity claim,
pre-processing, feature extraction, comparison, decision logic,
accept/reject)
6. Cont….
• When an identity claim is made by a speaker, the
speech data is compared against the model of
the speaker whose identity is claimed.
• A threshold is used to reach the
decision.
• If the similarity of the test speech data to the target
model is at or above the threshold, the speaker is accepted;
otherwise the claim is rejected.
• This process involves a binary decision (accept/reject)
about the claimed identity regardless of the population
size.
• Hence, the performance of the verification system
does not depend on the size of the population.
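The threshold decision described in this slide can be sketched as follows. This is a minimal illustration, not the system used in this work: it assumes that a higher score means greater similarity, so a claim is accepted when the score reaches the threshold. The function name `verify`, the trial scores, and the threshold value are all hypothetical.

```python
def verify(score: float, threshold: float) -> str:
    """Binary decision: accept when the similarity score reaches the threshold."""
    return "accept" if score >= threshold else "reject"


# Hypothetical similarity scores of two test utterances, each compared
# only against the model of the claimed speaker (one-to-one comparison).
trial_scores = {"trial_a": 0.82, "trial_b": 0.41}
threshold = 0.60  # assumed value; in practice it is tuned on development data

decisions = {name: verify(s, threshold) for name, s in trial_scores.items()}
for name, decision in decisions.items():
    print(name, decision)
```

Each trial needs a single comparison against one model, which is why the decision does not depend on how many speakers are enrolled.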
7. • In the first stage, pre-processing and feature
extraction are performed over a database of
speakers, which yields the feature vectors.
• The second stage generates the models, which
represent speaker-specific characteristics,
from these feature vectors.
• The third stage is the decision, which accepts or
rejects the claimed identity of a speaker.
8. Basic block diagram of a biometric system
(Blocks: sensor, pre-processing, feature extraction, template
generator, stored template, matcher, application device)
9. Text-dependent speaker verification: the system is
based on the utterance of fixed,
predetermined phrases.
Text-independent speaker verification: the reference utterance
(what is spoken in training) and the test utterance (what is
uttered in actual use) may have completely different content.
10. Literature
• Research in the field of speaker recognition was initially
carried out in the 1950s at Bell Laboratories using isolated digits
[1].
• Around 2000, research described the major elements of
Gaussian mixture model (GMM)-based speaker verification
systems used successfully in several NIST Speaker Recognition
Evaluations (SREs).
• From 1960 to 1990, most research focused on the extraction of
speaker-specific information from speech data and on the
development of text-dependent speaker verification systems.
11. • From 1990 to 2005, speaker recognition methods
shifted from template-based pattern matching to
statistical modeling. Different statistical
modeling methods such as GMM and GMM-UBM were
proposed.
• From 2005 to 2014, most research focused on
compensation of mismatches and development of
practical verification systems. Different
compensation methods such as i-vectors and PLDA
were proposed.
[1] K. H. Davis et al., “Automatic recognition of spoken digits,”
J.A.S.A., 24 (6), pp. 637-642, 1952.
12. • In the speech analysis stage, though
techniques have been developed to improve
speaker verification performance, no particular
analysis technique is specifically meant for the
limited-data condition.
• Segmental analysis under the limited-data
condition provides few feature vectors, which
leads to poor speaker models and, in turn, to
degraded performance.
13. • Most applications use speech signals of short
duration, around 3-5 s, but speaker verification
systems perform poorly on short-duration
speech signals.
• This degradation in performance is due to phonetic
variability between the training and testing speech data.
• The phonetic variability may be reduced by artificially
generating multiple utterances.
• Most SV systems perform score normalization
using cohort-centric normalization. Speaker-centric
score normalization may provide better results.
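As a rough illustration of cohort-based score normalization, the sketch below shifts and scales a raw score by the mean and standard deviation of scores obtained against a cohort of other speakers' models. The slides do not say which normalization variant is meant, so this particular form, the function name `cohort_normalize`, and the score values are assumptions.

```python
from statistics import mean, stdev


def cohort_normalize(raw_score, cohort_scores):
    """Normalize a raw score with the mean and sample standard deviation
    of the scores of the same test utterance against cohort models."""
    mu = mean(cohort_scores)
    sigma = stdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


# Hypothetical scores: the claimed-speaker score and three cohort scores.
claimed_score = 4.0
cohort_scores = [1.0, 2.0, 3.0]
normalized = cohort_normalize(claimed_score, cohort_scores)
print(normalized)
```

After normalization, a single threshold can be applied more consistently across test conditions, because the score is expressed relative to how the cohort models respond to the same utterance.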
14. • For the baseline speaker verification system, the
following parameters are used:
VAD threshold: 0.1 of the average
energy
Features: MFCC
Feature vector: 39 dimensions,
with a 20 ms frame size and a
2 ms frame shift
Modeling: GMM
GMM size: 8, 16, 32, 64
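The framing and VAD settings above can be sketched in plain Python. This is an illustration under stated assumptions, not the baseline implementation: the 8 kHz sampling rate is not given in the slides, the helper names are hypothetical, and the toy signal only stands in for real speech. The 20 ms frame size, 2 ms shift, and the 0.1 energy ratio come from the parameter list.

```python
def frame_signal(samples, frame_len, frame_shift):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_shift)]


def frame_energy(frame):
    """Average energy (mean squared amplitude) of one frame."""
    return sum(x * x for x in frame) / len(frame)


def energy_vad(frames, ratio=0.1):
    """Keep frames whose energy exceeds ratio times the average frame energy."""
    energies = [frame_energy(f) for f in frames]
    threshold = ratio * (sum(energies) / len(energies))
    return [f for f, e in zip(frames, energies) if e > threshold]


sample_rate = 8000                      # assumed sampling rate in Hz
frame_len = sample_rate * 20 // 1000    # 20 ms -> 160 samples
frame_shift = sample_rate * 2 // 1000   # 2 ms -> 16 samples

# Toy signal: 0.1 s of silence followed by 0.1 s of a full-scale alternating tone.
signal = [0.0] * 800 + [1.0, -1.0] * 400

frames = frame_signal(signal, frame_len, frame_shift)
voiced = energy_vad(frames, ratio=0.1)
print(len(frames), len(voiced))
```

In a full pipeline, MFCC features would be computed only on the retained frames before GMM training; that step is omitted here.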
17. • Extraction of features that reduce the impact
of phonetic variability.
• Different residual or behavioral features may
be extracted in addition to MFCC for
speaker verification.
• This project considered the GMM
modeling technique; in future work, other
techniques such as i-vectors may be used.