Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide

Tribhuvan University
Institute of Engineering
Pulchowk Campus
Department of Electronics and Computer Engineering

MAJOR PROJECT FINAL PRESENTATION :

TEXT PROMPTED REMOTE
SPEAKER AUTHENTICATION
Project Supervisor:
Dr. Subarna Shakya, Associate Professor

Project Members:
Ganesh Tiwari (75010)
Madhav Pandey (75014)
Manoj Shrestha (75018)

Internal Examiner: Er. Manoj Ghimire
External Examiner: Er. Bimal Acharya

INTRODUCTION

 Voice biometric system
 User login

 Text-prompted system
 Claimant is asked to speak a prompted (random) text
 Performs both speech and speaker recognition

 Why text-prompted?
 Defends against playback attacks: a recording of an earlier utterance will not match the newly prompted text

OUR SYSTEM

 Features: MFCC

 Modeling and classification: both statistical

 GMM: speaker modeling

 HMM/VQ: speech modeling

PROPERTIES OF SPEECH SIGNAL

 Carries both speech content and speaker identity

 What makes a speech sound unique?
 Each phoneme has its own characteristic resonant (formant) frequencies
 Studied over a short period: short-time spectral analysis

 What is speaker-dependent information?
 Primarily the fundamental frequency, a function of the dimensions and tension of the vocal cords
 Also the size and shape of the mouth, throat, nose, and teeth
 Studied over a long period: all the variations from that speaker
UNIQUENESS IN PHONEME
[Figure: waveforms of phoneme /ah/ and phoneme /i:/, amplitude vs. samples]

Pre-Processing and Feature Extraction

PREPROCESSING : STEPS

1) Silence Removal

[Figure: waveform of the signal with silence and of the same signal with silence removed]
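The slides do not say how silence is detected. As one possible illustration, the sketch below drops fixed-size blocks whose mean energy falls below a fraction of the loudest block. The function name, the block length of 256 samples, and the threshold ratio are all assumptions made for this example, and Python is used only for illustration.

```python
def remove_silence(samples, block_len=256, threshold_ratio=0.1):
    """Keep only blocks whose mean energy exceeds threshold_ratio * peak block energy.

    Energy thresholding is an assumed method; the deck does not specify one.
    """
    if not samples:
        return []
    blocks = [samples[i:i + block_len] for i in range(0, len(samples), block_len)]
    energies = [sum(s * s for s in block) / len(block) for block in blocks]
    threshold = threshold_ratio * max(energies)
    kept = []
    for block, energy in zip(blocks, energies):
        if energy > threshold:
            kept.extend(block)
    return kept


# Silence, then a loud burst, then silence again.
signal = [0.0] * 512 + [0.5, -0.5] * 256 + [0.0] * 512
voiced = remove_silence(signal)
```

Only the 512 samples of the burst survive in `voiced`. A real front end would likely smooth the decision across neighbouring blocks so that short pauses inside a word are not cut.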

PREPROCESSING: STEPS (CONTD.)

1) Silence Removal  2) Pre-Emphasis

[Figure: magnitude spectrum |Y(f)| vs. frequency (Hz), 0 to 12000 Hz, before pre-emphasis (suppressed high frequencies) and after pre-emphasis (boosted high frequencies)]
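Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] - a * x[n-1]. The sketch below assumes that form and a coefficient of 0.97; neither is stated on the slide.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a typical value, assumed here.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


slow = pre_emphasis([1.0, 1.0, 1.0])   # constant (low-frequency) input
fast = pre_emphasis([1.0, -1.0, 1.0])  # alternating (high-frequency) input
```

After the first sample, the constant input is attenuated to about 0.03 while the alternating input grows to about 1.97 in magnitude, which is the high-frequency boost shown in the spectrum plots.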


1) Silence Removal  2) Pre-Emphasis  3) Framing

 Frames of 23 ms with 50% overlap

1) Silence Removal  2) Pre-Emphasis  3) Framing  4) Windowing

[Figure: a signal frame, the Hamming window, and the resulting windowed signal]
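Framing and windowing can be sketched as follows. The 23 ms frame length and 50% overlap come from the slide; the 16 kHz sample rate in the usage lines and the function names are assumptions for illustration.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=23.0, overlap=0.5):
    """Split a signal into overlapping frames (a trailing partial frame is dropped)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(frame_len * (1.0 - overlap)))
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def window_frames(frames):
    """Multiply every frame sample-by-sample with a Hamming window."""
    if not frames:
        return []
    window = hamming(len(frames[0]))
    return [[s * w for s, w in zip(frame, window)] for frame in frames]


frames = frame_signal([1.0] * 1000, sample_rate=16000)  # assumed sample rate
windowed = window_frames(frames)
```

At 16 kHz a 23 ms frame is 368 samples and the hop is 184 samples, so 1000 samples yield 4 full frames. The window tapers each frame toward 0.08 at both ends, which reduces spectral leakage in the FFT that follows.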

FEATURE EXTRACTION

 MFCC: Mel-Frequency Cepstral Coefficients

 Perceptual approach
 The human ear processes audio signals on the Mel scale

 Mel scale: approximately linear up to 1 kHz and logarithmic above 1 kHz
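The slide gives no formula for the Mel scale. One widely used mapping is mel = 2595 * log10(1 + f / 700); the sketch below assumes that formula, which may differ from the one the project used.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to Mel with the common 2595 * log10(1 + f/700) formula (assumed)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


mel_at_1k = hz_to_mel(1000.0)
mel_at_8k = hz_to_mel(8000.0)
```

With this formula 1000 Hz maps to roughly 1000 Mel, while 8000 Hz maps to well under 8000 Mel, reflecting the compression of high frequencies.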

MFCC EXTRACTION (CONTD.)
 Steps:

FFT → Mel Filter → Log → DCT → CMS

Mel Filter Bank

 Mel filters: 12
 The absolute values of the FFT coefficients are filtered with a triangular filter bank on the Mel scale

 MFCC gives the distribution of energy across the filters in the Mel frequency band
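The steps after the FFT can be sketched as below: a triangular Mel filter bank applied to FFT magnitudes, a logarithm, a DCT-II, and cepstral mean subtraction (CMS) across frames. The 12 filters come from the slide; the Mel formula, the FFT size of 512, the 16 kHz sample rate, and all function names are assumptions.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # assumed Mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges evenly spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional FFT bin positions
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if left < k <= center:
                row.append((k - left) / (center - left))    # rising slope
            elif center < k < right:
                row.append((right - k) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_magnitudes(magnitudes, bank, n_coeffs):
    """Filter-bank energies -> log -> DCT-II."""
    energies = [sum(w * x for w, x in zip(row, magnitudes)) for row in bank]
    log_e = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
    n = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * c * (j + 0.5) / n) for j in range(n))
            for c in range(n_coeffs)]


def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of an utterance."""
    means = [sum(col) / len(frames) for col in zip(*frames)]
    return [[v - m for v, m in zip(frame, means)] for frame in frames]


bank = mel_filterbank(n_filters=12, n_fft=512, sample_rate=16000)
coeffs = mfcc_from_magnitudes([1.0] * 257, bank, n_coeffs=12)
```

Here the flat magnitude spectrum stands in for the output of a real FFT. CMS is applied afterwards to all frames of one utterance, which removes a constant channel offset from the cepstral coefficients.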

EXTRA FEATURES: ENERGY AND DELTAS

 For achieving a high recognition rate

 An energy feature

 Delta and delta-delta features, which capture co-articulation
 Delta: velocity feature
 Delta-delta: acceleration feature

COMPOSITION OF FEATURE VECTOR

12 MFCC Features
12 Δ MFCC
12 Δ Δ MFCC
1 Energy Feature
1 Δ Energy
1 Δ Δ Energy

 39 Features from each frame
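The 39-dimensional vector can be assembled as in the sketch below, which assumes the common regression formula for deltas over a window of two frames on each side, with edge frames repeated. The slides do not state which delta formula was used. The element order here (13 static, 13 delta, 13 delta-delta) also differs from the listing above, but the content is the same.

```python
def deltas(frames, window=2):
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2)."""
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, n_frames - 1)][d]  # repeat last frame at the end
                prv = frames[max(t - n, 0)][d]             # repeat first frame at the start
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def build_feature_vectors(mfcc_frames, energy_per_frame):
    """Stack 12 MFCC + energy with their deltas and delta-deltas: 13 * 3 = 39."""
    static = [list(m) + [e] for m, e in zip(mfcc_frames, energy_per_frame)]
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: every coefficient rises by 1 per frame.
mfcc_frames = [[float(t)] * 12 for t in range(5)]
energy = [float(t) for t in range(5)]
vectors = build_feature_vectors(mfcc_frames, energy)
```

For this ramp the delta in the middle frame is 1.0 (a constant velocity) and the delta-delta is 0 (no acceleration).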

Speech Recognition/Verification by HMM/VQ

HIDDEN MARKOV MODEL (HMM)

 An HMM is an extension of a Markov process

 A Markov process consists of observable states

 An HMM has hidden states, each emitting observable symbols

 An HMM is a stochastic model

HMM (CONTD…)

 Parameters

1) Initial state distribution (π)
2) State transition probability distribution (A)
3) Observation symbol probability distribution (B)

 The HMM model: λ = (A, B, π)

EXAMPLE:
PRONUNCIATION MODEL OF THE WORD "TOMATO"

λ = (A, B, π)

HMM IMPLEMENTATION

 Feature vectors → observation symbols (256)

 Phonemes → hidden states (6)

 Left-to-right HMM

 Discrete Hidden Markov Model (DHMM) with Vector Quantization (VQ) technique
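A discrete HMM with VQ can be sketched in two steps: map each feature vector to the index of its nearest codeword, then score the resulting symbol sequence with the forward algorithm under λ = (A, B, π). The toy model below uses 2 states and 2 symbols rather than the 6 states and 256 symbols on the slide, and all names and numbers are illustrative assumptions.

```python
import math


def quantize(vector, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for idx, codeword in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vector, codeword))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def forward_log_likelihood(symbols, pi, A, B):
    """Scaled forward algorithm: log P(symbols | lambda) with lambda = (A, B, pi)."""
    n_states = len(pi)
    alpha = [pi[i] * B[i][symbols[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbols[t]]
                     for j in range(n_states)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid numeric underflow
        log_prob += math.log(scale)
    return log_prob


# Toy left-to-right model: state 0 can stay or advance, state 1 is final.
pi = [1.0, 0.0]
A = [[0.5, 0.5],
     [0.0, 1.0]]
B = [[0.9, 0.1],
     [0.2, 0.8]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.2], [0.9, 0.8]]
symbols = [quantize(v, codebook) for v in features]
score = forward_log_likelihood(symbols, pi, A, B)
```

For recognition, one such model would be trained per word and the prompted text accepted if its model scores highest (or above a threshold) on the quantized utterance.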

Speaker Recognition/Verification by GMM

SPEAKER MODELING (GMM)

 Gaussian Mixture Model
 Parametric probability density function
 Based on a soft clustering technique
 Mixture of Gaussian components

 λ = (w_i, μ_i, Σ_i), i = 1, ..., M (mixture weights, mean vectors, covariance matrices)

SPEAKER MODEL TRAINING

 Estimate the model parameters
 Expectation Maximization algorithm
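The Expectation Maximization loop can be illustrated on one-dimensional data. The sketch below is a simplification: the real speaker models are trained on 39-dimensional feature vectors, and the initial means, the variance floor, and the iteration count are assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration for a 1-D Gaussian mixture."""
    k = len(weights)
    # E-step: responsibility of each component for each data point.
    resp = []
    for x in data:
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = max(sum(joint), 1e-300)
        resp.append([p / total for p in joint])
    # M-step: re-estimate weights, means and variances from the responsibilities.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = max(sum(r[j] for r in resp), 1e-300)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # floor keeps components from collapsing
    return new_w, new_m, new_v


def train_gmm(data, initial_means, n_iter=50):
    k = len(initial_means)
    weights, means, variances = [1.0 / k] * k, list(initial_means), [1.0] * k
    for _ in range(n_iter):
        weights, means, variances = em_step(data, weights, means, variances)
    return weights, means, variances


data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
weights, means, variances = train_gmm(data, initial_means=[0.0, 6.0])
```

On this toy data the two components settle near 1.0 and 5.0 with roughly equal weights. In practice the means are often initialised by a clustering pass such as k-means rather than by hand.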

SPEAKER VERIFICATION

 Based on likelihood ratio

likelihood ratio = p(X | claimed speaker model) / p(X | impostor model)
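A likelihood-ratio decision can be sketched as follows, working in the log domain with diagonal-covariance GMMs. The use of a single background (impostor) model, the diagonal covariances, the threshold of 0, and all names are assumptions; the slides do not specify how the alternative hypothesis is modeled.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log p(x | lambda); gmm = (weights, means, variances)."""
    weights, means, variances = gmm
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))  # log-sum-exp
    return total / len(frames)


def verify(frames, claimed_gmm, background_gmm, threshold=0.0):
    """Accept if the log-likelihood ratio reaches the threshold."""
    score = (gmm_log_likelihood(frames, claimed_gmm)
             - gmm_log_likelihood(frames, background_gmm))
    return score >= threshold, score


# Toy single-component models in 2 dimensions.
claimed = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
background = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])
accepted, score = verify([[0.1, -0.1], [0.0, 0.2]], claimed, background)
rejected, low_score = verify([[5.1, 4.9], [5.0, 5.2]], claimed, background)
```

Frames close to the claimed model give a positive score and are accepted, while frames close to the background model are rejected. The threshold trades false acceptances against false rejections and would be tuned on held-out data.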

TOOLS USED

 Languages and frameworks:
 Adobe Flex
 Java
 BlazeDS for RPC

 Servers:
 Apache Tomcat
 MySQL

 Version control:
 TortoiseSVN

APPLICATION AREAS

 Telephone transaction
 Telephone credit card purchase,
 Telephone stock trading

 Access control
 Physical facilities
 Computer networks
 Information retrieval
 Customer information

 Forensics
 Voice sample matching

LIMITATION AND FUTURE ENHANCEMENT

 Noise reduction

 Training on more data

 Combine with
 other features
 other classification methods
