SlideShare uma empresa Scribd logo
1 de 15
Baixar para ler offline
Introduction Proposed recontruction technique Experimental results Conclusions
MMSE Feature Reconstruction Based on an
Occlusion Model for Robust ASR
J. A. Gonz´alez, A. M. Peinado, and A. M. G´omez
University of Granada
Signal Processing, Multimedia Transmission and Speech/Audio Technologies Group (SigMAT)
IberSpeech 2012
Introduction Proposed recontruction technique Experimental results Conclusions
Outline
1 Introduction
2 Proposed recontruction technique
3 Experimental results
4 Conclusions
Introduction Proposed recontruction technique Experimental results Conclusions
Introduction
The performance of ASR systems deteriorates significantly in
the presence of background noise.
For this reason, noise robustness is becoming a major issue in
current ASR systems embedded in mobile platforms.
Traditional approaches for noise robustness:
× Model adaptation modifies the HMM parameters ⇒ more
flexible but less efficient.
Feature compensation denoises the features used for speech
recognition.
PROPOSAL: novel compensation technique derived from an
speech occlusion model.
Introduction Proposed recontruction technique Experimental results Conclusions
Occlusion Model (I)
The mismatch function for the log-Mel domain can be
approximated by:
y ≈ log(ex
+ en
)
This model can be rewritten as,
y ≈ max(x, n) + ε(x, n)
where
ε(x, n) = log 1 + emin(x,n)−max(x,n)
" (x ; n )" (x ; n )
23
15
7
0.6
0
0.5 1.0 1.5 2.0 2.5 3.0
Time (s)
0.4
0.2
12
0
Clean
23
15
7
Noisy (0 dB)
23
15
7
12
5
Log-Max
23
15
7
12
5
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
-0.4 -0.2 0 0.2 0.4 0.6
Probability
ε(x, n)
Distribution of ε(x, n) in Aurora2
Introduction Proposed recontruction technique Experimental results Conclusions
Occlusion Model (II)
Hence, ε(x, n) can be safely ignored from the above model:
y ≈ max(x, n) (1)
This equation will be referred to as the speech occlusion
model (a.k.a. Log-Max approximation).
According to (1), the noisy feature vector can be rearranged
into y = (yr , yu):
Reliable features (xr ≈ yr ), i.e. speech is unaffected.
Unreliable features (−∞ ≤ xu ≤ yu): speech is masked by
noise.
PROBLEM: How do we estimate the unreliable/reliable
features?
Missing data techniques (MDT) use missing-data masks.
Our proposal uses noises estimates to this end.
Introduction Proposed recontruction technique Experimental results Conclusions
Proposed Technique: Visual Analogy
MMSE
estimation
ASR
Noisy input
Noise estimate
Source model
Enhanced input
Introduction Proposed recontruction technique Experimental results Conclusions
MMSE Estimation of Occluded Features (I)
The MMSE estimate of the clean feature vector is given by:
ˆx = E[x|y] =
x
xp(x|y)dx
Source model: clean speech GMM λX with M gaussians.
Noise estimates: p(nt|λN,t) = NN(nt; µN,t, ΣN,t).
Therefore, the MMSE estimate can be finally expressed as
(time dependency is omitted),
ˆx =
M
k=1
P(k|y, λX , λN)
x
xp(x|y, k, λX , λN)dx
E[x|y,k,λX ,λN ]
Introduction Proposed recontruction technique Experimental results Conclusions
MMSE Estimation of Occluded Features (II)
Posterior computation: the corrupted speech likelihood can
be computed as the product of the following probabilities:
p(yi |k) = pX (yi |k)CN(yi ) + pN(yi )CX (yi |k)
Similarly, the partial estimates are given by,
E [xi |yi , k] = wk
i yi + (1 − wk
i )˜µk
X,i
wk
i : speech presence probability, i.e. wk
i = P(xi > ni |k).
yi : clean speech estimate for high SNRs.
˜µk
X,i : estimate for the case of total occlusion ⇒ xi ∈ (−∞, yi ].
Introduction Proposed recontruction technique Experimental results Conclusions
Comparison with related techniques
Missing-data techniques (MDT) are also based on the speech
occlusion model.
MDT assume that a priori knowledge about the feature
reliability is provided by a missing-data mask m.
m: either binary (mi = 1 ⇔ yi reliable, mi = 0 ⇔ yi
unreliable) or soft (mi ∈ [0, 1]).
Then, the terms required by the MMSE estimator are:
p(yi |k) = mi pX (yi |k)C(yi ) + (1 − mi )pN(yi )CX (yi |k)
E[xi |yi , k] = mi yi + (1 − mi )˜µk
X,i
Introduction Proposed recontruction technique Experimental results Conclusions
Experimental setup
The proposed technique is evaluated on the Aurora2 and
Aurora4 databases.
Speech features: 13 MFCCs + ∆ + ∆2 + CMN.
Clean speech GMM with 256 components and diagonal
covariances.
Noise estimation: linear interpolation of the means for the N
first and last frames (NAurora2 = 20, NAurora4 = 40). ΣN fixed
for the whole utterance.
Mask computation:
Binary masks: SNR estimates are thresholded (0 dB).
Soft masks: see equation in the next slide.
Introduction Proposed recontruction technique Experimental results Conclusions
Illustrative example
Clean
Noisy (0 dB)
Noise
Enhanced speech
Estimated mask
23
15
7
12
0
23
15
7
12
5
23
15
7
12
5
23
15
7
11
3
23
15
7
1
0
0.5 1.0 1.5 2.0 2.5 3.0
Time (s)
Utterance (eight six zero one one six two) corrupted by
subway noise at 0 dB.
Feature reliability mask is computed as follows:
mi =
M
k=1
P (k|y, λX , λN) wk
i (2)
Introduction Proposed recontruction technique Experimental results Conclusions
Aurora2 results
10
20
30
40
50
60
70
80
90
100
-5 0 5 10 15 20 Clean
WAcc(%)
SNR (dB)
Baseline
Oracle
BMD
SMD
SRO
Average results for Sets A, B, and C.
Oracle: MDT with perfect knowledge of the feature reliability.
MDT: BMD (binary masks) and SMD (soft masks).
SRO (proposed technique) outperforms BMD and SMD.
Introduction Proposed recontruction technique Experimental results Conclusions
Aurora4 results
T-01 T-02 T-03 T-04 T-05 T-06 T-07 Avg.
Baseline 87.69 75.30 53.24 53.15 46.80 56.36 45.38 59.70
Oracle 87.69 86.74 84.46 84.44 83.19 85.90 82.38 84.97
BMD 86.96 80.78 58.47 52.74 59.63 56.14 61.42 65.16
SMD 87.52 83.65 66.62 63.78 63.48 69.19 65.31 71.36
SRO 87.54 83.28 69.23 64.49 64.88 70.63 66.93 72.43
Oracle is the recognition upper bound that can be achieved.
BMD vs. SMD: soft masks are better.
SRO vs. SMD: quite similar to each other, but SRO is better.
Introduction Proposed recontruction technique Experimental results Conclusions
Conclusions
We have proposed a novel technique for log-Mel feature
reconstruction.
The technique resembles previous work on missing-data
reconstruction.
Contrary to MDT, our proposal requires no missing-data
mask, but noise estimates.
The proposed technique can be alternatively considered as a
mask estimation technique.
The experimental results have shown the superiority of our
proposal over MDT.
Introduction Proposed recontruction technique Experimental results Conclusions
Thank you!

Mais conteúdo relacionado

Mais procurados

SCALE RATIO ICP FOR 3D POINT CLOUDS WITH DIFFERENT SCALES
SCALE RATIO ICP FOR 3D POINT CLOUDS WITH DIFFERENT SCALES SCALE RATIO ICP FOR 3D POINT CLOUDS WITH DIFFERENT SCALES
SCALE RATIO ICP FOR 3D POINT CLOUDS WITH DIFFERENT SCALES Toru Tamaki
 
Isspit presentation
Isspit presentationIsspit presentation
Isspit presentationELVINUGONNA
 
Image Processing
Image ProcessingImage Processing
Image Processingtijeel
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Chapter 3 Image Processing: Basic Transformation
Chapter 3 Image Processing:  Basic TransformationChapter 3 Image Processing:  Basic Transformation
Chapter 3 Image Processing: Basic TransformationVarun Ojha
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceinventy
 
Inter IIT Tech Meet 2k19, IIT Jodhpur
Inter IIT Tech Meet 2k19, IIT JodhpurInter IIT Tech Meet 2k19, IIT Jodhpur
Inter IIT Tech Meet 2k19, IIT JodhpurniveditJain
 
Chapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image TransformationChapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image TransformationVarun Ojha
 
Fisher Kernel based Relevance Feedback for Multimodal Video Retrieval
Fisher Kernel based Relevance Feedback for Multimodal Video RetrievalFisher Kernel based Relevance Feedback for Multimodal Video Retrieval
Fisher Kernel based Relevance Feedback for Multimodal Video RetrievalIonut Mironica
 
Region filling
Region fillingRegion filling
Region fillinghetvi naik
 
COMPARATIVE STUDY ON BENDING LOSS BETWEEN DIFFERENT S-SHAPED WAVEGUIDE BENDS ...
COMPARATIVE STUDY ON BENDING LOSS BETWEEN DIFFERENT S-SHAPED WAVEGUIDE BENDS ...COMPARATIVE STUDY ON BENDING LOSS BETWEEN DIFFERENT S-SHAPED WAVEGUIDE BENDS ...
COMPARATIVE STUDY ON BENDING LOSS BETWEEN DIFFERENT S-SHAPED WAVEGUIDE BENDS ...cscpconf
 
Image degradation and noise by Md.Naseem Ashraf
Image degradation and noise by Md.Naseem AshrafImage degradation and noise by Md.Naseem Ashraf
Image degradation and noise by Md.Naseem AshrafMD Naseem Ashraf
 
Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)
Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)
Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)Yusuke Uchida
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machinesUjjawal
 
04 1 - frequency domain filtering fundamentals
04 1 - frequency domain filtering fundamentals04 1 - frequency domain filtering fundamentals
04 1 - frequency domain filtering fundamentalscpshah01
 

Mais procurados (20)

SCALE RATIO ICP FOR 3D POINT CLOUDS WITH DIFFERENT SCALES
SCALE RATIO ICP FOR 3D POINT CLOUDS WITH DIFFERENT SCALES SCALE RATIO ICP FOR 3D POINT CLOUDS WITH DIFFERENT SCALES
SCALE RATIO ICP FOR 3D POINT CLOUDS WITH DIFFERENT SCALES
 
Isspit presentation
Isspit presentationIsspit presentation
Isspit presentation
 
Image Processing
Image ProcessingImage Processing
Image Processing
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Image transforms
Image transformsImage transforms
Image transforms
 
Chapter 3 Image Processing: Basic Transformation
Chapter 3 Image Processing:  Basic TransformationChapter 3 Image Processing:  Basic Transformation
Chapter 3 Image Processing: Basic Transformation
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
Image transforms
Image transformsImage transforms
Image transforms
 
Inter IIT Tech Meet 2k19, IIT Jodhpur
Inter IIT Tech Meet 2k19, IIT JodhpurInter IIT Tech Meet 2k19, IIT Jodhpur
Inter IIT Tech Meet 2k19, IIT Jodhpur
 
Image restoration and reconstruction
Image restoration and reconstructionImage restoration and reconstruction
Image restoration and reconstruction
 
Chapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image TransformationChapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image Transformation
 
Fisher Kernel based Relevance Feedback for Multimodal Video Retrieval
Fisher Kernel based Relevance Feedback for Multimodal Video RetrievalFisher Kernel based Relevance Feedback for Multimodal Video Retrieval
Fisher Kernel based Relevance Feedback for Multimodal Video Retrieval
 
Region filling
Region fillingRegion filling
Region filling
 
COMPARATIVE STUDY ON BENDING LOSS BETWEEN DIFFERENT S-SHAPED WAVEGUIDE BENDS ...
COMPARATIVE STUDY ON BENDING LOSS BETWEEN DIFFERENT S-SHAPED WAVEGUIDE BENDS ...COMPARATIVE STUDY ON BENDING LOSS BETWEEN DIFFERENT S-SHAPED WAVEGUIDE BENDS ...
COMPARATIVE STUDY ON BENDING LOSS BETWEEN DIFFERENT S-SHAPED WAVEGUIDE BENDS ...
 
Aw4101277280
Aw4101277280Aw4101277280
Aw4101277280
 
PMF BPMF and BPTF
PMF BPMF and BPTFPMF BPMF and BPTF
PMF BPMF and BPTF
 
Image degradation and noise by Md.Naseem Ashraf
Image degradation and noise by Md.Naseem AshrafImage degradation and noise by Md.Naseem Ashraf
Image degradation and noise by Md.Naseem Ashraf
 
Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)
Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)
Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
 
04 1 - frequency domain filtering fundamentals
04 1 - frequency domain filtering fundamentals04 1 - frequency domain filtering fundamentals
04 1 - frequency domain filtering fundamentals
 

Destaque

Channel Equalisation
Channel EqualisationChannel Equalisation
Channel EqualisationPoonan Sahoo
 
Performance Analysis of 2x2 MIMO for OFDM-DSSS Based Wireless System
Performance Analysis of 2x2 MIMO for OFDM-DSSS Based Wireless SystemPerformance Analysis of 2x2 MIMO for OFDM-DSSS Based Wireless System
Performance Analysis of 2x2 MIMO for OFDM-DSSS Based Wireless SystemAM Publications
 
Cardiff07 - Is It Time for the MMSE to Retire (Oct 2007)
Cardiff07 - Is It Time for the MMSE to Retire (Oct 2007)Cardiff07 - Is It Time for the MMSE to Retire (Oct 2007)
Cardiff07 - Is It Time for the MMSE to Retire (Oct 2007)Alex J Mitchell
 

Destaque (6)

Ofdm
OfdmOfdm
Ofdm
 
Final ppt
Final pptFinal ppt
Final ppt
 
Channel Equalisation
Channel EqualisationChannel Equalisation
Channel Equalisation
 
Performance Analysis of 2x2 MIMO for OFDM-DSSS Based Wireless System
Performance Analysis of 2x2 MIMO for OFDM-DSSS Based Wireless SystemPerformance Analysis of 2x2 MIMO for OFDM-DSSS Based Wireless System
Performance Analysis of 2x2 MIMO for OFDM-DSSS Based Wireless System
 
Ofdm
OfdmOfdm
Ofdm
 
Cardiff07 - Is It Time for the MMSE to Retire (Oct 2007)
Cardiff07 - Is It Time for the MMSE to Retire (Oct 2007)Cardiff07 - Is It Time for the MMSE to Retire (Oct 2007)
Cardiff07 - Is It Time for the MMSE to Retire (Oct 2007)
 

Semelhante a Iberspeech2012

A New Approach for Speech Enhancement Based On Eigenvalue Spectral Subtraction
A New Approach for Speech Enhancement Based On Eigenvalue Spectral SubtractionA New Approach for Speech Enhancement Based On Eigenvalue Spectral Subtraction
A New Approach for Speech Enhancement Based On Eigenvalue Spectral SubtractionCSCJournals
 
A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
A Novel Method for Speaker Independent Recognition Based on Hidden Markov ModelA Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
A Novel Method for Speaker Independent Recognition Based on Hidden Markov ModelIDES Editor
 
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...sipij
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
 
Performance of MMSE Denoise Signal Using LS-MMSE Technique
Performance of MMSE Denoise Signal Using LS-MMSE TechniquePerformance of MMSE Denoise Signal Using LS-MMSE Technique
Performance of MMSE Denoise Signal Using LS-MMSE TechniqueIJMER
 
Speckle noise reduction using hybrid tmav based fuzzy filter
Speckle noise reduction using hybrid tmav based fuzzy filterSpeckle noise reduction using hybrid tmav based fuzzy filter
Speckle noise reduction using hybrid tmav based fuzzy filtereSAT Publishing House
 
Image Fusion Based On Wavelet And Curvelet Transform
Image Fusion Based On Wavelet And Curvelet TransformImage Fusion Based On Wavelet And Curvelet Transform
Image Fusion Based On Wavelet And Curvelet TransformIOSR Journals
 
Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...researchinventy
 
Attention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingAttention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingIAESIJAI
 
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...IJORCS
 
Noisy Speech Enhancement Using Soft Thresholding on Selected Intrinsic Mode F...
Noisy Speech Enhancement Using Soft Thresholding on Selected Intrinsic Mode F...Noisy Speech Enhancement Using Soft Thresholding on Selected Intrinsic Mode F...
Noisy Speech Enhancement Using Soft Thresholding on Selected Intrinsic Mode F...CSCJournals
 
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdfA_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdfBala Murugan
 
Chaotic signals denoising using empirical mode decomposition inspired by mult...
Chaotic signals denoising using empirical mode decomposition inspired by mult...Chaotic signals denoising using empirical mode decomposition inspired by mult...
Chaotic signals denoising using empirical mode decomposition inspired by mult...IJECEIAES
 
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...ijwmn
 
Enhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationEnhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationIRJET Journal
 
IRJET- Reconstruction of Sparse Signals(Speech) Using Compressive Sensing
IRJET- Reconstruction of Sparse Signals(Speech) Using Compressive SensingIRJET- Reconstruction of Sparse Signals(Speech) Using Compressive Sensing
IRJET- Reconstruction of Sparse Signals(Speech) Using Compressive SensingIRJET Journal
 
20575-38936-1-PB.pdf
20575-38936-1-PB.pdf20575-38936-1-PB.pdf
20575-38936-1-PB.pdfIjictTeam
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Hoydis massive mimo and het nets benefits and challenges (1)
Hoydis   massive mimo and het nets benefits and challenges (1)Hoydis   massive mimo and het nets benefits and challenges (1)
Hoydis massive mimo and het nets benefits and challenges (1)nwiffen
 

Semelhante a Iberspeech2012 (20)

A New Approach for Speech Enhancement Based On Eigenvalue Spectral Subtraction
A New Approach for Speech Enhancement Based On Eigenvalue Spectral SubtractionA New Approach for Speech Enhancement Based On Eigenvalue Spectral Subtraction
A New Approach for Speech Enhancement Based On Eigenvalue Spectral Subtraction
 
A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
A Novel Method for Speaker Independent Recognition Based on Hidden Markov ModelA Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
 
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
 
Real Time Speech Enhancement in the Waveform Domain
Real Time Speech Enhancement in the Waveform DomainReal Time Speech Enhancement in the Waveform Domain
Real Time Speech Enhancement in the Waveform Domain
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
 
Performance of MMSE Denoise Signal Using LS-MMSE Technique
Performance of MMSE Denoise Signal Using LS-MMSE TechniquePerformance of MMSE Denoise Signal Using LS-MMSE Technique
Performance of MMSE Denoise Signal Using LS-MMSE Technique
 
Speckle noise reduction using hybrid tmav based fuzzy filter
Speckle noise reduction using hybrid tmav based fuzzy filterSpeckle noise reduction using hybrid tmav based fuzzy filter
Speckle noise reduction using hybrid tmav based fuzzy filter
 
Image Fusion Based On Wavelet And Curvelet Transform
Image Fusion Based On Wavelet And Curvelet TransformImage Fusion Based On Wavelet And Curvelet Transform
Image Fusion Based On Wavelet And Curvelet Transform
 
Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...
 
Attention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingAttention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoising
 
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...
 
Noisy Speech Enhancement Using Soft Thresholding on Selected Intrinsic Mode F...
Noisy Speech Enhancement Using Soft Thresholding on Selected Intrinsic Mode F...Noisy Speech Enhancement Using Soft Thresholding on Selected Intrinsic Mode F...
Noisy Speech Enhancement Using Soft Thresholding on Selected Intrinsic Mode F...
 
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdfA_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
A_Noise_Reduction_Method_Based_on_LMS_Adaptive_Fil.pdf
 
Chaotic signals denoising using empirical mode decomposition inspired by mult...
Chaotic signals denoising using empirical mode decomposition inspired by mult...Chaotic signals denoising using empirical mode decomposition inspired by mult...
Chaotic signals denoising using empirical mode decomposition inspired by mult...
 
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
 
Enhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationEnhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition application
 
IRJET- Reconstruction of Sparse Signals(Speech) Using Compressive Sensing
IRJET- Reconstruction of Sparse Signals(Speech) Using Compressive SensingIRJET- Reconstruction of Sparse Signals(Speech) Using Compressive Sensing
IRJET- Reconstruction of Sparse Signals(Speech) Using Compressive Sensing
 
20575-38936-1-PB.pdf
20575-38936-1-PB.pdf20575-38936-1-PB.pdf
20575-38936-1-PB.pdf
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Hoydis massive mimo and het nets benefits and challenges (1)
Hoydis   massive mimo and het nets benefits and challenges (1)Hoydis   massive mimo and het nets benefits and challenges (1)
Hoydis massive mimo and het nets benefits and challenges (1)
 

Último

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Iberspeech2012

  • 1. Introduction Proposed recontruction technique Experimental results Conclusions MMSE Feature Reconstruction Based on an Occlusion Model for Robust ASR J. A. Gonz´alez, A. M. Peinado, and A. M. G´omez University of Granada Signal Processing, Multimedia Transmission and Speech/Audio Technologies Group (SigMAT) IberSpeech 2012
  • 2. Introduction Proposed recontruction technique Experimental results Conclusions Outline 1 Introduction 2 Proposed recontruction technique 3 Experimental results 4 Conclusions
  • 3. Introduction Proposed recontruction technique Experimental results Conclusions Introduction The performance of ASR systems deteriorates significantly in the presence of background noise. For this reason, noise robustness is becoming a major issue in current ASR systems embedded in mobile platforms. Traditional approaches for noise robustness: × Model adaptation modifies the HMM parameters ⇒ more flexible but less efficient. Feature compensation denoises the features used for speech recognition. PROPOSAL: novel compensation technique derived from an speech occlusion model.
  • 4. Introduction Proposed recontruction technique Experimental results Conclusions Occlusion Model (I) The mismatch function for the log-Mel domain can be approximated by: y ≈ log(ex + en ) This model can be rewritten as, y ≈ max(x, n) + ε(x, n) where ε(x, n) = log 1 + emin(x,n)−max(x,n) " (x ; n )" (x ; n ) 23 15 7 0.6 0 0.5 1.0 1.5 2.0 2.5 3.0 Time (s) 0.4 0.2 12 0 Clean 23 15 7 Noisy (0 dB) 23 15 7 12 5 Log-Max 23 15 7 12 5 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 -0.4 -0.2 0 0.2 0.4 0.6 Probability ε(x, n) Distribution of ε(x, n) in Aurora2
  • 5. Introduction Proposed recontruction technique Experimental results Conclusions Occlusion Model (II) Hence, ε(x, n) can be safely ignored from the above model: y ≈ max(x, n) (1) This equation will be referred to as the speech occlusion model (a.k.a. Log-Max approximation). According to (1), the noisy feature vector can be rearranged into y = (yr , yu): Reliable features (xr ≈ yr ), i.e. speech is unaffected. Unreliable features (−∞ ≤ xu ≤ yu): speech is masked by noise. PROBLEM: How do we estimate the unreliable/reliable features? Missing data techniques (MDT) use missing-data masks. Our proposal uses noises estimates to this end.
  • 6. Introduction Proposed recontruction technique Experimental results Conclusions Proposed Technique: Visual Analogy MMSE estimation ASR Noisy input Noise estimate Source model Enhanced input
  • 7. Introduction Proposed recontruction technique Experimental results Conclusions MMSE Estimation of Occluded Features (I) The MMSE estimate of the clean feature vector is given by: ˆx = E[x|y] = x xp(x|y)dx Source model: clean speech GMM λX with M gaussians. Noise estimates: p(nt|λN,t) = NN(nt; µN,t, ΣN,t). Therefore, the MMSE estimate can be finally expressed as (time dependency is omitted), ˆx = M k=1 P(k|y, λX , λN) x xp(x|y, k, λX , λN)dx E[x|y,k,λX ,λN ]
  • 8. Introduction Proposed recontruction technique Experimental results Conclusions MMSE Estimation of Occluded Features (II) Posterior computation: the corrupted speech likelihood can be computed as the product of the following probabilities: p(yi |k) = pX (yi |k)CN(yi ) + pN(yi )CX (yi |k) Similarly, the partial estimates are given by, E [xi |yi , k] = wk i yi + (1 − wk i )˜µk X,i wk i : speech presence probability, i.e. wk i = P(xi > ni |k). yi : clean speech estimate for high SNRs. ˜µk X,i : estimate for the case of total occlusion ⇒ xi ∈ (−∞, yi ].
  • 9. Introduction Proposed recontruction technique Experimental results Conclusions Comparison with related techniques Missing-data techniques (MDT) are also based on the speech occlusion model. MDT assume that a priori knowledge about the feature reliability is provided by a missing-data mask m. m: either binary (mi = 1 ⇔ yi reliable, mi = 0 ⇔ yi unreliable) or soft (mi ∈ [0, 1]). Then, the terms required by the MMSE estimator are: p(yi |k) = mi pX (yi |k)C(yi ) + (1 − mi )pN(yi )CX (yi |k) E[xi |yi , k] = mi yi + (1 − mi )˜µk X,i
  • 10. Introduction Proposed recontruction technique Experimental results Conclusions Experimental setup The proposed technique is evaluated on the Aurora2 and Aurora4 databases. Speech features: 13 MFCCs + ∆ + ∆2 + CMN. Clean speech GMM with 256 components and diagonal covariances. Noise estimation: linear interpolation of the means for the N first and last frames (NAurora2 = 20, NAurora4 = 40). ΣN fixed for the whole utterance. Mask computation: Binary masks: SNR estimates are thresholded (0 dB). Soft masks: see equation in the next slide.
  • 11. Introduction Proposed recontruction technique Experimental results Conclusions Illustrative example Clean Noisy (0 dB) Noise Enhanced speech Estimated mask 23 15 7 12 0 23 15 7 12 5 23 15 7 12 5 23 15 7 11 3 23 15 7 1 0 0.5 1.0 1.5 2.0 2.5 3.0 Time (s) Utterance (eight six zero one one six two) corrupted by subway noise at 0 dB. Feature reliability mask is computed as follows: mi = M k=1 P (k|y, λX , λN) wk i (2)
  • 12. Introduction Proposed recontruction technique Experimental results Conclusions Aurora2 results 10 20 30 40 50 60 70 80 90 100 -5 0 5 10 15 20 Clean WAcc(%) SNR (dB) Baseline Oracle BMD SMD SRO Average results for Sets A, B, and C. Oracle: MDT with perfect knowledge of the feature reliability. MDT: BMD (binary masks) and SMD (soft masks). SRO (proposed technique) outperforms BMD and SMD.
  • 13. Introduction Proposed recontruction technique Experimental results Conclusions Aurora4 results T-01 T-02 T-03 T-04 T-05 T-06 T-07 Avg. Baseline 87.69 75.30 53.24 53.15 46.80 56.36 45.38 59.70 Oracle 87.69 86.74 84.46 84.44 83.19 85.90 82.38 84.97 BMD 86.96 80.78 58.47 52.74 59.63 56.14 61.42 65.16 SMD 87.52 83.65 66.62 63.78 63.48 69.19 65.31 71.36 SRO 87.54 83.28 69.23 64.49 64.88 70.63 66.93 72.43 Oracle is the recognition upper bound that can be achieved. BMD vs. SMD: soft masks are better. SRO vs. SMD: quite similar to each other, but SRO is better.
  • 14. Introduction Proposed recontruction technique Experimental results Conclusions Conclusions We have proposed a novel technique for log-Mel feature reconstruction. The technique resembles previous work on missing-data reconstruction. Contrary to MDT, our proposal requires no missing-data mask, but noise estimates. The proposed technique can be alternatively considered as a mask estimation technique. The experimental results have shown the superiority of our proposal over MDT.
  • 15. Introduction Proposed recontruction technique Experimental results Conclusions Thank you!