Dsp2015for ss


This document summarizes a research talk on statistical-model-based speech enhancement that aims to reduce noise without generating musical-noise artifacts. The talk first reviews conventional enhancement methods such as spectral subtraction and Wiener filtering, which often cause musical noise. It then proposes a biased minimum mean-square error estimator that reaches a musical-noise-free state through a bias parameter. Analysis and experiments show that the method can reduce noise while keeping the kurtosis ratio fixed at 1.0, which prevents musical noise, and that it outperforms other techniques in speech quality. A strong speech prior is found to leave almost no musical-noise-free state, so the prior must be selected carefully.


Statistical-Model-Based Speech Enhancement
with Musical-Noise-Free Properties
Hiroshi Saruwatari
(The University of Tokyo, JAPAN)
IEEE DSP2015 Invited Talk

Outline
1. Research background
2. What is musical-noise-free?
3. Conventional statistical-model-based speech enhancement
4. Proposed method and analysis
5. Experimental evaluation
6. Conclusion

Research Background and Goal
• Single-channel speech enhancement
• Spectral subtraction (SS) [Boll, 1979], Wiener filtering, Bayesian minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator [Ephraim, 1984], MAP estimator [Lotter, 2005], etc.
• Harmful distortion owing to musical noise generation
• Musical-noise-free speech enhancement [Miyazaki, Saruwatari et al., IEEE Trans. ASLP 2012]
• Noise reduction without any musical noise
• We have found that SS (maximum-likelihood amplitude estimator) has a musical-noise-free state.
• Open question: does the generalized Bayesian MMSE-STSA estimator also have a musical-noise-free state?

Relation between Musical Noise and Kurtosis
Proportional relation between human perception (musical noise score) and log kurtosis ratio [Saruwatari, 2008]
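
The kurtosis ratio referred to here can be estimated directly from noise-only power spectra. The following is a minimal sketch, assuming kurtosis is defined from raw moments as m4 / m2^2 of the power-domain samples (an assumption; the slide does not give the formula). The processing under test is a plain power spectral subtraction with a hypothetical oversubtraction factor `beta`, chosen only for illustration.

```python
import random


def raw_moment(xs, order):
    """Sample raw (non-central) moment of the given order."""
    return sum(x ** order for x in xs) / len(xs)


def kurtosis(xs):
    """Kurtosis taken as m4 / m2**2 with raw moments (assumed definition)."""
    return raw_moment(xs, 4) / raw_moment(xs, 2) ** 2


def kurtosis_ratio(noise_in, noise_out):
    """Kurtosis after processing divided by kurtosis before processing."""
    return kurtosis(noise_out) / kurtosis(noise_in)


def spectral_subtraction(power, beta=2.0):
    """Power spectral subtraction with zero flooring (beta is hypothetical)."""
    mean_noise = sum(power) / len(power)
    return [max(p - beta * mean_noise, 0.0) for p in power]


rng = random.Random(0)
# The power spectrum of complex Gaussian noise is exponentially distributed.
noise_power = [rng.expovariate(1.0) for _ in range(100_000)]
processed = spectral_subtraction(noise_power)

ratio_identity = kurtosis_ratio(noise_power, noise_power)
ratio_ss = kurtosis_ratio(noise_power, processed)
print(f"kurtosis ratio, no processing:        {ratio_identity:.2f}")
print(f"kurtosis ratio, spectral subtraction: {ratio_ss:.2f}")
```

A ratio above 1 means the residual noise has a heavier-tailed power distribution than the input noise, which is the situation associated with musical noise. A ratio of 1 corresponds to the musical-noise-free condition discussed in the talk.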

Musical-Noise-Free Speech Enhancement
• Iterative noise reduction procedure with musical-noise-free condition [Miyazaki, Saruwatari, et al., IEEE Trans. ASLP 2012]
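
An iterative noise reduction loop of this kind could be sketched as follows, assuming each iteration is a weak power spectral subtraction with flooring and that the noise level is re-estimated from the current residual. The values of `beta` and `eta` are hypothetical and are not tuned to satisfy the musical-noise-free condition. The noise reduction rate (NRR) is taken here as the input-to-output noise power ratio in dB, and kurtosis as m4 / m2^2 with raw moments; both are assumptions for illustration.

```python
import math
import random


def moment(xs, order):
    """Sample raw moment of the given order."""
    return sum(x ** order for x in xs) / len(xs)


def kurtosis(xs):
    """Kurtosis taken as m4 / m2**2 with raw moments (assumed definition)."""
    return moment(xs, 4) / moment(xs, 2) ** 2


def weak_ss(power, beta, eta):
    """One pass of power spectral subtraction with flooring at eta * input."""
    mean_noise = moment(power, 1)  # noise level re-estimated on each pass
    out = []
    for p in power:
        diff = p - beta * mean_noise
        out.append(diff if diff > eta * p else eta * p)
    return out


rng = random.Random(1)
noise = [rng.expovariate(1.0) for _ in range(50_000)]

current = noise
nrr_db = []
kurt_ratio = []
for _ in range(5):
    current = weak_ss(current, beta=0.5, eta=0.6)  # hypothetical parameters
    nrr_db.append(10.0 * math.log10(moment(noise, 1) / moment(current, 1)))
    kurt_ratio.append(kurtosis(current) / kurtosis(noise))

for i, (n, k) in enumerate(zip(nrr_db, kurt_ratio), start=1):
    print(f"iteration {i}: NRR = {n:.2f} dB, kurtosis ratio = {k:.2f}")
```

The loop only shows how the NRR accumulates over iterations while the kurtosis ratio is monitored. Finding parameters for which that ratio stays at 1 is the subject of the cited paper and is not attempted here.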

MOSIE (generalized MMSE-STSA) Estimator
Statistical speech amplitude estimator with parametric speech prior [Breithaupt, et al., IEEE Trans. 2011]

How to Generate Musical-Noise-Free State?
Unfortunately, we cannot find any musical-noise-free state in the conventional MOSIE estimator.
[Figure annotations: "No intersection!"; "Forgetting factor α is increasing"]
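
The forgetting factor annotated here is the smoothing constant of the decision-directed (DD) a priori SNR estimate. The following is a minimal sketch of that recursion for one frequency bin in its standard form. A Wiener gain is used purely as a stand-in for the MOSIE gain, which is not reproduced on the slide. The function names, the floor `xi_min`, and the toy frame values are illustrative assumptions.

```python
def wiener_gain(xi):
    """Stand-in gain function (not the MOSIE gain)."""
    return xi / (1.0 + xi)


def decision_directed(noisy_power, noise_psd, alpha=0.98, xi_min=1e-3):
    """Track the a priori SNR over frames for one frequency bin.

    noisy_power: per-frame noisy power |Y|^2
    noise_psd:   noise power estimate (assumed constant in this sketch)
    alpha:       forgetting factor of the DD estimate
    xi_min:      lower limit on the a priori SNR (illustrative value)
    """
    xi_track = []
    prev_clean_power = 0.0
    for y2 in noisy_power:
        gamma = y2 / noise_psd  # a posteriori SNR
        xi = (alpha * prev_clean_power / noise_psd
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        xi = max(xi, xi_min)
        gain = wiener_gain(xi)
        prev_clean_power = (gain ** 2) * y2  # estimated clean power of this frame
        xi_track.append(xi)
    return xi_track


# Toy frame sequence: noise only, then a speech onset, then noise only again.
frames = [1.0, 1.2, 0.8, 9.0, 10.0, 11.0, 1.0, 0.9]
smooth = decision_directed(frames, noise_psd=1.0, alpha=0.98)
reactive = decision_directed(frames, noise_psd=1.0, alpha=0.5)
for s, r in zip(smooth, reactive):
    print(f"alpha=0.98: xi = {s:.3f}   alpha=0.5: xi = {r:.3f}")
```

A larger forgetting factor gives a smoother but slower a priori SNR track, which changes the statistics of the residual noise and therefore its kurtosis.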

Calculation of Moment for Biased MOSIE (1/4)
1. Derivation of p.d.f.

Calculation of Moment for Biased MOSIE (2/4)
2. Calculation of moment for

Calculation of Moment for Biased MOSIE (3/4)
3. Moment-cumulant transformation for
4. Cumulant of noise power spectrum

Calculation of Moment for Biased MOSIE (4/4)
5. Cumulant-moment transformation for
m1 is used for the noise reduction rate (NRR), and m2 and m4 are used for the kurtosis; each is a function of the bias ε.
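
The moment-cumulant and cumulant-moment transformations in steps 3 and 5 can be written with the standard identities up to fourth order. The sketch below shows both directions and checks them on a unit-mean exponential distribution, whose raw moments are n! and whose cumulants are (n-1)!. How the analysis combines cumulants between the two steps is not recoverable from the slides, so only the conversions are shown; the function names are illustrative.

```python
def moments_to_cumulants(m1, m2, m3, m4):
    """Raw moments (orders 1 to 4) to cumulants (orders 1 to 4)."""
    k1 = m1
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    k4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
    return k1, k2, k3, k4


def cumulants_to_moments(k1, k2, k3, k4):
    """Cumulants (orders 1 to 4) back to raw moments (orders 1 to 4)."""
    m1 = k1
    m2 = k2 + k1 ** 2
    m3 = k3 + 3 * k2 * k1 + k1 ** 3
    m4 = k4 + 4 * k3 * k1 + 3 * k2 ** 2 + 6 * k2 * k1 ** 2 + k1 ** 4
    return m1, m2, m3, m4


# Unit-mean exponential distribution: raw moments are 1!, 2!, 3!, 4!.
exp_moments = (1, 2, 6, 24)
exp_cumulants = moments_to_cumulants(*exp_moments)
roundtrip = cumulants_to_moments(*exp_cumulants)
print(exp_cumulants)  # (1, 1, 2, 6)
print(roundtrip)      # (1, 2, 6, 24)
```

Working in the cumulant domain is convenient because cumulants of independent random variables add under summation, whereas raw moments do not.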

Calculation of Moment for Biased MOSIE (4/4)
[Figure annotation: "Bias ε large"]

Experiment 1: Existence of Musical-Noise-Free
Noise: White Gaussian noise at 0-dB SNR
Speech prior: Gaussian model (ρ = 1)
Forgetting factor in DD: 0.98
Noise PSD estimation: Minimum Statistics method [Martin, 1994]
[Figure panels: "Theoretical analysis", "Experimental results"; annotations: "Bias ε = 0", "ε large"]
By introducing the bias ε, we find a musical-noise-free state in the statistical-model-based estimator.

Experiment 2: Existence of Musical-Noise-Free
Noise: White Gaussian noise at 0-dB SNR
Speech prior: Super-Gaussian model (ρ = 0.5)
Forgetting factor in DD: 0.98
Noise PSD estimation: Minimum Statistics method [Martin, 1994]
[Figure panels: "Theoretical analysis", "Experimental results"; annotations: "Bias ε = 0", "ε large"]
A strong speech prior (small ρ) gives almost no musical-noise-free state in real processing.

Experiment 3: Comparison with Other Methods
Speech: 10 utterances
Noise: White Gaussian noise at 0-dB SNR
Speech prior: Super-Gaussian model (ρ = 0.5, β = 0.001)
Forgetting factor in DD: 0.98
Noise PSD estimation: Minimum Statistics method [Martin, 1994]
Target NRR: 16 dB

Experiment 3: Comparison with Other Methods
[Figure annotations: "Large musical noise methods"; "No musical noise methods"]

Experiment 3: Comparison with Other Methods
[Figure annotations: "Lowest speech distortion"; "Large musical noise methods"; "No musical noise methods"; "Richer speech prior"]

Conclusion
• By introducing the bias ε, we find a musical-noise-free state in the Bayesian estimator.
• The proposed biased MOSIE estimator can achieve lower cepstral distortion than the compared methods while its kurtosis ratio is fixed at exactly 1.0.
• A strong speech prior (small ρ) gives almost no musical-noise-free state, so the prior should be selected carefully to maintain the quality of both the speech and the remaining noise.
Thank you for your attention!
