24 Feb 2014
Takuya Yoshioka
NTT CS Labs, Cambridge University
Thanks to: T. Nakatani, K. Kinoshita, M. Delcroix (NTT)
M. Gales, X. Chen (Cambridge)
Speech Enhancement for ASR
• Effectiveness measured by WER
– use of a sensible ASR system essential
• Huge computational resources available
• Offline processing allowed
• The acoustic model (AM) can also do part of the job
Typical ASR System
[Diagram: Signal → Speech Enh → Front-End → Recog Engine → Sentence; the Recog Engine draws on the AM, LM and Pron Dict]
Different Approaches for Different Situations
• 1ch vs. Mch (M > 1)
• background noise;
• reverberant noise; or
• interfering talkers
1ch Dereverberation (Offline)
• Reverberation usually modelled with FIR
• Given $(x[t])_{t=1,\dots,N}$, recover $(s[t])_{t=1,\dots,N}$
$$x[t] = \sum_{\tau=0}^{T} h[\tau]\, s[t-\tau]$$
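The FIR model above can be tried out numerically. The following is a minimal sketch, assuming NumPy; the helper names `reverberate` and `toy_impulse_response` and the exponentially decaying taps are illustrative assumptions, not part of the slides.

```python
import numpy as np

def reverberate(s, h):
    """Apply the FIR model x[t] = sum_{tau=0}^{T} h[tau] * s[t - tau]."""
    # np.convolve returns len(s) + len(h) - 1 samples; keep the first len(s)
    # so that x and s share the same time index.
    return np.convolve(s, h)[: len(s)]

def toy_impulse_response(T, decay=0.9):
    """Hypothetical impulse response: exponentially decaying taps h[0..T]."""
    return decay ** np.arange(T + 1)

s = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # unit impulse standing in for clean speech
h = toy_impulse_response(T=2, decay=0.5)  # taps 1.0, 0.5, 0.25
x = reverberate(s, h)                     # the impulse is smeared over three samples
```

Dereverberation is the reverse task: only `x` is observed, and both `h` and `s` are unknown.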
Approaches
• Time domain
– subspace, Trinicon, Long-term LP
– accurate
– can account for phase distortion
• Power spectral domain
– WF, NMF
– robust against speaker movement
• Feature domain
– front-end VTS, direct CMLLR
– can leverage the AM
[Diagram: x[t] → Analysis → x_k(t) → Dereverb (one per sub-band) → s_k(t) → Synthesis → s[t]]
Assume in each sub-band
$$x_k(t) = \sum_{\tau=0}^{T} h_k^*(\tau)\, s_k(t-\tau)$$
Inverse Filtering (in Each Sub-band)
$$s_k(t) = \sum_{\tau=0}^{U} g_k^*(\tau)\, x_k(t-\tau)$$
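Long-Term Linear Prediction
$$x_k(t) = \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, x_k(t-\tau) + e_k(t)$$
The residual $e_k(t)$ is labelled $s_k(t)$:
$$s_k(t) = x_k(t) - \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, x_k(t-\tau)$$
we don’t minimise $e_k(t)$!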
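Once the prediction coefficients are known, the dereverberated sub-band signal is the residual of the delayed prediction. A minimal sketch for one sub-band, assuming NumPy; the function name `delayed_lp_residual` and the zero-padding before the start of the signal are assumptions made for illustration.

```python
import numpy as np

def delayed_lp_residual(x, a, delta):
    """s(t) = x(t) - sum_{tau=delta}^{U} conj(a(tau)) * x(t - tau) for one sub-band.

    x:     complex sub-band signal of length N
    a:     coefficients for taps delta, ..., U (length U - delta + 1)
    delta: prediction delay; taps 1, ..., delta - 1 are never used
    Samples before the start of the signal are treated as zero.
    """
    x = np.asarray(x, dtype=complex)
    s = x.copy()
    n = len(x)
    for i, coeff in enumerate(a):
        tau = delta + i
        if tau >= n:
            break  # this tap would only reach samples before the signal start
        s[tau:] -= np.conj(coeff) * x[: n - tau]
    return s

x = np.array([1.0, 2.0, 3.0, 4.0])
s = delayed_lp_residual(x, a=[0.5], delta=2)  # subtracts 0.5 * x(t - 2)
```

The delay leaves the most recent samples untouched, so only the part of `x(t)` that is predictable from samples at least `delta` steps back is removed.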
Why LP?
LP vs. FIR
LP:
$$x_k(t) = \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, x_k(t-\tau) + s_k(t)$$
FIR:
$$x_k(t) = \sum_{\tau=0}^{T} h_k^*(\tau)\, s_k(t-\tau)$$
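The LP form can be read as a constrained case of the sub-band inverse filter. A short worked step, using only symbols already defined on the slides ($g_k$, $a_k$, $\Delta$, $U$): rearranging the LP equation for $s_k(t)$ gives

$$s_k(t) = x_k(t) - \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, x_k(t-\tau) = \sum_{\tau=0}^{U} g_k^*(\tau)\, x_k(t-\tau), \qquad g_k(\tau) = \begin{cases} 1 & \tau = 0 \\ 0 & 0 < \tau < \Delta \\ -a_k(\tau) & \Delta \le \tau \le U \end{cases}$$

Under this reading, estimating the LP coefficients $a_k$ from the observed $x_k$ yields an inverse filter directly, without first estimating the FIR response $h_k$.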
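Speech model:
$$s_k(t) \sim \mathcal{N}(0, \lambda_{k,t})$$
+ LP model:
$$x_k(t) = \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, x_k(t-\tau) + s_k(t)$$
Resulting model of the observed signal (written $y_k$ on this slide, apparently the same signal as $x_k$ elsewhere):
$$y_k(t) \mid (y_k(t'))_{t' < t} \sim \mathcal{N}\Big(\sum_{\tau=\Delta}^{U} a_k^*(\tau)\, y_k(t-\tau),\ \lambda_{k,t}\Big)$$
$$\log p\big((y_k(t))_{t=1,\dots,N}\big) = \sum_{t=1}^{N} \log f_{\mathrm{Normal}}\Big(y_k(t);\ \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, y_k(t-\tau),\ \lambda_{k,t}\Big)$$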
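The log-likelihood is a sum of per-sample Gaussian log-densities evaluated at the prediction residual. A minimal sketch, assuming NumPy and assuming a circularly symmetric complex Gaussian for the sub-band coefficients (the slide only says "Normal"); the function name is hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(residual, lam):
    """Sum over t of log N_c(residual[t]; 0, lam[t]).

    residual: prediction residual, i.e. the observed sample minus its delayed
              linear prediction (equal to s_k(t) under the LP model)
    lam:      time-varying speech variances lambda_{k,t}, all positive
    Assumption: complex Gaussian density 1 / (pi * lam) * exp(-|r|^2 / lam).
    """
    residual = np.asarray(residual, dtype=complex)
    lam = np.asarray(lam, dtype=float)
    return float(np.sum(-np.log(np.pi * lam) - np.abs(residual) ** 2 / lam))

ll = gaussian_log_likelihood([1.0, 1.0j], [1.0, 1.0])
```

Both the LP coefficients (through the residual) and the variances enter this one objective, which is why the two sets of parameters can be estimated in alternation.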
Interleaved Estimation of:
- LP coeff $A = (a_k(t))_{t=\Delta,\dots,U}$ + speech variance $\Lambda = (\lambda_{k,t})_{t=1,\dots,T}$
- clean speech samples
[Flowchart boxes: Initialise A; Calculate $s_k(t)$; Estimate LP coeffs A; Convergent?; Estimate speech vars Λ]
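The interleaved estimation can be sketched for a single sub-band as follows, assuming NumPy. Several choices here are assumptions rather than details given on the slides: the order of the steps inside the loop, the per-sample variance estimate |s(t)|² with a floor, the weighted least-squares update, the small ridge term, and a fixed iteration count in place of a convergence test. All function and parameter names are hypothetical.

```python
import numpy as np

def delayed_taps(x, delta, U):
    """Row t holds x(t - delta), ..., x(t - U); samples before t = 0 are zero."""
    n = len(x)
    X = np.zeros((n, U - delta + 1), dtype=complex)
    for j, tau in enumerate(range(delta, U + 1)):
        if tau < n:
            X[tau:, j] = x[: n - tau]
    return X

def interleaved_estimation(x, delta, U, n_iter=3, floor=1e-6, ridge=1e-10):
    x = np.asarray(x, dtype=complex)
    X = delayed_taps(x, delta, U)
    a = np.zeros(X.shape[1], dtype=complex)       # Initialise A
    for _ in range(n_iter):                       # fixed count, not a convergence test
        s = x - X @ np.conj(a)                    # Calculate s_k(t)
        lam = np.maximum(np.abs(s) ** 2, floor)   # Estimate speech vars (assumed form)
        # Estimate LP coeffs: minimise sum_t |x(t) - X[t] @ g|^2 / lam[t] over g = conj(a)
        Xw = X / lam[:, None]
        R = Xw.conj().T @ X + ridge * np.eye(X.shape[1])
        p = Xw.conj().T @ x
        a = np.conj(np.linalg.solve(R, p))
    s = x - X @ np.conj(a)                        # final clean-speech estimate
    return s, a, lam

# Toy check: an impulse with an echo every 2 samples, each at half amplitude.
s_true = np.zeros(16)
s_true[0] = 1.0
x = s_true.copy()
for t in range(2, 16):
    x[t] += 0.5 * x[t - 2]
s_hat, a_hat, _ = interleaved_estimation(x, delta=2, U=3)
```

On this noiseless toy signal the loop recovers the echo coefficient at tap 2 and leaves tap 3 at zero; real speech would need many more samples per sub-band.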
Eval on REVERB Challenge Data Sets
System | %WER
DNN AM + RNN LM + AM adapt | 20.0
Dereverb + DNN AM + RNN LM + AM adapt | 16.5
• prompts from 5K WSJ
• trained on multi-condition data
• tested on real recordings from dev set
• small amount of background noise
Eval on AMI Corpus (Meeting Transcription)
System | %WER (Dev) | %WER (Eval)
DNN AM + 3gram LM | 43.5 | 42.6
Dereverb + DNN AM + 3gram LM | 42.0 | 41.1
• 4 participants in each meeting
• table-top microphone used
• single-speaker segments used
• severe reverberation and background noise
1ch Algorithm Summary
• very robust against modelling errors
• keys in development
– modelling the reverberation with LP
– using a reasonable clean speech pdf
Multi-Channel Extension
[Diagram: Dereverb → BF → To recogniser]
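• LP → MIMO LP
• single speech model → vector speech model
$$\mathbf{x}_k(t) = \sum_{\tau=\Delta}^{U} \mathbf{A}_k^*(\tau)\, \mathbf{x}_k(t-\tau) + \mathbf{e}_k(t)$$
The residual $\mathbf{e}_k(t)$ is labelled $\mathbf{h}\, s_k(t)$.
$$s_k(t) \sim \mathcal{N}(0, \lambda_{k,t}) \;\Leftrightarrow\; \mathbf{h}\, s_k(t) \sim \mathcal{N}(\mathbf{0}, \lambda_{k,t}\, \mathbf{h}\mathbf{h}^*) \approx \mathcal{N}(\mathbf{0}, \lambda_{k,t}\, \mathbf{I})$$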
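The multi-channel residual has the same shape as the single-channel one, with matrices in place of scalar coefficients. A minimal sketch, assuming NumPy and assuming that the star on the matrix denotes the conjugate transpose; the function name and array layout are hypothetical.

```python
import numpy as np

def mimo_lp_residual(x, A, delta):
    """e(t) = x(t) - sum_{tau=delta}^{U} A(tau)^H x(t - tau) for one sub-band.

    x: array of shape (N, M), one row per time step, one column per microphone
    A: array of shape (U - delta + 1, M, M), one matrix per tap delta, ..., U
    Assumption: A(tau)^H is the conjugate transpose of A(tau).
    Samples before the start of the signal are treated as zero.
    """
    x = np.asarray(x, dtype=complex)
    e = x.copy()
    n = x.shape[0]
    for i, A_tau in enumerate(np.asarray(A, dtype=complex)):
        tau = delta + i
        if tau >= n:
            break
        # Row-vector form of A^H x: (A^H x)^T = x^T conj(A)
        e[tau:] -= x[: n - tau] @ np.conj(A_tau)
    return e

x = np.array([[1.0, 0.0], [0.0, 0.0]])     # two time steps, two microphones
A = np.array([[[0.0, 0.5], [0.0, 0.0]]])   # one tap: mic 1 predicts mic 2
e = mimo_lp_residual(x, A, delta=1)
```

With M = 1 this reduces to the single-channel residual, which is one way to see the 1ch and Mch cases as a single framework.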
Interleaved Estimation of:
- LP matrix $A = (\mathbf{A}_k(t))_{t=\Delta,\dots,U}$ + speech variance $\Lambda = (\lambda_{k,t})_{t=1,\dots,T}$
- clean speech samples
[Flowchart boxes: Initialise A; Calculate $s_k(t)$; Estimate LP matrices A; Convergent?; Estimate speech vars Λ]
Eval on REVERB Challenge Data Sets
#Mics | System | %WER
1 | Baseline (DNN AM + RNN LM + AM adapt) | 20.0
1 | Dereverb + Baseline | 16.5
2 | Dereverb + Baseline | 14.8
2 | Dereverb + MVDR + Baseline | 13.6
8 | Dereverb + Baseline | 14.0
8 | Dereverb + MVDR + Baseline | 11.3
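The gains in the table are easier to compare as relative WER reductions against the single-microphone baseline. A small arithmetic sketch in Python; the figures are copied from the table, while the dictionary layout and function name are illustrative.

```python
# %WER figures from the table, keyed by (number of mics, system).
wer = {
    (1, "Baseline"): 20.0,
    (1, "Dereverb + Baseline"): 16.5,
    (2, "Dereverb + Baseline"): 14.8,
    (2, "Dereverb + MVDR + Baseline"): 13.6,
    (8, "Dereverb + Baseline"): 14.0,
    (8, "Dereverb + MVDR + Baseline"): 11.3,
}

def relative_reduction(reference, system):
    """Relative WER reduction in percent."""
    return 100.0 * (reference - system) / reference

baseline = wer[(1, "Baseline")]
reductions = {
    key: round(relative_reduction(baseline, value), 1)
    for key, value in wer.items()
}
# 1 mic, dereverb only:    17.5 % relative
# 8 mics, dereverb + MVDR: 43.5 % relative
```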
Long-Term LP Summary
• very robust against modelling errors
• can cover both 1ch and Mch set-ups
• keys in development
– modelling the reverberation with LP
– using a reasonable clean speech pdf
Extensions Explored
• dereverberation+BSS
• adaptive long-term LP
• NMF-based dereverberation
– works in the power spectrum domain
• FE-VTS dereverberation
Dereverberation+BSS
[Diagram: Dereverb → BSS]
[Bar chart: SIR (dB), axis from 0 to 16, at T60 = 0.3 s and T60 = 0.5 s, for three conditions: dereverberation+separation, separation, w/o separation]
Conclusion
• Dereverberation based on long-term LP
– represents reverberation with LP
– consistent framework covering both 1ch and Mch set-ups
– provides gains over well-optimised DNN AMs in realistic conditions
– extensions in several directions described
Speech enhancement for distant talking speech recognition