1. Improving Speech Recognition Using Limited Accent Diverse
British English Training Data
With Acoustic Model and Data Selection
By Maryam Najafian
Supervisor Prof. Martin Russell
University of Birmingham, UK
4th October 2016
Email: m.najafian@utdallas.edu
3. Overview
• Problems: (1) the multi-condition data problem; (2) recognition of 14
regional accents of British English; (3) defining an approach to measure
accent difficulty
• Low-dimensional visualisation of the accent identification (AID) feature
space reveals expected relationships between regional accents.
• One approach to accent-robust ASR is adaptation to the speaker's accent,
using online AID to select an accent-specific acoustic model [1,2,3]
• Another approach to accent-robust ASR is to use AID to analyse the training
set and apply data selection to train a DNN-based system [1,2]
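The first approach above (AID-driven acoustic model selection) can be sketched as follows. This is a minimal illustration, not the system described in [1,2,3]: the function name, accent labels, model names and scores are all hypothetical, and it assumes the AID system returns one score per accent where a higher score means a more likely accent.

```python
from typing import Dict


def select_acoustic_model(
    aid_scores: Dict[str, float],
    models: Dict[str, str],
    fallback: str,
) -> str:
    """Return the model for the accent with the highest AID score.

    Falls back to a default model when no scores are available or when
    no accent-specific model exists for the identified accent.
    """
    if not aid_scores:
        return fallback
    best_accent = max(aid_scores, key=aid_scores.get)
    return models.get(best_accent, fallback)


# Hypothetical AID scores for one speaker (assumption: higher = more likely).
scores = {"accent_a": 0.12, "accent_b": 0.71, "accent_c": 0.17}
# Hypothetical accent-specific acoustic models (accent_c has none).
models = {"accent_a": "am_accent_a", "accent_b": "am_accent_b"}

chosen = select_acoustic_model(scores, models, fallback="am_baseline")
print(chosen)
```

The fallback to a baseline model is a design choice of this sketch: it keeps recognition running when the identified accent has no dedicated model.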
[1] M. Najafian, "Acoustic model selection for recognition of regional accented speech," Ph.D. dissertation, University of Birmingham, UK, 2016.
[2] M. Najafian et al., "Identification of British English regional accents using fusion of i-vector and multi-accent phonotactic systems," in ODYSSEY, 2016, pp. 132-139.
[3] M. Najafian et al., "Unsupervised Model Selection for Recognition of Regional Accented Speech," Proc. Interspeech 2014.
[4] M. Najafian et al., "Improving speech recognition using limited accent diverse British English training data with deep neural networks," in MLSP, 2016.
4. Objectives
• This research is concerned with automatic speech recognition (ASR) for accented
speech using a range of different AID systems for GMM-HMM and DNN-HMM
based acoustic model selection
• Acoustic models are trained on the SI training set (92 speakers, 7861
utterances) of the WSJCAM0 corpus of read British English speech
• Tested/adapted on the ABI corpus: 14 different accents (285 speakers)
5. Baseline AID System Design
• Phonotactic: accuracy 80.65%
• I-vector: accuracy 76.76%
• ACCDIST-SVM: accuracy 95%
11. DNN-based ASR vs. i-vector-based AID error rates
12. DNN-HMM: Extra Training Material (ETM) &
Extra Pre-Training Material (EPM)
The relationship between AID and ASR error rates motivated an analysis of the
effect of supplementing the WSJCAM0 training set with different types of
accented speech.
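The idea of choosing supplementary training data by accent difficulty can be sketched as below. This is only an illustration under stated assumptions: difficulty is approximated here by a per-accent error rate, and the function names, accent labels, utterance IDs and error values are hypothetical, not results or code from this work.

```python
from typing import Dict, List, Tuple

Utterance = Tuple[str, str]  # (utterance_id, accent_label)


def rank_by_difficulty(error_rates: Dict[str, float]) -> List[str]:
    """Return accent labels ordered from highest to lowest error rate."""
    return sorted(error_rates, key=error_rates.get, reverse=True)


def select_supplement(
    pool: List[Utterance],
    error_rates: Dict[str, float],
    n_accents: int = 1,
) -> List[Utterance]:
    """Keep only utterances from the n_accents most difficult accents."""
    hardest = set(rank_by_difficulty(error_rates)[:n_accents])
    return [utt for utt in pool if utt[1] in hardest]


# Hypothetical per-accent error rates (%), used only to drive the example.
error_rates = {"accent_a": 12.0, "accent_b": 31.5, "accent_c": 18.2}
# Hypothetical pool of accented utterances available as extra material.
pool = [
    ("utt1", "accent_a"),
    ("utt2", "accent_b"),
    ("utt3", "accent_c"),
    ("utt4", "accent_b"),
]

supplement = select_supplement(pool, error_rates, n_accents=1)
print(supplement)
```

The selected utterances would then be added to the existing training set before training or pre-training the DNN.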
13. Summary and publications
• To address the multi-accent learning problem in a deep learning acoustic
modelling framework with limited resources, this work introduced a
concept called accent difficulty to analyse the training set
• A relative gain of 46.85% is achieved in recognising the Accents of British
Isles corpus by applying a baseline DNN model rather than a Gaussian
mixture model.
• Our results show that, across all accent regions, supplementing the
training set with a small amount of data from the most difficult accent
(2.25 hours of Glaswegian speech) leads to a gain in performance similar
to that from a large amount of accent-diverse data (8.96 hours from 14
accent regions), even though this accent accounts for just 14% of the test
data.
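For reference, a relative gain such as the one quoted above is conventionally computed as the reduction in error rate divided by the baseline error rate. The sketch below assumes that definition; the function name and the example error rates are hypothetical and are not the figures behind the reported 46.85%.

```python
def relative_gain(baseline_error: float, new_error: float) -> float:
    """Relative error reduction in percent, measured against the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates (%): baseline system vs. improved system.
gain = relative_gain(baseline_error=20.0, new_error=10.0)
print(f"{gain:.2f}% relative gain")
```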
14. Thank you for listening
Email: m.najafian@utdallas.edu