- The document summarizes Khmer ASR systems, including defining ASR, describing the types and processes of ASR systems, and outlining the requirements for building an ASR system.
- It then provides details on Khmer language characteristics and the need for Khmer ASR research as an under-resourced language.
- Finally, it describes the Khmer speech and text corpora created, an overview of the current Khmer ASR system which achieved a word error rate of 35-40%, and plans to improve the system by collecting more text data and building speaker independence.
2. Part I: ASR in general
o Definition
o Type of ASR
o ASR flow chart
o Data requirement
o Performance of ASR systems
o Fundamental methods to create ASR system
3. What is an ASR system?
o ASR: Automatic speech recognition system
o ASR: A system or tool that converts an audio stream containing speech into text.
[Figure: ASR system producing text output, e.g. "Seven", "Seven days", "Zaven", …]
4. ASR: what for?
o ASR systems improve your life (work, business, communication, etc.)
5. Typology of ASR systems
o Speaker-dependent vs. -independent
o Language constraints:
n isolated word recognition
n connected word recognition
n keyword spotting
n continuous speech recognition
o Vocabulary: small (100), medium (5 000), large (50 000)
o Robustness constraints
n laboratory (office) conditions: imposed microphone, channel noise …
7. ASR flow chart
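[Figure: ASR flow chart. Inside the ASR system, the speech signal ("seven") goes through signal processing (digitizing & feature extraction), then decoding/searching, which outputs text candidates such as "Seven", "Seven days", "Zaven", …]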
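The decoding/searching stage of the flow chart can be illustrated with a minimal sketch. It assumes the decoder already holds a short list of candidate transcriptions, each with an acoustic log score and a language-model log score, and simply picks the best combined score. The function name `decode`, the `lm_weight` parameter, and all score values are hypothetical and chosen only for illustration; a real decoder searches a far larger hypothesis space.

```python
def decode(hypotheses, lm_weight=1.0):
    """Return the text of the hypothesis with the highest combined log score.

    Each hypothesis is a dict with:
      "text":     candidate transcription
      "acoustic": log score from the acoustic model (hypothetical value)
      "lm":       log score from the language model (hypothetical value)
    """
    best = max(hypotheses, key=lambda h: h["acoustic"] + lm_weight * h["lm"])
    return best["text"]


# Invented scores for the three candidates shown on the slide.
candidates = [
    {"text": "Seven", "acoustic": -12.0, "lm": -3.0},
    {"text": "Seven days", "acoustic": -12.5, "lm": -4.5},
    {"text": "Zaven", "acoustic": -11.8, "lm": -9.0},
]

print(decode(candidates))                 # language model included
print(decode(candidates, lm_weight=0.0))  # acoustic score only
```

With these invented numbers, the acoustic score alone slightly prefers "Zaven", while adding the language-model score favors "Seven". This is the kind of trade-off the search stage resolves.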
8. ASR data requirement
o To train the acoustic model (AM) and the language model (LM), a huge amount of data (text and audio) is needed:
n Audio + transcription data
n Pronunciation dictionary
n Text data
9. ASR Performance
o English ASR system evaluations at the National Institute of Standards and Technology (NIST)
10. Causes of ASR’s error rate
o Current ASR systems for continuous speech cannot reach a word error rate (WER) of 0%. Why?
n The acoustic model is affected by speaker characteristics and environment: gender, age, emotion, pitch, accent, physical state, channel noise, etc.
n The lexical model is affected by incorrect word pronunciation.
n The language model is affected by incorrect word usage and grammar mistakes.
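Since the error rate discussed here is the word error rate, a short sketch of how WER is commonly computed may help: the number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The function name `word_error_rate` and the example strings are assumptions for illustration, not taken from the slides' evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("seven days", "zaven days"))  # one substitution out of two words: 0.5
```

Note that WER can exceed 100% when the hypothesis contains many inserted words.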
11. Three fundamental methods for
creating a new ASR system
o Enough training data → bootstrapping
o Small amount of data → adaptation
o No data → cross-language transfer
13. Khmer Language
o Official language of Cambodia
o Spoken by more than 15 M people
o An atonal language
o Writing system
n 33 consonants, 23 dependent vowels
n 14 independent vowels, 13 diacritics and various signs
n No explicit word boundary
14. Why research on Khmer ASR?
o An under-resourced language
n Lack of text and speech data in digital form
n Lack of linguistic documents (both soft and hard copies)
o Lacking explicit word segmentation
n Automatic word segmentation is needed
n State-of-the-art segmentation methods use
– hand-crafted lexicons, word frequencies,
– optimization criteria …
o Other under-resourced, unsegmented languages in the region: Burmese, Lao, Thai, Vietnamese
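To make the idea of lexicon- and frequency-based segmentation concrete, here is a minimal sketch. It assumes a unigram model: among all ways to split the text into lexicon words, it keeps the split with the highest product of relative word frequencies, found by dynamic programming. This is one common formulation and not necessarily the method used in the Khmer system; the function name `segment`, the toy lexicon `TOY_FREQ`, and the Latin-script example are hypothetical stand-ins for a Khmer lexicon and Khmer text.

```python
import math


def segment(text, freq, max_word_len=20):
    """Split unspaced text into lexicon words, maximizing unigram log-probability."""
    total = sum(freq.values())
    # best[i] = (best log-probability of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in freq and best[start][0] > -math.inf:
                score = best[start][0] + math.log(freq[word] / total)
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[-1][0] == -math.inf:
        raise ValueError("text cannot be segmented with this lexicon")

    # Walk the stored start indices backwards to recover the words.
    words = []
    end = len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical word counts; a real system would use a hand-crafted Khmer lexicon.
TOY_FREQ = {"seven": 50, "even": 20, "s": 5, "day": 60, "days": 40}

print(segment("sevendays", TOY_FREQ))  # ['seven', 'days']
```

The sketch fails on any text containing an out-of-lexicon word, which is one reason lexicon coverage matters so much for an under-resourced language.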
15. Part III: Khmer ASR at a glance
o Corpus
o Speech corpus setup
o Text corpus setup
o General overview
o Current ASR system
o Future work
16. Corpus: Speech corpus setup
o Two types of corpus:
n Small transcribed corpus (2007-2008)
o Transcribed manually by engineering students at ITC
o Only 6 hours of transcribed signal
o Nature: radio signal (poor quality) downloaded from Radio Australia, Radio Free Asia and Voice of America
n Large transcribed corpus (2011)
o Text and corresponding speech already available
o Students helped verify the transcription
o 21 hours of transcribed signal
o Nature: read speech from newspapers
17. Corpus: Text corpus setup
o Retrieving text from the Web is becoming a common approach
o Well-selected, content-rich websites vs. crawling the Web
o Adapting ClipsTextTk, an open source tool for corpus creation, to the Khmer language
n Conversion from legacy character encoding to Unicode
n Automatic segmentation
n Conversion of special signs and numbers to text
n Normalization of word spelling
o Text corpus obtained from 5 sites:
n 2,5000 HTML pages retrieved
n After processing: 0.5 M sentences, 15 M words
n Duration: November 2007 – January 2008
18. Corpus: Overview
o Description of Khmer ASR corpus
Type                                     | Small corpus                             | Large corpus
Signal (acoustic model)                  | ~6 h of transcribed signal (radio)       | ~20 h of transcribed signal (read speech)
Text (language model)                    | 0.5 million phrases, ~15.5 million words | To be improved
Pronunciation dictionary (lexical model) | ~20 000 words                            | To be improved
19. Current ASR system
Continuous ASR system | Training & testing corpus                        | WER (%), context dependent (8gau) | WER (%), context dependent (16gau)
Khmer ASR v1          | LM: 15.5M words; training AM: 5h; testing: 172p  | 42.5                              | 40.3
Khmer ASR v2          | LM: 15M words; training AM: 20h; testing: 290 p  | 36.4                              | 35
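The gain from v1 to v2 can be read either as an absolute drop in WER points or as a relative reduction. The short sketch below computes both from the four WER figures reported above. The helper name `relative_wer_reduction` is an assumption, and because the two versions were tested on different sets (172p vs. 290 p), the comparison should be taken as indicative only.

```python
def relative_wer_reduction(wer_before: float, wer_after: float) -> float:
    """Relative WER reduction in percent: (before - after) / before * 100."""
    if wer_before <= 0:
        raise ValueError("wer_before must be positive")
    return (wer_before - wer_after) / wer_before * 100.0


# (v1 WER, v2 WER) per acoustic-model configuration, taken from the table above.
results = {"8gau": (42.5, 36.4), "16gau": (40.3, 35.0)}

for name, (v1, v2) in results.items():
    absolute = v1 - v2
    relative = relative_wer_reduction(v1, v2)
    print(f"{name}: {absolute:.1f} points absolute, {relative:.1f}% relative")
```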
20. Future Work
o Collect more text data for language
model
o Next challenge: How to improve
Khmer ASR for independent speakers
and in different environments?