Khmer ASR system
Sethserey SAM
sam.sethserey@itc.edu.kh
Part I: ASR in general
o    Definition
o    Types of ASR
o    ASR flow chart
o    Data requirements
o    Performance of ASR systems
o    Fundamental methods to create an ASR system

What is an ASR system?
o  ASR: automatic speech recognition
o  An ASR system is a system or tool that converts an audio stream
   containing speech into text.
[Figure: an ASR system producing text output hypotheses such as "Seven", "Seven days", "Zaven", ...]
ASR: what is it for?
o  ASR systems can improve daily life (work, business,
   communication, etc.)
Typology of ASR systems
o  Speaker-dependent vs. speaker-independent

o  Language constraints:
  n    isolated word recognition
  n    connected word recognition
  n    keyword spotting
  n    continuous speech recognition

o  Vocabulary size: small (100), medium (5 000), large (50 000)

o  Robustness constraints
  n    laboratory (office) conditions: imposed
  n    microphone, channel noise …

Levels of complexity
[Figure: levels of complexity]

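ASR flow chart
[Figure: speech signal ("s e v e n") -> Signal processing (digitizing & feature extraction) -> Decoding/Searching -> text hypotheses ("Seven", "Seven days", "Zaven", ...). The two processing stages together form the ASR system.]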
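
The decoding/searching stage of the flow chart is usually described with the standard probabilistic formulation below. The slides do not state it; the symbols X (the sequence of acoustic feature vectors produced by signal processing) and W (a candidate word sequence) are notation assumed here for illustration.

```latex
\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \; p(X \mid W)\, P(W)
```

In this reading, p(X | W) is scored by the acoustic model together with the pronunciation dictionary, and P(W) by the language model. The decoder searches for the word sequence that maximizes their product.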

ASR data requirements
o  Training the acoustic model (AM) and the language model (LM)
   requires a huge amount of data (text and audio):
  n  Audio + transcription data
  n  Pronunciation dictionary
  n  Text data

ASR Performance
o    English ASR system evaluations at the National Institute of
     Standards and Technology (NIST)

Causes of ASR errors
[Figure: example utterance “seven”]

o  Current ASR for continuous speech cannot reach a word error
   rate (WER) of 0%. Why?
  n  The acoustic model is affected by speaker characteristics and
      environment: gender, age, emotion, pitch, accent,
      physical state, channel noise, etc.
  n  The lexical model is affected by incorrect word
      pronunciation.
  n  The language model is affected by incorrect word usage and
      grammar mistakes.
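
WER is commonly computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using a word-level edit distance. The following is a minimal sketch of that computation, not code from the slides. The function name `word_error_rate` is hypothetical, and it assumes both strings are already split into words by whitespace (Khmer text would have to be segmented first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypotheses taken from the example outputs shown in the slides.
    for hyp in ("seven days", "seven", "zaven"):
        print(hyp, word_error_rate("seven days", hyp))
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.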
Three fundamental methods for creating a new ASR system

o  Enough training data -> bootstrapping
o  Small amount of data -> adaptation
o  No data -> cross-language transfer

Part II: Khmer language & its processing
o  Khmer language
o  Why research on Khmer ASR?

Khmer Language
o  Official language of Cambodia
o  Spoken by more than 15 M people
o  An atonal language
o  Writing system
  n  33 consonants, 23 dependent vowels
  n  14 independent vowels, 13 diacritics and various signs
  n  No explicit word boundary

Why research on Khmer ASR?
o  An under-resourced language
  n  Lack of text and speech data in digital form
  n  Lack of linguistic documents (both soft and hard copies)
o  No explicit word segmentation
  n  Automatic word segmentation is needed
  n  State-of-the-art segmentation methods use
      –  hand-crafted lexicons, word frequencies,
      –  optimization criteria …
o  Other under-resourced, unsegmented languages in the region:
   Burmese, Lao, Thai, Vietnamese
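
The slide names lexicons, word frequencies and an optimization criterion as the ingredients of automatic word segmentation, without giving an algorithm. One common way to combine them is a unigram model decoded with dynamic programming; the sketch below shows that idea and is not necessarily the method used for Khmer. The function name `segment`, the unknown-character penalty, and the toy Latin-script lexicon are all assumptions made for illustration.

```python
import math


def segment(text: str, freqs: dict[str, int], max_word_len: int = 10) -> list[str]:
    """Split unsegmented text into the word sequence with the highest
    unigram probability, given a lexicon of word frequencies."""
    if not freqs:
        raise ValueError("lexicon must not be empty")
    total = sum(freqs.values())
    # Assumed penalty for a single character that is not in the lexicon.
    unknown_logp = math.log(1.0 / (total * 1000))

    # best[i] = (best log-probability of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in freqs:
                logp = math.log(freqs[word] / total)
            elif end - start == 1:
                logp = unknown_logp  # keeps every position reachable
            else:
                continue
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)

    # Follow the back-pointers from the end of the text.
    words = []
    end = len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    toy_lexicon = {"seven": 50, "days": 40, "day": 30, "even": 20, "s": 1}
    print(segment("sevendays", toy_lexicon))
```

The optimization criterion here is the product of unigram probabilities. A real system would replace the toy lexicon with a hand-crafted Khmer lexicon and corpus frequencies.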
  	
  	
  	
  
Part III: Khmer ASR at a glance
o  Corpus
  n  Speech corpus setup
  n  Text corpus setup
  n  General overview
o  Current ASR system
o  Future work

Corpus: Speech corpus setup
o  Two types of corpus:
  n  Small transcribed corpus (2007-2008)
     o  Transcribed manually by engineering students at ITC
     o  Only 6 hours of transcribed signal
     o  Nature: radio broadcasts (poor quality) downloaded from
        Radio Australia, Radio Free Asia and Voice of America

  n  Large transcribed corpus (2011)
     o    Text and the corresponding speech were already available
     o    Students helped verify the transcription
     o    21 hours of transcribed signal
     o    Nature: read speech from newspapers

Corpus: Text corpus setup
o  Retrieving text from the Web is becoming a common approach
o  Well-selected rich-content websites vs. crawling the Web
o  Adapting ClipsTextTk, an open source tool for corpus creation, to the
   Khmer language
      n    Conversion from legacy character encoding to Unicode
      n    Automatic segmentation
      n    Conversion of special signs and numbers to text
      n    Normalization of word spelling
o  Text corpus obtained from 5 sites:
      n    2,5000 HTML pages retrieved
      n    After processing: 0.5 M sentences, 15 M words
      n    Duration: November 2007 – January 2008
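
The slide lists "conversion of special signs and numbers to text" without details. As a rough illustration of one preliminary step such a pipeline might need, the sketch below maps Khmer digits (Unicode U+17E0 to U+17E9) to ASCII digits so that numbers can later be spelled out as words. This is an assumption about how the step could be organized, not a description of ClipsTextTk. The function name `khmer_digits_to_ascii` is hypothetical, and the spelling-out of numbers is deliberately left out.

```python
import unicodedata

KHMER_DIGIT_ZERO = "\u17e0"
KHMER_DIGIT_NINE = "\u17e9"


def khmer_digits_to_ascii(text: str) -> str:
    """Replace each Khmer digit with the corresponding ASCII digit,
    leaving every other character unchanged."""
    out = []
    for ch in text:
        if KHMER_DIGIT_ZERO <= ch <= KHMER_DIGIT_NINE:
            # unicodedata.digit returns the numeric value of a digit character.
            out.append(str(unicodedata.digit(ch)))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Khmer digits for 2007, written with escapes to avoid font issues.
    print(khmer_digits_to_ascii("\u17e2\u17e0\u17e0\u17e7"))
```

A complete normalizer would then expand the resulting number into Khmer words, which requires language-specific rules not covered here.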
  	
  	
  

Corpus: Overview
o  Description of the Khmer ASR corpus
 Type                       Small corpus                 Large corpus
 Signal                     ~6 h of transcribed          ~20 h of transcribed
 (acoustic model)           signal (radio)               signal (read speech)
 Text                       0.5 million sentences,       To be improved
 (language model)           ~15.5 million words
 Pronunciation dictionary   ~20 000 words                To be improved
 (lexical model)
Current ASR system
Continuous ASR   Training & testing      Word Error Rate (%)
system           corpus
                                         Context-dependent   Context-dependent
                                         (8gau)              (16gau)
Khmer ASR v1     - LM: 15.5M words       42.5                40.3
                 - Training AM: 5h
                 - Testing: 172p
Khmer ASR v2     - LM: 15M words         36.4                35
                 - Training AM: 20h
                 - Testing: 290p
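
To compare the two systems, the WER figures in the table can be turned into absolute and relative reductions. The short sketch below does that arithmetic using only the four values from the table. The helper name `relative_wer_reduction` is hypothetical, and the labels "8gau" and "16gau" are copied from the slide as-is (presumably the number of Gaussians per mixture, which is an interpretation, not something the slide states).

```python
def relative_wer_reduction(baseline: float, improved: float) -> float:
    """Relative WER reduction, in percent of the baseline WER."""
    return 100.0 * (baseline - improved) / baseline


# (Khmer ASR v1 WER, Khmer ASR v2 WER) in percent, from the table.
results = {
    "8gau": (42.5, 36.4),
    "16gau": (40.3, 35.0),
}

for setup, (v1, v2) in results.items():
    absolute = v1 - v2
    relative = relative_wer_reduction(v1, v2)
    print(f"{setup}: {absolute:.1f} points absolute, {relative:.1f}% relative")
```

Note that the two systems were tested on different test sets (172p vs. 290p), so these reductions are only indicative.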




Future Work
o  Collect more text data for the language
   model
o  Next challenge: how to improve Khmer ASR
   for speaker-independent recognition
   and in different environments?

THANK YOU!!




