Kaldi-voice: Your personal speech recognition server using open source code

Presentation given at the Lisbon open data meeting on 8/2/2016

  1. Kaldi-voice: Your personal speech recognition server using open source code
     Xavier Anguera, CTO & CSO, ELSA Corp.
     xavier@elsanow.io
  2. Outline
     • Intro
     • What is speech recognition
       – Applications
     • Approaches to ASR
       – Pattern matching approaches
       – Statistical-based approaches
     • Available speech recognition engines
       – “open” source
       – Online commercial systems
     • Building your own online system
       – Live demo
  3. Automatic Speech Recognition
     • Automatic Speech Recognition (ASR) is the process of converting an unknown speech waveform into the corresponding orthographic transcription.
     Image: http://blogs.msdn.com/b/devschool/archive/2012/02/06/speech-recognition-using-visual-studio-determining-the-bna.aspx
  4. Content: Search, Summary, Transcripts, Meaning
     Personal context: Age, Gender, Height, Spoken language, Spoken dialect, Spoken accent, Literacy level, Speaker ID, Personality traits (OCEAN), Speech likability, Speech intelligibility, Sleepiness/tiredness, Intoxication level, Emotion, State of interest
     Image: Telefonica I+D
  5. Applications of Speech Recognition/Understanding (ASR/ASU)
     • Dictation
     • Telephone-based information
       – directions, air travel, banking, etc.
     • Polls, online shopping
     • Call routing
     • Hands-free
       – in car, computer, home (domotics), controlling tools
     • Second language (accent reduction)
     • Audio archive searching
     • Help for disabled people
  6. How do humans do it?
     Articulation system of one person produces sound waves which the ear of another person conveys to the brain for processing
  7. How can computers do it?
     • Digitization
     • Acoustic analysis of the speech signal
     • Linguistic interpretation
     Figure labels: Acoustic waveform, Acoustic signal, Speech recognition
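
The first two steps on this slide (digitization and acoustic analysis) can be illustrated with a minimal sketch. It is not taken from the presentation: the 16 kHz sample rate, 16-bit quantization, 25 ms frames with a 10 ms hop, and the function names `digitize` and `frame_log_energy` are all assumptions chosen for illustration, and log energy stands in for the richer features (such as MFCC) a real recognizer would compute.

```python
import math

def digitize(signal_fn, duration_s, sample_rate=16000, bits=16):
    """Sample a continuous-time function and quantize it to signed integers."""
    max_int = 2 ** (bits - 1) - 1
    n_samples = round(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        value = signal_fn(i / sample_rate)
        value = max(-1.0, min(1.0, value))  # clip to the valid amplitude range
        samples.append(round(value * max_int))
    return samples

def frame_log_energy(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Cut the signal into overlapping frames and return one log energy per frame."""
    frame_len = sample_rate * frame_ms // 1000
    hop = sample_rate * hop_ms // 1000
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        energies.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return energies

# Hypothetical input: a 440 Hz tone standing in for a microphone signal.
def tone(t):
    return 0.5 * math.sin(2 * math.pi * 440 * t)

samples = digitize(tone, duration_s=0.1)
energies = frame_log_energy(samples)
print(len(samples), "samples ->", len(energies), "frames")
```

Each frame yields one number here; replacing the energy computation with a filter bank or cepstral analysis would give the feature vectors that the later slides assume.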
  8. Challenges in ASR processing
     • Speaker variability
       – Inter-speaker: vocal tract, gender, dialects
       – Intra-speaker: stress, age, humor, changes of articulation due to environment influence, …
     • Language variability
       – From isolated words to continuous speech
       – Out-of-vocabulary words
     • Vocabulary size and domain
       – From just a few words (e.g. isolated numbers) to large vocabulary speech recognition
       – Domain that is being recognized (medical, social, engineering, …)
     • Noise
       – Convolutive: recording/transmission conditions, reverberation
       – Additive: recording environment, transmission SNR
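
To make the additive-noise item concrete, here is a small sketch of how noise can be mixed into a clean signal at a chosen signal-to-noise ratio (SNR). The signals are synthetic and the function names `mean_power`, `snr_db` and `add_noise_at_snr` are hypothetical; the slide itself only names the concept.

```python
import math
import random

def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))

def add_noise_at_snr(signal, noise, target_snr_db):
    """Scale the noise so that signal + noise has the requested SNR."""
    target_ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(mean_power(signal) / (mean_power(noise) * target_ratio))
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(signal, scaled_noise)]
    return noisy, scaled_noise

random.seed(0)
# Hypothetical data: a 200 Hz tone at 8 kHz plus Gaussian noise.
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]
noisy, scaled_noise = add_noise_at_snr(clean, noise, target_snr_db=10.0)
print(round(snr_db(clean, scaled_noise), 3))
```

Convolutive noise (reverberation, channel effects) would instead be simulated by filtering the clean signal, which this sketch does not cover.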
  9. Approaches to ASR
     • Pattern-based approaches
     • Statistics-based approaches
  10. Pattern-based speech recognition
      • Feature measurement: Filter Bank, MFCC, LPC, DFT, ...
      • Pattern training: creation of a reference pattern derived from an averaging technique
      • Pattern classification: compare speech patterns with a local distance measure and a global time alignment procedure (DTW)
      • Decision logic: similarity scores are used to decide which is the best reference pattern.
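
The classification and decision steps on this slide can be sketched with a textbook dynamic time warping (DTW) recursion. This is an illustrative assumption rather than the presenter's code: features are reduced to one number per frame, the local distance is an absolute difference, and the templates and labels are made up.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # local distance measure
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

def classify(utterance, templates):
    """Decision logic: pick the reference pattern with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

# Hypothetical reference patterns, one per word.
templates = {
    "yes": [1, 3, 5, 3, 1],
    "no": [5, 4, 1, 1, 0],
}
# A slower rendition of "yes": same shape, some frames repeated.
utterance = [1, 1, 3, 5, 5, 3, 1]
print(classify(utterance, templates))
```

Because the alignment may repeat frames, the stretched utterance still matches its template with zero cost, which is the property that makes DTW robust to speaking-rate differences.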
  11. Template Matching Mechanism
  12. TDP: Speech Recognition Alignment Example
  13. Statistics-based approaches
      • Can be seen as an extension of the template-based approach, using more powerful mathematical and statistical tools
      • Sometimes seen as an “anti-linguistic” approach
        – Fred Jelinek (IBM, 1988): “Every time I fire a linguist my system improves”
      • Process:
        1. Collect a large corpus of transcribed speech recordings
        2. Train the computer to learn the correspondences (“machine learning”)
        3. At run time, apply statistical processes to search through the space of all possible solutions, and pick the statistically most likely one
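
The run-time step of picking the statistically most likely solution is commonly written as a Bayes decision rule. The notation below is an assumption, since the slide gives no formula: $X$ is the sequence of acoustic observations and $W$ a candidate word sequence.

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
        = \arg\max_{W} P(X \mid W)\, P(W)
```

The denominator $P(X)$ drops out because it does not depend on $W$. In this formulation $P(X \mid W)$ is usually called the acoustic model and $P(W)$ the language model, and both are estimated from the training corpus.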
  14. Statistics-based approaches
      • Hidden Markov Models (HMM)
      • Gaussian Mixture Models (GMM)
      • Deep Neural Networks (DNN)
  15. Markov model
      Output = sequence of states
      Image: http://madhukaudantha.blogspot.pt/2014/05/markov-models-and-hidden-markov-models.html
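
In a plain Markov model the states themselves are the output, so the probability of a state sequence is the initial probability times the chain of transition probabilities. The sketch below uses made-up weather states and probabilities; neither they nor the function name `sequence_probability` come from the slide.

```python
# Hypothetical two-state model; every number here is illustrative.
initial = {"sunny": 0.6, "rainy": 0.4}
transition = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sequence_probability(states, initial, transition):
    """Probability of observing exactly this sequence of states."""
    prob = initial[states[0]]
    for previous, current in zip(states, states[1:]):
        prob *= transition[previous][current]  # depends only on the previous state
    return prob

# 0.6 (start sunny) * 0.8 (sunny -> sunny) * 0.2 (sunny -> rainy)
p = sequence_probability(["sunny", "sunny", "rainy"], initial, transition)
print(p)
```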
  16. Hidden Markov Models (HMM)
      Output = observations linked to the states through a predefined probability distribution → modeled using GMM or DNN models
      Image: http://izanami.tl.fukuoka-u.ac.jp/SLPL/HMM/HTKBook/node5.html
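
With an HMM only the observations are visible, and a recognizer has to infer the most likely hidden state sequence. A standard way to do this is the Viterbi algorithm, sketched below. As a simplifying assumption, the emission distribution is a small discrete table rather than the GMM or DNN mentioned on the slide, and the states, observation symbols and probabilities are invented for illustration.

```python
def viterbi(observations, states, initial, transition, emission):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {
        s: (initial[s] * emission[s][observations[0]], [s]) for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability into s.
            prob, path = max(
                ((best[p][0] * transition[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emission[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])

# Hypothetical model: is each audio frame silence or speech, given its loudness?
states = ["silence", "speech"]
initial = {"silence": 0.6, "speech": 0.4}
transition = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
}
emission = {
    "silence": {"quiet": 0.5, "medium": 0.4, "loud": 0.1},
    "speech": {"quiet": 0.1, "medium": 0.3, "loud": 0.6},
}

probability, path = viterbi(["quiet", "medium", "loud"], states, initial, transition, emission)
print(path, probability)
```

For this toy model the best path is silence, silence, speech. Swapping the emission table for a function that scores a feature vector is where a GMM or DNN would plug in.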
  17. HMMs for some words
  18. Gaussian Mixture Models (GMM)
      Figure panels: 1D GMM, 2D GMM
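
A one-dimensional GMM like the one pictured is simply a weighted sum of Gaussian densities. The sketch below evaluates such a density; the weights, means and variances are arbitrary illustrative values, not read from the slide's figure.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def gmm_pdf(x, weights, means, variances):
    """Density of a 1-D Gaussian mixture: weighted sum of component densities."""
    return sum(
        w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)
    )

# Hypothetical two-component mixture; the weights must sum to 1.
weights = [0.5, 0.5]
means = [-2.0, 2.0]
variances = [1.0, 1.0]

for x in (-2.0, 0.0, 2.0):
    print(x, round(gmm_pdf(x, weights, means, variances), 4))
```

In an HMM-based recognizer each state would own one such mixture (over multi-dimensional feature vectors) to score how well a frame fits that state.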
  19. Deep neural networks
      Image: http://www.amax.com/blog/
  20. A neuron in our brain
      Image: http://www.medicalsciencenavigator.com/how-to-study-for-anatomy-and-physiology/why-sleep-improves-memory
  21. Classical representation of a neuron
      Long short-term memory cells
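
The classical artificial neuron can be written in a few lines: a weighted sum of the inputs plus a bias, passed through a nonlinearity. The sketch below assumes a sigmoid activation and uses arbitrary weights; it covers only the classical neuron, not the long short-term memory cell also named on the slide.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Classical neuron: weighted sum of inputs plus bias, then a nonlinearity."""
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(activation)

def layer(inputs, weight_rows, biases):
    """A layer is a set of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical values: three input features, two hidden neurons, one output neuron.
features = [0.5, -1.0, 2.0]
hidden = layer(features, [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.0]], [0.0, 0.1])
output = neuron(hidden, [1.0, -1.0], 0.0)
print(hidden, output)
```

Stacking several calls to `layer` gives a multilayer perceptron; a deep network is the same idea with many hidden layers.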
  22. DNN evolution
      • We started to use multilayer perceptrons (MLPs) about 25 years ago [1]
        – Neural networks with 1 or few hidden layers
      • Around 2010 G. Hinton and S. Bengio (separately) proposed methods to effectively train many hidden layers
        – Machines have become much more powerful
        – Lots of audio data with transcriptions are available
      [1] “Merging Multilayer Perceptrons and Hidden Markov Models: Some Experiments in Continuous Speech Recognition”, Herve Bourlard and Nelson Morgan, Technical report, ICSI, 1989
  23. Processing power evolution
      Image: http://whatsnext.nuance.com/category/in-the-labs/
  24. ASR performance evolution
      Image: http://whatsnext.nuance.com/category/in-the-labs/
  25. Speech recognition engines
      • HTK (http://htk.eng.cam.ac.uk/), non-commercial license
      • Sphinx (http://cmusphinx.sourceforge.net/), BSD-style license
      • Julius (http://julius.osdn.jp/en_index.php), open
      • Kaldi (http://www.kaldi-asr.org/), Apache license
  26. Online ASR-STT services
      • Google voice (https://console.developers.google.com/project)
      • ATT voice recognition (http://developer.att.com/apis/speech)
      • Wit.ai (https://wit.ai/)
  27. Building an ASR with open source tools
      • We need:
        – Speech recognition engine
        – Speech databases / models
        – Online speech server
        – Frontend interfaces
  28. Kōnele app
      Dictate.js
  29. My toolchain
      • Kaldi ASR: http://www.kaldi-asr.org/
      • Kaldi gstreamer server: https://github.com/alumae/kaldi-gstreamer-server
      • Dictate.js: http://kaljurand.github.io/dictate.js/
      • Kōnele app: https://kaljurand.github.io/K6nele/
  30. Demo
