Speech recognition is the process of translating spoken words into text. It involves recording and digitizing audio, segmenting it into phonemes, applying a recognition model that analyzes the phonemes against a lexicon and grammar, and returning a confidence-weighted transcript. A really good English-language system is right about 92% of the time; accuracy is lower for other languages. Mobile apps can use platform-specific speech services and SDKs, such as Google Now with a Java API on Android, while the W3C specification targets cross-browser support. The related Speech Synthesis API can output responses by voice. Together these APIs enable interactive speech applications.
28. Why is that?
When two people talk, comprehension rates are better than 97%.
29. A really good English-language speech recognition system is right 92% of the time.
30. Where does that extra 5% in error rate come from?
Vocabulary size and confusability
Speaker dependence vs independence
Isolated or continuous speech
Initiated vs spontaneous speech
Adverse conditions
31. Mobile Speech Recognition
OS             Application   SDK
Android        Google Now    Java API
iOS            Siri          Many 3rd-party Obj-C SDKs
Windows Phone  Cortana       C# API
53. The SpeechSynthesis spec looks like this:
interface SpeechSynthesis {
    readonly attribute boolean pending;
    readonly attribute boolean speaking;
    readonly attribute boolean paused;
    void speak(SpeechSynthesisUtterance utterance);
    void cancel();
    void pause();
    void resume();
    SpeechSynthesisVoiceList getVoices();
};
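To show how the members of this interface fit together, here is a minimal TypeScript sketch. `SynthesizerLike`, `UtteranceLike`, `sayNow` and `togglePause` are hypothetical names introduced for illustration; they only mirror the members listed above, so the logic can run against a stand-in object outside a browser.

```typescript
// Hypothetical structural types that mirror the interface members above.
interface UtteranceLike {
  text: string;
}

interface SynthesizerLike {
  readonly pending: boolean;  // utterances are waiting in the queue
  readonly speaking: boolean; // an utterance is currently being spoken
  readonly paused: boolean;   // the synthesizer is paused
  speak(utterance: UtteranceLike): void;
  cancel(): void;
  pause(): void;
  resume(): void;
}

// Speak a reply immediately: clear anything spoken or queued first,
// so that replies do not pile up behind each other.
function sayNow(synth: SynthesizerLike, utterance: UtteranceLike): void {
  if (synth.speaking || synth.pending) {
    synth.cancel();
  }
  synth.speak(utterance);
}

// Toggle between paused and playing, and report which action was taken.
function togglePause(synth: SynthesizerLike): "paused" | "resumed" | "idle" {
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle";
}
```

In a browser that implements the spec, the same pattern should apply to `window.speechSynthesis` and a `SpeechSynthesisUtterance` object.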
54. The SpeechSynthesisUtterance spec looks like this:
interface SpeechSynthesisUtterance : EventTarget {
    attribute DOMString text;
    attribute DOMString lang;
    attribute DOMString voiceURI;
    attribute float volume;
    attribute float rate;
    attribute float pitch;
};
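The numeric attributes are easy to set out of range, so a small helper can be useful. The sketch below is one possible approach, not part of the spec: `UtteranceSettings`, `clamp` and `makeSettings` are hypothetical names, and the "en-US" default is an assumption. The ranges (volume 0 to 1, rate 0.1 to 10, pitch 0 to 2) are the ones given in the Web Speech API specification.

```typescript
// Hypothetical plain-object view of the utterance attributes.
interface UtteranceSettings {
  text: string;
  lang: string;
  volume: number; // 0 (silent) to 1 (loudest)
  rate: number;   // 0.1 to 10, where 1 is normal speed
  pitch: number;  // 0 to 2, where 1 is the default
}

// Restrict a value to the closed interval [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Build settings for one utterance, filling defaults and clamping
// each numeric attribute to the range the spec allows.
function makeSettings(
  text: string,
  options: Partial<Omit<UtteranceSettings, "text">> = {}
): UtteranceSettings {
  return {
    text,
    lang: options.lang ?? "en-US", // assumed default language tag
    volume: clamp(options.volume ?? 1, 0, 1),
    rate: clamp(options.rate ?? 1, 0.1, 10),
    pitch: clamp(options.pitch ?? 1, 0, 2),
  };
}
```

In a browser, the resulting values could then be copied onto a `SpeechSynthesisUtterance` before it is passed to `speak()`.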
55. With additional event handlers to control behaviour:
    attribute EventHandler onstart;
    attribute EventHandler onend;
    attribute EventHandler onerror;
    attribute EventHandler onpause;
    attribute EventHandler onresume;
    attribute EventHandler onmark;
    attribute EventHandler onboundary;
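As an illustration of how the `onend` and `onerror` handlers can drive application flow, the sketch below reports completion through a single callback. `HandledUtterance`, `Speaker` and `speakThen` are hypothetical names, and only two of the handlers listed above are modelled.

```typescript
// Hypothetical minimal view of an utterance: only two of its handlers.
interface HandledUtterance {
  onend: (() => void) | null;
  onerror: (() => void) | null;
}

// Hypothetical minimal view of the synthesizer: only speak().
interface Speaker<U> {
  speak(utterance: U): void;
}

// Speak an utterance and call onDone exactly once:
// true when speech finished, false when an error was reported.
function speakThen<U extends HandledUtterance>(
  synth: Speaker<U>,
  utterance: U,
  onDone: (ok: boolean) => void
): void {
  let settled = false;
  const finish = (ok: boolean): void => {
    if (!settled) {
      settled = true;
      onDone(ok);
    }
  };
  // Handlers are attached before speak() so that no event is missed.
  utterance.onend = () => finish(true);
  utterance.onerror = () => finish(false);
  synth.speak(utterance);
}
```

A typical use would be to start listening for the user's answer only after the spoken prompt has ended.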
58. Availability
OS             Recognition   Synthesis
Android        ✓             ✓
iOS*           Soonish       Native to iOS 7.0+
Windows Phone  ×             ×
* Working with Julio César (@jcesarmobile) to get iOS done
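Because availability differs by platform, an application may want to check at run time what is present before it offers a speech UI. The following is a rough sketch under the assumption that the runtime exposes the W3C globals (`SpeechRecognition` or the prefixed `webkitSpeechRecognition`, and `speechSynthesis`); `SpeechSupport` and `detectSpeechSupport` are hypothetical names.

```typescript
// Hypothetical result type: which halves of the speech API are present.
interface SpeechSupport {
  recognition: boolean;
  synthesis: boolean;
}

// Inspect a global-like object (window in a browser or web view)
// for the recognition constructor and the synthesis entry point.
function detectSpeechSupport(scope: Record<string, unknown>): SpeechSupport {
  return {
    recognition:
      "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope,
    synthesis: "speechSynthesis" in scope,
  };
}
```

In a browser or web view, the scope passed in would be `window`; the application can then hide voice input or fall back to text output when a flag is false.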
60. For more information on hybrid applications
Check out Nick Van Weerdenburg and Andrey Feldman's presentation, "Creating a Comprehensive Social Media App Using Ionic and PhoneGap", at 3:45pm today in 801A.
65. Types of Speech Recognition Applications
Voice Web Search
Speech Command Interface
Continuous Recognition of Open Dialog
Domain Specific Grammars Filling Multiple Input Fields
Speech UI present when no visible UI need be present
Voice Activity Detection
Speech Translation
Multimodal Interaction
Speech Driving Directions
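As a rough illustration of the simplest of these, a speech command interface, the sketch below maps a recognized transcript onto a fixed table of commands. It assumes the recognizer has already produced a transcript string; `CommandTable`, `normalize` and `dispatch` are hypothetical names, and exact phrase matching is a deliberate simplification.

```typescript
// Hypothetical command table: spoken phrase -> action returning a reply.
type CommandTable = Record<string, () => string>;

// Lower-case the transcript, drop punctuation and collapse whitespace,
// so that "  Lights ON! " and "lights on" compare equal.
function normalize(transcript: string): string {
  return transcript
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, "")
    .replace(/\s+/g, " ");
}

// Run the command whose phrase matches the transcript exactly.
// Returns the command's reply, or null when nothing matches.
function dispatch(transcript: string, commands: CommandTable): string | null {
  const phrase = normalize(transcript);
  // Own-property check, so inherited keys such as "constructor" never match.
  if (!Object.prototype.hasOwnProperty.call(commands, phrase)) {
    return null;
  }
  return commands[phrase]();
}
```

The returned reply could be spoken back through the synthesis API, and a null result is a natural point to ask the user to repeat the command.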