1. Tiny Ears
Using Speech
Recognition To Teach
Kids To Read
Emily Toop
Radical Robot
Brighton iPhone Creators
November 2011
2. What is Speech Recognition?
• Converting spoken words to text
• Not targeted to a single speaker (unlike
voice recognition)
• Utterances converted into phonemes that are
compared against language model & grammar
to generate a hypothesis
• Recognition score to give confidence in
hypothesis
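The last two bullets can be illustrated with a toy sketch. This is not how Pocket Sphinx or any real recogniser is implemented. It assumes each candidate word sequence already has an acoustic score, combines that with a tiny bigram language model, and normalises the result into a confidence value. All names and probabilities below are invented for illustration.

```python
import math

# Hypothetical acoustic log-scores: how well each candidate matches the audio.
ACOUSTIC = {
    ("the", "cat", "sat"): math.log(0.30),
    ("the", "cap", "sat"): math.log(0.35),
    ("a", "cat", "sat"): math.log(0.20),
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM = {
    ("<s>", "the"): 0.6, ("<s>", "a"): 0.4,
    ("the", "cat"): 0.5, ("the", "cap"): 0.05, ("a", "cat"): 0.5,
    ("cat", "sat"): 0.4, ("cap", "sat"): 0.1,
}


def language_score(words):
    """Sum of log bigram probabilities, with a small floor for unseen pairs."""
    prev, total = "<s>", 0.0
    for word in words:
        total += math.log(BIGRAM.get((prev, word), 1e-6))
        prev = word
    return total


def best_hypothesis(acoustic):
    """Return the top-scoring hypothesis and a confidence between 0 and 1."""
    scored = {h: a + language_score(h) for h, a in acoustic.items()}
    best = max(scored, key=scored.get)
    total = sum(math.exp(s) for s in scored.values())
    return best, math.exp(scored[best]) / total


hypothesis, confidence = best_hypothesis(ACOUSTIC)
print(" ".join(hypothesis), round(confidence, 2))
```

Note that "the cap sat" has the best acoustic score on its own, but the language model makes "the cat sat" the winning hypothesis. An app could reject any result whose confidence falls below a chosen threshold.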
3. Why is Speech
Recognition Hard?
The human brain is incredibly specialised -
speech recognition & vision have taken millions
of years to perfect. Hard to make a computer do
the same thing.
• Background Noise
• Detecting gaps
• Too many hypotheses generated
• Accents
• Other Languages
• Dictionary words vs unknown words (e.g.
names)
4. How Does Siri Work?
• Protocol Cracked - https://github.com/plamoni/SiriProxy
• Server Based because of CPU & live
data updates - doesn’t work offline
• Limited vocabulary with well designed
grammar
5. Device Based
Recognition
• Works offline
• Immediate response for real time
processing
• No need for expensive data plans for
your app to work
6. Device Based
Recognition
• Open Ears - http://www.politepix.com/openears
• Pocket Sphinx / Sphinx CMU - http://cmusphinx.sourceforge.net/2010/03/pocketsphinx-0-6-release/
• Limited Language Model
• Limited Grammar
12. Number recogniser
• -(void)pocketsphinxRecognitionLoopDidStart{}
• -(void)fliteDidFinishSpeaking{} (if using flite for
text to speech)
13. Improving Recognition
with Face Detection
• Determine when user is speaking
directly to app and not to another
person to enhance accuracy
• Stop listening when face not detected.
• Detect when app has been abandoned
& shut down audio manager etc.
• Start listening when face is detected
again
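The start/stop behaviour described above could be modelled as a small state machine. The sketch below is a minimal illustration only: it assumes some face detector supplies a true/false result per camera frame, and the class name, action strings and the `abandon_after` timeout are all hypothetical.

```python
class ListeningGate:
    """Decides when to listen, based on per-frame face detection results."""

    def __init__(self, abandon_after=30.0):
        self.abandon_after = abandon_after  # hypothetical timeout, in seconds
        self.state = "listening"
        self.last_face_time = 0.0

    def update(self, face_detected, now):
        """Return an action for the audio layer, or None if nothing changes."""
        if face_detected:
            self.last_face_time = now
            if self.state != "listening":
                self.state = "listening"
                return "start_listening"
            return None
        if self.state == "listening":
            # Face lost: the user is probably talking to someone else.
            self.state = "paused"
            return "stop_listening"
        if self.state == "paused" and now - self.last_face_time >= self.abandon_after:
            # No face for a long time: treat the app as abandoned.
            self.state = "shut_down"
            return "shut_down_audio"
        return None
```

Feeding it `update(False, t)` while listening yields `"stop_listening"`, a long enough absence yields `"shut_down_audio"`, and the next `update(True, t)` yields `"start_listening"`.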
17. Tiny Ears
• iPad Storybook using Speech
Recognition to listen to children as they
read aloud
• Detect when child stumbles or does not
recognise a word & intervene with
assistance to teach child to read word
• Track reading progress over time to
provide targeted feedback.
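One possible way to spot a stumble is to align the recogniser's output against the text on the page and flag the expected words that were replaced or skipped. The sketch below uses Python's standard `difflib` for the alignment. It is an assumption about how this could be done, not a description of the Tiny Ears implementation, and `find_stumbles` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def find_stumbles(expected_text, recognised_text):
    """Return the expected words the reader replaced or skipped."""
    expected = expected_text.lower().split()
    heard = recognised_text.lower().split()
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    stumbles = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace": a different word was heard; "delete": the word was skipped.
        if tag in ("replace", "delete"):
            stumbles.extend(expected[i1:i2])
    return stumbles


print(find_stumbles("the cat sat on the mat", "the cat sit on the mat"))
```

The returned words are candidates for intervention, and counting them per word over time would give a simple basis for tracking progress.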
18. Problems -
Educational
• Large Age Range - different kids have
different reading abilities and therefore
require different levels of feedback/
intervention
• Presenting learning in a fun way so
nothing is so difficult child will give up
rather than learn
19. Problems -
Speech Recognition
• 4 year olds speak very differently from
adults
• How do we detect errors? - unknown
words & mispronunciations
• ‘noise’ words, detecting coughs, laughs
or sounds indicating distress or
difficulty
20. Problems -
Speech Recognition
• Is the child present?
• Is there more than one person present?
• Whose speech should we process?
• Can we even tell?
• Can we detect if the child is in distress
or struggling?
• Can we detect reading ability through
Speech Recognition?
21. Startup Chile
• Startup Accelerator run by Chilean
government
• US$40k for 6 months, no equity
• Starting January 16th
• Looking for collaborators from
education, business, artificial
intelligence - email me
Background Noise - possible solution: noise-rejection microphones. These are getting better but still aren’t fantastic
Detecting gaps - need loads of training data to train a statistical model on expected speech patterns
Hypotheses - lots of CPU required to whittle them down to the most likely
Accents - more training data to cover accents and more CPU to match against language/grammar models
Other Languages - need a new model for every language
Error detection - car/care, ph vs f, and silent letters (e.g. hour)
1) Should we ignore or accept sound input as speech?
3) Visually or through ‘noise’ word detection