Speech recognition

The talk I gave at Brighton iPhone Dev group on November 24th on Speech Recognition on iOS devices and my new startup, Tiny Ears.

Tiny Ears
Using Speech
Recognition To Teach
Kids To Read
Emily Toop
Radical Robot
Brighton iPhone Creators
November 2011

What is Speech Recognition?
• Converting spoken words to text
• Not targeted to a single speaker (that is voice
recognition)
• Utterances are converted into phonemes that are
compared against a language model & grammar
to generate a hypothesis
• A recognition score gives confidence in the
hypothesis
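
As a rough illustration of the last two bullets, the sketch below picks the most confident hypothesis from a list of candidates and rejects it if its score is too low. The `Hypothesis` type, the score values and the threshold are all assumptions made for the example; real recognisers use their own score scales.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    text: str     # candidate transcription of the utterance
    score: float  # recognition score; higher means more confident (assumed scale)


def best_hypothesis(candidates: List[Hypothesis], min_score: float) -> Optional[Hypothesis]:
    """Return the highest-scoring candidate, or None if nothing is confident enough."""
    if not candidates:
        return None
    top = max(candidates, key=lambda h: h.score)
    return top if top.score >= min_score else None


# Hypothetical candidates for one utterance.
candidates = [
    Hypothesis("one two three", -1200.0),
    Hypothesis("one too free", -4800.0),
]
result = best_hypothesis(candidates, min_score=-3000.0)
```

Returning `None` rather than a weak guess lets the app ask the user to repeat the utterance instead of acting on a poor match.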

Why is Speech Recognition Hard?

The human brain is incredibly specialised - speech recognition & vision have taken millions of years to perfect. It is hard to make a computer do the same thing.

• Background Noise
• Detecting gaps
• Too many hypotheses generated
• Accents
• Other Languages
• Dictionary words vs unknown words (e.g.
names)

How Does Siri Work?

• Protocol Cracked - https://github.com/plamoni/SiriProxy
• Server Based because of CPU & live
data updates - doesn’t work offline
• Limited vocabulary with well designed
grammar

Device Based
Recognition

• Works offline
• Immediate response for real time
processing
• No need for expensive data plans for
your app to work

Device Based
Recognition

• Open Ears - http://www.politepix.com/openears
• Pocket Sphinx/ Sphinx CMU - http://cmusphinx.sourceforge.net/2010/03/pocketsphinx-0-6-release/
• Limited Language Model
• Limited Grammar

Number Recogniser

• Import OpenEars .xcodeproj into
project
• Add OpenEars as target dependency
• Link libOpenEarsLibrary.a binary
• Add OpenEars, SphinxBase &
PocketSphinx to Header Search Path

Number Recogniser
• Create and start audioSessionManager in the app
delegate's didFinishLaunchingWithOptions

Number recogniser

• Rename .m file that runs
PocketSphinxController to .mm
• Add OpenEarsEventObserverDelegate

Number recogniser

• -(void)pocketsphinxRecognitionLoopDidStart{}
• -(void)fliteDidFinishSpeaking{} (if using flite for
text to speech)
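
The slides do not show what the demo does with a recognised hypothesis, so the following is only a sketch of one plausible step: turning a hypothesis made of spoken digit words into numbers. It assumes the recogniser hands back the hypothesis as a string of space-separated words from a small digit vocabulary; the `DIGITS` table and the function name are invented for the example and are not part of OpenEars.

```python
from typing import List

# Assumed vocabulary for a digit-only language model.
DIGITS = {
    "ZERO": 0, "ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4,
    "FIVE": 5, "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9,
}


def digits_from_hypothesis(hypothesis: str) -> List[int]:
    """Map each recognised word to a digit, skipping words outside the vocabulary."""
    words = hypothesis.upper().split()
    return [DIGITS[word] for word in words if word in DIGITS]


numbers = digits_from_hypothesis("ONE TWO THREE")
```

Keeping the vocabulary this small is what makes a limited language model and grammar workable on the device.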

Improving Recognition
with Face Detection
• Determine when user is speaking
directly to app and not to another
person to enhance accuracy
• Stop listening when face not detected.
• Detect when app has been abandoned
& shut down audio manager etc.
• Start listening when face is detected
again
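
The bullets above describe a small state machine, sketched here under stated assumptions: a face detector reports a yes/no result at regular intervals, and the app counts as abandoned once no face has been seen for `abandon_after` seconds. The class, state names and timeout are hypothetical; the actual demo used Core Image on iOS.

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"   # face visible, recogniser running
    PAUSED = "paused"         # face lost, recogniser suspended
    SHUT_DOWN = "shut_down"   # app abandoned, audio manager released


class FaceGatedListener:
    def __init__(self, abandon_after: float, started_at: float = 0.0):
        self.abandon_after = abandon_after  # seconds without a face before shutting down
        self._last_seen = started_at        # time a face was last detected
        self.state = State.PAUSED

    def update(self, face_detected: bool, now: float) -> State:
        """Feed one face-detection result and return the resulting state."""
        if face_detected:
            self._last_seen = now
            self.state = State.LISTENING
        elif now - self._last_seen >= self.abandon_after:
            self.state = State.SHUT_DOWN
        else:
            self.state = State.PAUSED
        return self.state


listener = FaceGatedListener(abandon_after=30.0)
states = [
    listener.update(True, 0.0),    # face appears
    listener.update(False, 5.0),   # user looks away briefly
    listener.update(False, 40.0),  # no face for 40 seconds
    listener.update(True, 41.0),   # user comes back
]
```

In a real app each transition would start, suspend or tear down the audio session; here the state alone stands in for those calls.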

Demo

• Decorator
• Using Core Image for face detection
WWDC Session Videos numbers 419 &
422

Tiny Ears
• iPad Storybook using Speech
Recognition to listen to children as they
read aloud
• Detect when child stumbles or does not
recognise a word & intervene with
assistance to teach child to read word
• Track reading progress over time to
provide targeted feedback.
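
One way to spot a stumble, sketched below, is to align the words the recogniser heard with the words on the page and report the ones that are missing or replaced. This is an illustration rather than the Tiny Ears implementation: it assumes the recogniser returns plain space-separated words, ignores punctuation, and uses Python's standard `difflib` for the alignment.

```python
from difflib import SequenceMatcher
from typing import List


def missed_words(expected: str, heard: str) -> List[str]:
    """Return the expected words that were skipped or replaced in what was heard."""
    expected_words = expected.lower().split()
    heard_words = heard.lower().split()
    matcher = SequenceMatcher(a=expected_words, b=heard_words, autojunk=False)
    missed: List[str] = []
    for tag, start, end, _, _ in matcher.get_opcodes():
        # "delete" means the child skipped the words; "replace" means
        # something else was heard in their place.
        if tag in ("delete", "replace"):
            missed.extend(expected_words[start:end])
    return missed


skipped = missed_words("the cat sat on the mat", "the cat on the mat")
misread = missed_words("the cat sat on the mat", "the cat sit on the mat")
```

Counting how often each word shows up in these lists over several sessions would be one simple basis for tracking progress over time.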

Problems -
Educational

• Large Age Range - different kids have
different reading abilities and therefore
require different levels of feedback/
intervention
• Presenting learning in a fun way so
nothing is so difficult that the child will give up
rather than learn

Problems -
Speech Recognition

• 4 year olds speak very differently from
adults
• How do we detect errors? - unknown
words & mispronunciations
• ‘noise’ words, detecting coughs, laughs
or sounds indicating distress or
difficulty

Problems -
Speech Recognition
• Is the child present?
• Is there more than one person present?
• Whose speech should we process?
• Can we even tell?
• Can we detect if the child is in distress
or struggling?
• Can we detect reading ability through
Speech Recognition?

Startup Chile

• Startup Accelerator run by Chilean
government
• US$40k for 6 months, no equity
• Starting January 16th
• Looking for collaborators from
education, business, artificial
intelligence - email me

Questions?

• http://emilytoop.com
• @fluffyemily
• emily@radicalrobot.co.uk
• http://radicalrobot.co.uk

Editor's Notes

• Background Noise - possible solution: Noise Rejection Microphones. These are getting better but still aren’t fantastic
• Detecting gaps - need loads of training data to train a statistical model on expected speech patterns
• Hypotheses - lots of CPU required to whittle them down to the most likely
• Accents - more training data to cover accents and more CPU to match against language/grammar models
• Other Languages - need a new model for every language

• Error detection - car/care, ph vs f, and silent letters (e.g. hour)
• 1) Should we ignore or accept sound input as speech?
• 3) Visually or through ‘noise’ word detection
