Tiny Ears
Using Speech Recognition To Teach Kids To Read
Emily Toop
Radical Robot
Brighton iPhone Creators
November 2011
What is Speech Recognition?
• Converting spoken words to text
• Not targeted to a single speaker (that is voice
  recognition)
• Utterances converted into phonemes that are
  compared against language model & grammar
  to generate a hypothesis
• Recognition score to give confidence in
  hypothesis
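
The last two bullets can be pictured with a minimal Python sketch. It assumes the recogniser returns several candidate hypotheses, each with a score where higher means more confident; the `Hypothesis` type, the scores and the threshold are all made up for illustration and are not taken from any particular engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    text: str     # candidate transcription of the utterance
    score: float  # recognition score; assumed here: higher = more confident


def best_hypothesis(hypotheses: List[Hypothesis],
                    min_score: float) -> Optional[Hypothesis]:
    """Return the highest-scoring hypothesis, or None if none is confident enough."""
    if not hypotheses:
        return None
    best = max(hypotheses, key=lambda h: h.score)
    return best if best.score >= min_score else None


# Illustrative candidates for one utterance (invented values).
candidates = [
    Hypothesis("three", 0.82),
    Hypothesis("tree", 0.61),
    Hypothesis("free", 0.40),
]
accepted = best_hypothesis(candidates, min_score=0.7)  # picks "three"
rejected = best_hypothesis(candidates, min_score=0.9)  # nothing is confident enough
```

Rejecting low-scoring results is one simple way to use the recognition score; where to set the threshold would have to be tuned against real recordings.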
Why is Speech Recognition Hard?

The human brain is incredibly specialised - speech recognition &
vision have taken millions of years to perfect. Hard to make a
computer do the same thing.

• Background Noise
• Detecting gaps
• Too many hypotheses generated
• Accents
• Other Languages
• Dictionary words vs unknown words (e.g.
  names)
How Does Siri Work?

• Protocol Cracked - https://github.com/plamoni/SiriProxy
• Server Based because of CPU & live
  data updates - doesn’t work offline
• Limited vocabulary with well designed
  grammar
Device Based Recognition


• Works offline
• Immediate response for real time
  processing
• No need for expensive data plans for
  your app to work
Device Based Recognition

• OpenEars - http://www.politepix.com/openears
• PocketSphinx / CMU Sphinx - http://cmusphinx.sourceforge.net/2010/03/pocketsphinx-0-6-release/
• Limited Language Model
• Limited Grammar
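
To illustrate what a limited grammar means in practice, here is a minimal Python sketch of a closed vocabulary for spoken numbers from 0 to 99. It is not how OpenEars or PocketSphinx define grammars; the word tables and the `parse_number` helper are hypothetical and only show that anything outside the grammar is rejected rather than guessed at.

```python
from typing import Optional

# Closed vocabulary: only these words are part of the (assumed) grammar.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def parse_number(utterance: str) -> Optional[int]:
    """Map a recognised phrase such as 'forty two' to 42.

    Returns None when the phrase does not fit the grammar:
    either <unit>, <tens>, or <tens> followed by a unit from 1 to 9.
    """
    words = utterance.lower().split()
    if len(words) == 1:
        word = words[0]
        if word in UNITS:
            return UNITS[word]
        return TENS.get(word)
    if len(words) == 2:
        tens, unit = words
        if tens in TENS and unit in UNITS and 1 <= UNITS[unit] <= 9:
            return TENS[tens] + UNITS[unit]
    return None
```

A small grammar like this keeps the number of competing hypotheses low, which is part of why a limited model can run on the device.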
Demo




Number Recogniser
Number Recogniser

• Import OpenEars .xcodeproj into
  project
• Add OpenEars as target dependency
• Link libOpenEarsLibrary.a binary
• Add OpenEars, SphinxBase &
  PocketSphinx to Header Search Path
Number Recogniser
• Create and start audioSessionManager in the app delegate's
  didFinishLaunchingWithOptions
Number Recogniser

• Rename .m file that runs
  PocketSphinxController to .mm
• Add OpenEarsEventObserverDelegate
Number Recogniser
Number Recogniser

• -(void)pocketsphinxRecognitionLoopDidStart{}
• -(void)fliteDidFinishSpeaking{} (if using flite for
  text to speech)
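
The callbacks above follow the observer (delegate) pattern. The Python sketch below shows only the general shape of that pattern: the class and method names are hypothetical stand-ins that echo the slide, not the OpenEars API, and the real Objective-C delegate is wired up differently.

```python
class EventSource:
    """Hypothetical event source that notifies registered observers by method name."""

    def __init__(self):
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def notify(self, event_name):
        # Call the matching method on every observer that implements it.
        for observer in self._observers:
            handler = getattr(observer, event_name, None)
            if callable(handler):
                handler()


class LoggingObserver:
    """Observer that records which callbacks fired (names are illustrative)."""

    def __init__(self):
        self.events = []

    def recognition_loop_did_start(self):
        self.events.append("recognition loop started")

    def did_finish_speaking(self):
        self.events.append("finished speaking")


source = EventSource()
observer = LoggingObserver()
source.add_observer(observer)
source.notify("recognition_loop_did_start")
source.notify("did_finish_speaking")
```

The app reacts to events as they arrive rather than polling the recogniser, which is the point of adding an observer delegate.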
Improving Recognition with Face Detection
• Determine when user is speaking
  directly to app and not to another
  person to enhance accuracy
• Stop listening when face not detected.
• Detect when app has been abandoned
  & shut down audio manager etc.
• Start listening when face is detected
  again
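
The behaviour in these bullets can be sketched as a small state machine. This is a minimal Python sketch under assumptions: face detection is reduced to a boolean fed in from elsewhere, time is passed in as seconds, and the `abandon_after` timeout and all names are invented for illustration.

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"   # face present, audio is processed
    PAUSED = "paused"         # no face right now, audio is ignored
    SHUT_DOWN = "shut_down"   # no face for a long time, audio manager stopped


class FaceGatedListener:
    def __init__(self, abandon_after, start=0.0):
        # Seconds without a face before the app counts as abandoned (assumed value).
        self.abandon_after = abandon_after
        # Treat the start time as the last moment a face was seen.
        self._last_seen = start
        self.state = State.PAUSED

    def update(self, face_detected, now):
        """Feed one face-detection result and return the resulting state."""
        if face_detected:
            # A face (re)appearing always resumes listening, even after shutdown.
            self._last_seen = now
            self.state = State.LISTENING
        elif now - self._last_seen >= self.abandon_after:
            self.state = State.SHUT_DOWN
        else:
            self.state = State.PAUSED
        return self.state
```

For example, with `abandon_after=30.0`, a face at 0 s gives `LISTENING`, no face at 5 s gives `PAUSED`, no face at 40 s gives `SHUT_DOWN`, and a face at 45 s returns to `LISTENING`.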
Demo


• Decorator
• Using Core Image for face detection - see
  WWDC session videos 419 & 422
Kitten Break
Kitten Break
Tiny Ears
• iPad Storybook using Speech
  Recognition to listen to children as they
  read aloud
• Detect when child stumbles or does not
  recognise a word & intervene with
  assistance to teach child to read word
• Track reading progress over time to
  provide targeted feedback.
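
One possible way to spot a stumble is to align what the recogniser heard with the sentence on the page. The Python sketch below uses the standard library's `difflib.SequenceMatcher` for the alignment; this is an assumption made for illustration, not the method used in Tiny Ears, and `find_misread_words` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def find_misread_words(expected, heard):
    """Return the expected words that were skipped or read as something else."""
    expected_words = expected.lower().split()
    heard_words = heard.lower().split()
    matcher = SequenceMatcher(a=expected_words, b=heard_words, autojunk=False)
    misread = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace": word heard differently; "delete": word skipped entirely.
        if tag in ("replace", "delete"):
            misread.extend(expected_words[i1:i2])
    return misread


substituted = find_misread_words("the cat sat on the mat",
                                 "the cat sit on the mat")
skipped = find_misread_words("the cat sat", "the sat")
```

The words returned are candidates for intervention, and counting them per session would be one simple input for tracking progress over time.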
Problems - Educational

• Large Age Range - different kids have
  different reading abilities and therefore
  require different levels of feedback/
  intervention
• Presenting learning in a fun way so
  nothing is so difficult child will give up
  rather than learn
Problems - Speech Recognition

• 4 year olds speak very differently from
  adults
• How do we detect errors? - unknown
  words & mispronunciations
• ‘Noise’ words - detecting coughs, laughs
  or sounds indicating distress or
  difficulty
Problems - Speech Recognition
• Is the child present?
• Is there more than one person present?
 • Whose speech should we process?
 • Can we even tell?
• Can we detect if the child is in distress
  or struggling?
• Can we detect reading ability through
  Speech Recognition?
Startup Chile

• Startup Accelerator run by Chilean
  government
• US$40k for 6 months, no equity
• Starting January 16th
• Looking for collaborators from
  education, business, artificial
  intelligence - email me
Questions?


• http://emilytoop.com
• @fluffyemily
• emily@radicalrobot.co.uk
• http://radicalrobot.co.uk

Notas do Editor

  1. \n
  2. \n
  3. Background Noise - solution possible Noise Rejection Microphones. These are getting better but still aren’t fantastic\nDetecting gaps - need loads of training data to train statistical model on expected speech patterns\nHypotheses - lots of CPU required to whittle them down to most likely\nAccents - More training data to cover accents and more CPU to match against language/grammar models\nOther Languages - need a new model or every language\n\n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. error detection - car/care, ph vs f and silent letters - hour\n
  21. 1) should we ignore or accept sound input as speech?\n3) - visually or through ‘noise’ word detection\n\n
  22. \n
  23. \n