SlideShare uma empresa Scribd logo
1 de 19
Speech Recognition
An Overview

Submitted To:

Submitted By:

Name: Dr. M Syamala Devi

Name: Varun Jain

Designation: Professor

Roll No.: 34

Department: DCSA, Panjab
University, Chandigarh

Class: MCA 5th Semester (M)

Panjab University, Chandigarh
Table of Contents


Introduction



Introduction – A closer look



Terms and Concepts in Speech Recognition



Pronunciation





Utterances
Grammar

How it works


Acoustic Model



Grammar



Recognized Text



Performance Measurement



Standards



Google Search by Voice (GSbV) – A case study


GSbV – A screenshot



GSbV – An example



Conclusions



References
Introduction


Have you ever talked to your computer?



I mean, have you really, really talked to your computer?



Where it actually recognized what you said and then did something as a result?



If you have, then you've used a technology known as speech recognition.



It is the process of converting spoken input to text.



Speech recognition allows you to provide input to an application with your voice.



Speech recognition is thus sometimes referred to as speech-to-text.



In the desktop world, you need a microphone to be able to do this.



The speech recognition process is performed by a software component known as
the speech recognition engine.



The primary function of the speech recognition engine is to process spoken input
and translate it into text that an application understands.
Introduction – A closer look


Upon hearing a voice as input, a speech recognition application can do one of
the following thing:






The application can interpret the result of the recognition as a command. In this
case, the application is a command and control application. An example of a
command and control application is one in which the caller says “check balance”,
and the application returns the current balance of the caller’s account.
If an application handles the recognized text simply as text, then it is considered a
dictation application. In a dictation application, if you said “check balance,” the
application would not interpret the result, but simply return the text “check
balance”.

For example, in response to hearing "Please say coffee, tea, or milk," you say
"coffee" and the speech recognition application you're calling tells you what
the flavor of the day is and then asks if you'd like to place an order.
Terms and Concepts in Speech
Recognition


There are basically three terms that are majorly used in context of speech
recognition concept.



These three are:


Utterance



Pronunciation



Grammar
Utterance


When the user says something, this is known as an utterance.



An utterance is any stream of speech between two periods of silence.



Utterances are sent to the speech engine to be processed.



Silence, in speech recognition, is almost as important as what is spoken, because
silence delineates the start and end of an utterance.



The speech recognition engine is "listening" for speech input.



When the engine detects audio input - in other words, a lack of silence -- the
beginning of an utterance is signaled.



Similarly, when the engine detects a certain amount of silence following the
audio, the end of the utterance occurs.



If the user doesn’t say anything, the engine returns what is known as a silence
timeout - an indication that there was no speech detected within the expected
timeframe, and the application takes an appropriate action, such as re-prompting
the user for input.
Pronunciation


The speech recognition engine uses all sorts of data, statistical models, and
algorithms to convert spoken input into text.



One piece of information that the speech recognition engine uses to process a
word is its pronunciation, which represents what the speech engine thinks a
word should sound like.



Words can have multiple pronunciations associated with them.



For example, the word “the” has at least two pronunciations in the U.S.
English language: “thee” and “thuh.”
Grammar


One must specify the words and phrases that users can say to your
application.



These words and phrases are defined to the speech recognition engine and
are used in the recognition process.



You can specify the valid words and phrases in a number of different ways,
but in application, you do this by specifying a grammar.



A grammar uses a particular syntax, or set of rules, to define the words and
phrases that can be recognized by the engine.



A grammar can be as simple as a list of words, or it can be flexible enough to
allow such variability in what can be said that it approaches natural language
capability.
How it works

Fig. 1 A basic working model of speech recognition
depicting how a general speech recognition works.
Acoustic Model


Acoustic models provide an estimate for the likelihood of the observed
features in a frame of speech given a particular phonetic context.



The features are typically related to measurements of the spectral
characteristics of a time-slice of speech.



While individual recipes for training acoustic models vary in their structure
and sequencing, the basic process involves aligning transcribed speech to
states within an existing acoustic model, accumulating frames associated with
each state, and re-estimating the probability distributions associated with the
state, given the features observed in those frames.
Grammar


A grammar can be as simple as a list of words, or it can be flexible enough to
allow such variability in what can be said that it approaches natural language
capability.



Grammars define the domain, or context, within which the recognition engine
works.



The engine compares the current utterance against the words and phrases in the
active grammars.



If the user says something that is not in the grammar, the speech engine will not
be able to decipher it correctly.



You can define a single grammar for your application, or you may have multiple
grammars.



Chances are, you will have multiple grammars, and you will activate each
grammar only when it is needed.
Recognized Text


It is an output of this model that produces a likelihood if what user has
provided as input commonly known as recognized text.



This text can be processed in two way by the application using this model:


As a command



As a dictation text



As a command, a further processing is made upon the text which serves as
output



As a dictation text, a stream of text is shown as output as a formatted text
just like a text can be shown on screen.
Performance Measurement


The performance of a speech recognition system is measurable.



Perhaps the most widely used measurement is accuracy.



Arguably the most important measurement of accuracy is whether the desired
end result occurred.



This measurement is useful in validating application design.



For example, if the user said "yes," the engine returned "yes," and the "YES"
action was executed, it is clear that the desired end result was achieved.



But what happens if the engine returns text that does not exactly match the
utterance?



For example, what if the user said "nope," the engine returned "no," yet the
"NO" action was executed?
Standard


There are a lot of standards by World Wide Web Consortium (W3C)



Some of them are enlisted below:


VoiceXML



SRGS (Speech Recognition Grammar Specification)



SISR (Semantic Interpretation for Speech Recognition)



SSML (Speech Synthesis Markup Language)



PLS (Pronunciation Lexicon Specification)



CCXML (Call Control eXtensible Markup Language)



MediaCTRL



MSML (Media Server Markup Language)



MSCML (Media Server Control Markup Language)
Google search by voice: A case study


Mobile web search is a rapidly growing area of interest.



Internet-enabled Smart phones account for an increasing share of the mobile
devices sold throughout the world, and most models offer a web browsing
experience that rivals desktop computers in display quality.



Users are increasingly turning to their mobile devices when doing web
searches, driving efforts to enhance the usability of web search on these
devices.



Although mobile device usability has improved, typing search queries can still
be cumbersome, error-prone, and even dangerous in some usage scenarios.
GSbV – A screenshot
GSbV – An example

Figure 3. An example of a speech recognizer app GSbV
Conclusions


Speech recognition will revolutionize the way people conduct business over
the Web and will, ultimately, differentiate world-class e-businesses.



VoiceXML ties speech recognition and telephony together and provides the
technology with which businesses can develop and deploy voice-enabled Web
solutions TODAY!



These solutions can greatly expand the accessibility of Web-based self-service
transactions to customers who would otherwise not have access, and, at the
same time, leverage a business’ existing Web investments.



Speech recognition and VoiceXML clearly represent the next wave of the Web.
References



http://cmusphinx.sourceforge.net/wiki/tutorialconcepts



ftp://public.dhe.ibm.com/software/pervasive/info/products/Introduction_to
_Speech_Reco gnition.pdf



https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved
=0CDAQFjAB&url=http%3A%2F%2Fresearch.google.com%2Fpubs%2Farchive%2F3
6340.pdf&ei=13GVUoW5CY6FrAeZ04GgDg&usg=AFQjCNEWeZqQk1pIkXjtzrNG__
ZR-UUNbA&bvm=bv.57155469,d.bmk&cad=rja



http://en.wikipedia.org/wiki/VoiceXML

Mais conteúdo relacionado

Mais procurados

Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
Amrita More
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
Alok Tiwari
 
TEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptxTEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptx
Nsaroj kumar
 
Silent Sound Technology
Silent Sound TechnologySilent Sound Technology
Silent Sound Technology
Moumita132
 

Mais procurados (20)

Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technology
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
Speech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceSpeech Recognition in Artificail Inteligence
Speech Recognition in Artificail Inteligence
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
 
Automatic Speech Recognition
Automatic Speech RecognitionAutomatic Speech Recognition
Automatic Speech Recognition
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition system
 
Unit 1 speech processing
Unit 1 speech processingUnit 1 speech processing
Unit 1 speech processing
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Speech processing
Speech processingSpeech processing
Speech processing
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Voice browser
Voice browserVoice browser
Voice browser
 
TEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptxTEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptx
 
Silent Sound Technology
Silent Sound TechnologySilent Sound Technology
Silent Sound Technology
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Voice recognition
Voice recognitionVoice recognition
Voice recognition
 

Destaque

Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project report
Sarang Afle
 
Blackboard Pattern
Blackboard PatternBlackboard Pattern
Blackboard Pattern
tcab22
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition
子毅 楊
 
Rajul computer presentation
Rajul computer presentationRajul computer presentation
Rajul computer presentation
Neetu Jain
 
Speech recognition challenges
Speech recognition challengesSpeech recognition challenges
Speech recognition challenges
Alexandru Chica
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
Chiranjeevi Adi
 
Applying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person ShootersApplying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person Shooters
hbbalfred
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
boddu syamprasad
 

Destaque (16)

Artificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemArtificial intelligence Speech recognition system
Artificial intelligence Speech recognition system
 
An Introduction To Speech Recognition
An Introduction To Speech RecognitionAn Introduction To Speech Recognition
An Introduction To Speech Recognition
 
Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project report
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Dev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM AubertDev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM Aubert
 
Blackboard Pattern
Blackboard PatternBlackboard Pattern
Blackboard Pattern
 
Blackboard architecture pattern
Blackboard architecture patternBlackboard architecture pattern
Blackboard architecture pattern
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition
 
fundamentals of speech recognition
fundamentals of speech recognitionfundamentals of speech recognition
fundamentals of speech recognition
 
Rajul computer presentation
Rajul computer presentationRajul computer presentation
Rajul computer presentation
 
Speech recognition challenges
Speech recognition challengesSpeech recognition challenges
Speech recognition challenges
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 
Applying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person ShootersApplying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person Shooters
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEY
 

Semelhante a Speech recognition an overview

Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
IJERA Editor
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
ankit_saluja
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
ankit_saluja
 
Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech Recognition
Thejus Joby
 
NLP and its applications
NLP and its applicationsNLP and its applications
NLP and its applications
Utphala P
 
NICE Speech Analytics - White Paper
NICE Speech Analytics - White PaperNICE Speech Analytics - White Paper
NICE Speech Analytics - White Paper
Clark Hill
 

Semelhante a Speech recognition an overview (20)

Seminar
SeminarSeminar
Seminar
 
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing System
 
ACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITIONACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITION
 
Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech Recognition
 
Deciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsDeciphering voice of customer through speech analytics
Deciphering voice of customer through speech analytics
 
Instant speech translation 10BM60080 - VGSOM
Instant speech translation   10BM60080 - VGSOMInstant speech translation   10BM60080 - VGSOM
Instant speech translation 10BM60080 - VGSOM
 
NLP and its applications
NLP and its applicationsNLP and its applications
NLP and its applications
 
Developing a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandDeveloping a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice command
 
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay  A Free On-Line Web Spell Checking Service For QuechuaAllin Qillqay  A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
 
NICE Speech Analytics - White Paper
NICE Speech Analytics - White PaperNICE Speech Analytics - White Paper
NICE Speech Analytics - White Paper
 
Google Voice-to-text
Google Voice-to-textGoogle Voice-to-text
Google Voice-to-text
 
10
1010
10
 
Ijetcas14 390
Ijetcas14 390Ijetcas14 390
Ijetcas14 390
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template Matching
 
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech ServerTulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
 
Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdf
 

Mais de Varun Jain

Java in web 2 0 presentation
Java in web 2 0 presentationJava in web 2 0 presentation
Java in web 2 0 presentation
Varun Jain
 
Ready transmesia media
Ready transmesia mediaReady transmesia media
Ready transmesia media
Varun Jain
 
Computer communications and networks
Computer communications and networksComputer communications and networks
Computer communications and networks
Varun Jain
 
The brain computing
The brain computingThe brain computing
The brain computing
Varun Jain
 
Data warehousing
Data warehousingData warehousing
Data warehousing
Varun Jain
 
Billing software
Billing softwareBilling software
Billing software
Varun Jain
 
Making power point slides
Making power point slidesMaking power point slides
Making power point slides
Varun Jain
 

Mais de Varun Jain (8)

Mobile devices
Mobile devicesMobile devices
Mobile devices
 
Java in web 2 0 presentation
Java in web 2 0 presentationJava in web 2 0 presentation
Java in web 2 0 presentation
 
Ready transmesia media
Ready transmesia mediaReady transmesia media
Ready transmesia media
 
Computer communications and networks
Computer communications and networksComputer communications and networks
Computer communications and networks
 
The brain computing
The brain computingThe brain computing
The brain computing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Billing software
Billing softwareBilling software
Billing software
 
Making power point slides
Making power point slidesMaking power point slides
Making power point slides
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Speech recognition an overview

  • 1. Speech Recognition An Overview Submitted To: Submitted By: Name: Dr. M Syamala Devi Name: Varun Jain Designation: Professor Roll No.: 34 Department: DCSA, Panjab University, Chandigarh Class: MCA 5th Semester (M) Panjab University, Chandigarh
  • 2. Table of Contents  Introduction  Introduction – A closer look  Terms and Concepts in Speech Recognition   Pronunciation   Utterances Grammar How it works  Acoustic Model  Grammar  Recognized Text  Performance Measurement  Standards  Google Search by Voice (GSbV) – A case study  GSbV – A screenshot  GSbV – An example  Conclusions  References
  • 3. Introduction  Have you ever talked to your computer?  I mean, have you really, really talked to your computer?  Where it actually recognized what you said and then did something as a result?  If you have, then you've used a technology known as speech recognition.  It is the process of converting spoken input to text.  Speech recognition allows you to provide input to an application with your voice.  Speech recognition is thus sometimes referred to as speech-to-text.  In the desktop world, you need a microphone to be able to do this.  The speech recognition process is performed by a software component known as the speech recognition engine.  The primary function of the speech recognition engine is to process spoken input and translate it into text that an application understands.
  • 4. Introduction – A closer look  Upon hearing a voice as input, a speech recognition application can do one of the following thing:    The application can interpret the result of the recognition as a command. In this case, the application is a command and control application. An example of a command and control application is one in which the caller says “check balance”, and the application returns the current balance of the caller’s account. If an application handles the recognized text simply as text, then it is considered a dictation application. In a dictation application, if you said “check balance,” the application would not interpret the result, but simply return the text “check balance”. For example, in response to hearing "Please say coffee, tea, or milk," you say "coffee" and the speech recognition application you're calling tells you what the flavor of the day is and then asks if you'd like to place an order.
  • 5. Terms and Concepts in Speech Recognition  There are basically three terms that are majorly used in context of speech recognition concept.  These three are:  Utterance  Pronunciation  Grammar
  • 6. Utterance  When the user says something, this is known as an utterance.  An utterance is any stream of speech between two periods of silence.  Utterances are sent to the speech engine to be processed.  Silence, in speech recognition, is almost as important as what is spoken, because silence delineates the start and end of an utterance.  The speech recognition engine is "listening" for speech input.  When the engine detects audio input - in other words, a lack of silence -- the beginning of an utterance is signaled.  Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.  If the user doesn’t say anything, the engine returns what is known as a silence timeout - an indication that there was no speech detected within the expected timeframe, and the application takes an appropriate action, such as re-prompting the user for input.
  • 7. Pronunciation  The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text.  One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like.  Words can have multiple pronunciations associated with them.  For example, the word “the” has at least two pronunciations in the U.S. English language: “thee” and “thuh.”
  • 8. Grammar  One must specify the words and phrases that users can say to your application.  These words and phrases are defined to the speech recognition engine and are used in the recognition process.  You can specify the valid words and phrases in a number of different ways, but in application, you do this by specifying a grammar.  A grammar uses a particular syntax, or set of rules, to define the words and phrases that can be recognized by the engine.  A grammar can be as simple as a list of words, or it can be flexible enough to allow such variability in what can be said that it approaches natural language capability.
  • 9. How it works Fig. 1 A basic working model of speech recognition depicting how a general speech recognition works.
  • 10. Acoustic Model  Acoustic models provide an estimate for the likelihood of the observed features in a frame of speech given a particular phonetic context.  The features are typically related to measurements of the spectral characteristics of a time-slice of speech.  While individual recipes for training acoustic models vary in their structure and sequencing, the basic process involves aligning transcribed speech to states within an existing acoustic model, accumulating frames associated with each state, and re-estimating the probability distributions associated with the state, given the features observed in those frames.
  • 11. Grammar  A grammar can be as simple as a list of words, or it can be flexible enough to allow such variability in what can be said that it approaches natural language capability.  Grammars define the domain, or context, within which the recognition engine works.  The engine compares the current utterance against the words and phrases in the active grammars.  If the user says something that is not in the grammar, the speech engine will not be able to decipher it correctly.  You can define a single grammar for your application, or you may have multiple grammars.  Chances are, you will have multiple grammars, and you will activate each grammar only when it is needed.
  • 12. Recognized Text  It is an output of this model that produces a likelihood if what user has provided as input commonly known as recognized text.  This text can be processed in two way by the application using this model:  As a command  As a dictation text  As a command, a further processing is made upon the text which serves as output  As a dictation text, a stream of text is shown as output as a formatted text just like a text can be shown on screen.
  • 13. Performance Measurement  The performance of a speech recognition system is measurable.  Perhaps the most widely used measurement is accuracy.  Arguably the most important measurement of accuracy is whether the desired end result occurred.  This measurement is useful in validating application design.  For example, if the user said "yes," the engine returned "yes," and the "YES" action was executed, it is clear that the desired end result was achieved.  But what happens if the engine returns text that does not exactly match the utterance?  For example, what if the user said "nope," the engine returned "no," yet the "NO" action was executed?
  • 14. Standard  There are a lot of standards by World Wide Web Consortium (W3C)  Some of them are enlisted below:  VoiceXML  SRGS (Speech Recognition Grammar Specification)  SISR (Semantic Interpretation for Speech Recognition)  SSML (Speech Synthesis Markup Language)  PLS (Pronunciation Lexicon Specification)  CCXML (Call Control eXtensible Markup Language)  MediaCTRL  MSML (Media Server Markup Language)  MSCML (Media Server Control Markup Language)
  • 15. Google search by voice: A case study  Mobile web search is a rapidly growing area of interest.  Internet-enabled Smart phones account for an increasing share of the mobile devices sold throughout the world, and most models offer a web browsing experience that rivals desktop computers in display quality.  Users are increasingly turning to their mobile devices when doing web searches, driving efforts to enhance the usability of web search on these devices.  Although mobile device usability has improved, typing search queries can still be cumbersome, error-prone, and even dangerous in some usage scenarios.
  • 16. GSbV – A screenshot
  • 17. GSbV – An example Figure 3. An example of a speech recognizer app GSbV
  • 18. Conclusions  Speech recognition will revolutionize the way people conduct business over the Web and will, ultimately, differentiate world-class e-businesses.  VoiceXML ties speech recognition and telephony together and provides the technology with which businesses can develop and deploy voice-enabled Web solutions TODAY!  These solutions can greatly expand the accessibility of Web-based self-service transactions to customers who would otherwise not have access, and, at the same time, leverage a business’ existing Web investments.  Speech recognition and VoiceXML clearly represent the next wave of the Web.