Speech recognition converts spoken words to text. The term "voice recognition" is sometimes used for recognition systems that must be trained to a particular speaker, as is the case for most desktop recognition software.
2. Introduction
Block Diagram
Linguistic Levels Of Analysis
Phonetics
Organs Of Speech And Articulation
Acoustic Model
Circuit Diagram
Components Used
Features Of HM2007
Working
Extracting Phonemes In Frequency Domain
Markov Model
Advantages
Applications
Conclusion
3. Analyses sound and converts spoken word into text
Uses knowledge of spoken English
Programs are available for voice recognition.
Systems work best on Windows XP & Windows Vista
4. [Diagram: Natural Language Processing shown among related fields: Computers, Databases, Algorithms, Robotics, Search, Information Retrieval, Machine Translation, Language Analysis, Semantics]
5. Speech
Written language
Phonology: sounds / letters / pronunciation
Morphology: the structure of words
Syntax: how sequences of words are structured
Semantics: the meaning of those sequences
6. Phonetics - the study of the way humans make, transmit, and
receive sounds
Phonology - the study of the sound systems of languages
A typical word such as moon can be broken down into
three phonemes: m, ue, n.
Phonemes represent all the vowels and consonants of
spoken speech
8. Most vowel sounds are modified by the shape of
the lips (rounded / spread / neutral)
Sounds are made by vibrating the vocal cords
(voicing)
Vowels can be:
Single sounds - monophthongs or pure vowels
Double sounds - diphthongs
Triple sounds - triphthongs
Pure vowels usually come in pairs consisting of
long and short sounds
9. The long sound is found in the word tea. The lips are spread and the sound is long.
The short sound is found in the word hip. The lips are slightly spread and the sound is short.
The tongue tip is raised slightly at the front towards the alveolar ridge. In the longer sound the
tongue is raised higher.
10. The schwa sound is made by relaxing the mouth,
keeping the lips in a neutral position, and making
a short sound. It is found in words like paper,
over, and about, and is common in weak forms in spoken
English.
11. The long sound - you, too & blue
The short sound - good, would &
wool
The lips are rounded and the centre
and back of the tongue are raised towards
the soft palate. For the longer sound the
tongue is raised higher and the lips are
more rounded.
This sound is made with the mouth
spread wide open. It is found in - cat,
man, apple & ran
12. Here we have three sounds, taken from the words:
1) for 2) tour 3) go
Triphthongs are combinations of three sounds-
English has 1 triphthong (a diphthong + a
schwa sound)
Diphthongs are combinations of two sounds.
13. Diphthongs are combinations of pure vowels.
•a: + I = ‘aI’ - tie, buy, height & night
•e + I = ‘eI’ - way, paid & gate
•o: + I = ‘oI’ - boy, coin & coy
•e + ə = ‘eə’ - where, hair & care
•I + ə = ‘Iə’ - here, hear & beer
14. An acoustic model is built from audio recordings of speech to create a
statistical representation of the sounds.
To create a speech recognition engine, a large
database of models is created to match each
phoneme.
These database models store the
phonemes.
The language model holds the grammar of the
sentence, which is used to decode our spoken words into text.
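The idea of matching an observed sound against a database of stored phoneme models can be sketched roughly as below. This is a deliberately simplified nearest-template comparison, not the statistical model a real engine would train. The phoneme labels, the two-number feature vectors (standing for two formant frequencies in Hz), and the function name are illustrative assumptions.

```python
import math

# Hypothetical stored models: one average feature vector per phoneme.
# The two numbers stand for the first two formant frequencies in Hz
# and are illustrative values only.
STORED_MODELS = {
    "aa": (730.0, 1090.0),
    "iy": (270.0, 2290.0),
    "uw": (300.0, 870.0),
}


def closest_phoneme(features):
    """Return the stored phoneme whose model is nearest to the observed features."""
    return min(STORED_MODELS, key=lambda ph: math.dist(STORED_MODELS[ph], features))


print(closest_phoneme((310.0, 900.0)))   # uw
print(closest_phoneme((700.0, 1150.0)))  # aa
```

A trained acoustic model would replace the fixed vectors with probability distributions learned from many recordings.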
17. A single-chip voice recognition system
with 48 pins.
Manufactured by Hualon
Recognizes a maximum of 40 words; maximum word length 1.92 sec
Microphone support
5V power supply
19. How does a computer convert spoken speech into data?
When we speak, a microphone picks up our voice as an analog signal, which is converted into
digital chunks of data that the computer analyzes.
It is from this data that the computer extracts enough information to
confidently guess the word being spoken.
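The step from an analog signal to "digital chunks" can be sketched as sampling followed by splitting into short frames. The sketch below is only an illustration: a synthetic sine tone stands in for the microphone, and the 16 kHz sample rate, 16-bit range, 25 ms frame length, and 10 ms step are assumed values, not taken from the slides.

```python
import math

SAMPLE_RATE = 16_000   # samples per second (assumed)
FRAME_MS = 25          # frame length in milliseconds (assumed)
HOP_MS = 10            # step between frame starts in milliseconds (assumed)


def digitize(duration_s, freq_hz=440.0):
    """Sample a sine tone (stand-in for a microphone signal) as 16-bit integers."""
    n = round(duration_s * SAMPLE_RATE)
    return [round(32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
            for i in range(n)]


def split_into_frames(samples):
    """Cut the sample stream into overlapping fixed-length chunks."""
    frame_len = SAMPLE_RATE * FRAME_MS // 1000
    hop = SAMPLE_RATE * HOP_MS // 1000
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


samples = digitize(0.1)              # 100 ms of audio
frames = split_into_frames(samples)
print(len(samples), len(frames), len(frames[0]))  # 1600 8 400
```

Each frame is short enough that the sound within it is roughly steady, which is what makes the later frequency analysis meaningful.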
20. To extract phonemes
Phonemes are linguistic units
They are the sounds that group together to form words
How a phoneme converts into sound depends on many factors
21. aa - father
ae - cat
ah - cut
ao - dog
aw - foul
ng - sing
t - talk
th - thin
uh - book
The waveform shows the phonemes' frequency characteristics
22. Phonemes are extracted by running the waveform through a Fourier
transform
They are easily visible in the frequency domain
This can be made out by looking at a spectrogram
A spectrogram is a 3-D plot of the waveform's frequency and amplitude
versus time, with amplitude shown in shades of grey
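A minimal sketch of the frequency-domain step follows, assuming a naive discrete Fourier transform applied to non-overlapping frames. The 8 kHz sample rate, the 80-sample frame length, the test tones, and all function names are assumptions made for the example.

```python
import cmath
import math

SAMPLE_RATE = 8_000  # Hz (assumed)
FRAME_LEN = 80       # 10 ms per frame (assumed)


def dft_magnitudes(frame):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectrogram(samples):
    """One magnitude spectrum per non-overlapping frame: time x frequency."""
    return [dft_magnitudes(samples[i:i + FRAME_LEN])
            for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)]


def tone(freq_hz, n):
    """Generate n samples of a sine tone at the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Synthetic signal: 500 Hz for two frames, then 1500 Hz for two frames.
signal = tone(500, 160) + tone(1500, 160)
bin_hz = SAMPLE_RATE / FRAME_LEN  # width of one frequency bin: 100 Hz
peaks = [spec.index(max(spec)) * bin_hz for spec in spectrogram(signal)]
print(peaks)  # [500.0, 500.0, 1500.0, 1500.0]
```

Each row of the result is one column of the spectrogram: time runs across rows, frequency across bins, and the stored magnitude is what would be drawn as a shade of grey. Practical systems use a fast Fourier transform with overlapping, windowed frames instead.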
23. The computer generates a list of phonemes
These phonemes have to be converted into words and then into
sentences, so a Markov model is used
It compares the observed phonemes with the stored phonemes
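One common way to decode with a hidden Markov model is the Viterbi algorithm, which finds the most probable sequence of hidden states for a sequence of observations. The sketch below applies it to the word moon. It is not necessarily the method used by the system described here. The phoneme labels ("uw" for the vowel), the coarse "nasal"/"vowel" observation labels, and every probability are hypothetical values chosen for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p][0] * trans_p[p][s])
            column[s] = (best[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs], prev)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical left-to-right model for the word "moon".
states = ["m", "uw", "n"]
start_p = {"m": 1.0, "uw": 0.0, "n": 0.0}
trans_p = {
    "m": {"m": 0.5, "uw": 0.5, "n": 0.0},
    "uw": {"m": 0.0, "uw": 0.6, "n": 0.4},
    "n": {"m": 0.0, "uw": 0.0, "n": 1.0},
}
emit_p = {
    "m": {"nasal": 0.8, "vowel": 0.2},
    "uw": {"nasal": 0.1, "vowel": 0.9},
    "n": {"nasal": 0.8, "vowel": 0.2},
}

decoded = viterbi(["nasal", "vowel", "vowel", "nasal"], states, start_p, trans_p, emit_p)
print(decoded)  # ['m', 'uw', 'uw', 'n']
```

The repeated "uw" shows the model staying in the vowel state for two frames, which is how a Markov model absorbs differences in how long a phoneme is held.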
24. In this example, the word tomato is modeled with both its British English and American
English pronunciations
This idea is extended up to the level of sentences and improves
recognition
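A word model with alternative pronunciations can be sketched as a small Markov chain in which some states branch. The example below is illustrative only: the state names, the ARPAbet-style phoneme labels, and the 0.5 branch probabilities are assumptions, not values from the slide.

```python
# Hypothetical transition probabilities for a "tomato" word model.
# The digits only distinguish repeated phonemes (first and second "t", "ow").
TOMATO = {
    "start": {"t1": 1.0},
    "t1": {"ow1": 0.5, "ah": 0.5},
    "ow1": {"m": 1.0},
    "ah": {"m": 1.0},
    "m": {"ey": 0.5, "aa": 0.5},
    "ey": {"t2": 1.0},
    "aa": {"t2": 1.0},
    "t2": {"ow2": 1.0},
    "ow2": {"end": 1.0},
}


def pronunciations(model, node="start", prob=1.0, prefix=()):
    """Enumerate every path through the word model with its probability."""
    if node == "end":
        yield prefix, prob
        return
    for nxt, p in model[node].items():
        label = nxt.rstrip("12")  # drop the index that marks a repeated phoneme
        step = prefix if nxt == "end" else prefix + (label,)
        yield from pronunciations(model, nxt, prob * p, step)


variants = dict(pronunciations(TOMATO))
for phonemes, prob in variants.items():
    print(" ".join(phonemes), prob)
```

With these assumed numbers the model yields four variants of equal probability; in a trained model the branch probabilities would reflect how often each pronunciation occurs.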
25. It is used to translate between different languages
It is used in telephones
The standard landline telephone channel has a bandwidth (bit rate) of 64 kbit/s,
with a sampling rate of 8 kHz
In a standard desktop PC, the limiting factor is the sound card. It can
record at sampling rates between 16 kHz and 48 kHz
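The telephone figures are consistent with each other if each sample is stored in 8 bits, as is usual for digital telephony: 8,000 samples per second times 8 bits gives 64 kbit/s. The short sketch below works this through and also applies the sampling theorem, under which a sampling rate can capture frequencies up to half its value. The helper names are hypothetical, and the 16 bits per sample used for the PC case is an assumption.

```python
def bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def nyquist_hz(sample_rate_hz):
    """Highest frequency that the sampling rate can represent."""
    return sample_rate_hz / 2


print(bit_rate(8_000, 8))     # 64000 bits/s for the telephone channel
print(nyquist_hz(8_000))      # 4000.0 Hz
print(bit_rate(16_000, 16))   # 256000 bits/s, assuming 16-bit samples on a PC
print(nyquist_hz(48_000))     # 24000.0 Hz
```

The wider frequency range available from a sound card is one reason desktop recognition can be more accurate than recognition over a telephone line.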
26. MILITARY
HELICOPTERS
IN MOBILE SMARTPHONES
SPEECH CONTROLLED
APPLIANCES
VOICE RECOGNITION SECURITY
27. Speech recognition is one of the latest technologies.
It reduces costs such as those of training
Steps :
Fourier transform of signal
Extraction of Phonemes
Formation of word on the basis of Markov Models
Charm of Simplicity
With the advent of this technology, we will hopefully see a
new era of human-computer interaction.
28. From: Chapter 1 of An Introduction to Natural Language
Processing, Computational Linguistics, and Speech
Recognition, by Daniel Jurafsky and James H. Martin
http://en.wikipedia.org/wiki/acoustic model
http://en.wikipedia.org/wiki/speech recognition
www.wikpedia.org
www.slideshare.net
Natural Language Processing by Rada Mihalcea
www.youtube.com