Organization: Speech Processing
Prerequisites
Introduction
Speech Production
Representation of Speech Signals
Speech Processing
Govind
Center for Computational Engineering & Networking
Amrita Vishwa Vidyapeetham
Govind CEN, Amrita Vishwa Vidyapeetham
Outline
Introduction
Human Speech Production and Perception Systems
Representation of Speech in the Time and Frequency Domains
Speech Sounds and Features
Signal Processing Methods for Estimating Speech Features
Speech Processing Applications
Speech Recognition
Speech Synthesis
Prerequisites: S&S, DSP & ADSP
Prior Knowledge Required:
Signals and Systems
Digital Signal Processing
Advanced DSP
Prerequisites: S&S, DSP & ADSP
Signals and Systems
Classification of Signals
LTI systems
Correlation/Convolution Operations
Fourier Representation: FS, DTFS, DTFT, DFT, FFT, Z-transform
Concepts of Impulse Response, Frequency Response, etc.
Prerequisites: S&S, DSP & ADSP
Digital Signal Processing
Sampling: Nyquist, Aliasing
FFT implementation of DFT
Design of FIR and IIR filters
Structures for realization of Filters
Multirate signal processing: Filter banks
Prerequisites: S&S, DSP & ADSP
Advanced DSP
Time-Frequency Analysis
TFA by STFT
TFA by Wigner Distributions
TFA by Wavelets
Prerequisites: S&S, DSP & ADSP
References
L. Rabiner, Biing-Hwang Juang and B. Yegnanarayana, "Fundamentals of Speech Recognition", Pearson Education Inc., 2009
Douglas O'Shaughnessy, "Speech Communication", University Press, 2001
Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Pearson Education Inc., 2004
Introduction
Information in Speech
Message
Language
Accent
Speaker
Emotions/Stress
Applications
Recognition
Speech recognition
Speaker Recognition/Verification
Emotion Recognition, etc.
Synthesis
Text to Speech Synthesis
Speech Enhancement
Voice Conversion
Applications: Recognition
Input: Speech
Objective: Message. Information extracted: "Author of the danger..."
Objective: Speaker. Information extracted: "It's Govind speaking"
Objective: Speaker claim has to be verified. Information extracted: "Hi Govind, your claim is accepted"
Applications: Synthesis
Text To Speech Synthesis
  Input: Text ("Epochs Occur...")
  Objective: Synthesize the text
Speech Enhancement
  Objectives: Remove noise; remove reverberation; enhance the desired speaker's speech
Voice Conversion
  Objective: Convert the source speaker's speech to the target speaker's speech
What makes automatic processing of speech complicated?
It is an interdisciplinary area:
1. Signal Processing: Extracting relevant information from the speech signal.
2. Physics: Understanding the relationship between the physical speech signal and the physiological mechanisms that produced it.
3. Pattern Recognition: Grouping or classifying patterns of various events in speech.
4. Communication and Information Theory: Efficient encoding and decoding of speech parameters, and efficient search for patterns of interest in speech (dynamic programming, Viterbi search, stack algorithms, etc.).
5. Linguistics: The relationship between sounds (phonology), the syntax and semantics of a language, and the sense derived from the meaning (pragmatics).
6. Computer Science: The study of different algorithms for implementation in software or hardware.
7. Psychology: Understanding the psychological state of the speaker or listener helps in tasks such as emotion analysis.
Speaker-Listener Schematic Diagram in Speech Communication
Figure: Schematic diagram of speech communication (courtesy: Rabiner et al.)
Production-Perception Block Diagram
Speech production: Message Formulation -> Language Code -> Neuro-Muscular Controls -> Vocal Tract System -> Acoustic Waveform
Representations along the production path: Text; Phonemes, Prosody; Articulatory Motion
Transmission Channel
Speech perception: Acoustic Waveform -> Basilar Membrane Motion -> Neural Transduction -> Language Translation -> Message Understanding
Representations along the perception path: Spectrum Analysis; Feature Extraction, Coding; Phonemes, Words, Sentences; Semantics
Figure: Speech production block diagram (courtesy: Rabiner et al.)
Speech Production
Figure: Speech production mechanism (courtesy: Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Chapter 3, p. 58, Pearson Edu., Delhi)
Mechanical Equivalent of Speech Production System
Figure: Speech production mechanism (courtesy: Rabiner et al.)
Spectro-Temporal Representation
classification of Phonemes
Representation of Speech Signal
Figure: Speech signal in the time domain (amplitude axis from −1 to 1, time axis from 0 to 2.5)
Glottal Air Flow During Speech Production
Figure: Glottal air flow (courtesy: Rabiner et al.)
Glottal Air Flow: Graphical Illustration
Figure: Two panels plotted as Amplitude against Time (Samples), spanning roughly 1.3 x 10^4 to 1.55 x 10^4 samples: the speech waveform and the glottal flow recorded as EGG, which shows the glottis vibration.
Classification of Speech Sounds
Silence (S): No speech is produced
Unvoiced (U): Vocal folds are not vibrating
Voiced (V): Periodic vibration of the vocal folds
Figure: Speech signal in the time domain, with segments labelled U (unvoiced), S (silence) and V (voiced)
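The periodicity that characterizes voiced sounds can be measured directly from the waveform. The following is a minimal sketch, not taken from the slides: it estimates the pitch of a single voiced frame from the peak of its autocorrelation. The function name `estimate_pitch_hz`, the 8 kHz sampling rate, the 60 to 400 Hz search range and the synthetic test frame are all illustrative assumptions.

```python
import math

def estimate_pitch_hz(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate pitch (Hz) as fs / lag, where lag maximizes the autocorrelation.

    f_min and f_max bound the search range; they are assumed values,
    chosen here only to cover typical speaking pitch.
    """
    lag_min = int(fs / f_max)  # shortest period considered
    lag_max = int(fs / f_min)  # longest period considered
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag over the overlapping samples.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag

fs = 8000   # assumed sampling rate in Hz
f0 = 125.0  # fundamental frequency of the synthetic "voiced" frame
# 60 ms frame: fundamental plus a weaker second harmonic.
frame = [math.sin(2 * math.pi * f0 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
         for n in range(480)]

pitch = estimate_pitch_hz(frame, fs)
print(f"Estimated pitch: {pitch:.1f} Hz")
```

For unvoiced or silent frames the autocorrelation has no clear peak in this range, so an estimate of this kind is only meaningful once a frame has been judged voiced.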
Classification of Speech Sounds
Separating voiced sounds from unvoiced sounds and silence is known as voiced/non-voiced detection
Issues in voiced/non-voiced detection:
Difficult to distinguish weak unvoiced sounds from silence
Difficult to distinguish weakly periodic voiced sounds from unvoiced sounds
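One common way to approach this decision, sketched here as an illustration and not as the method used in the slides, is to compute two features per frame: short-time energy (low for silence) and zero-crossing rate (high for noise-like unvoiced sounds, low for voiced sounds). The function names, both thresholds and the synthetic frames below are hypothetical and would need tuning on real recordings.

```python
import math
import random

def frame_features(frame):
    """Return (mean energy, zero-crossing rate) for one frame of samples."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)

def classify_frame(frame, energy_floor=1e-4, zcr_limit=0.25):
    """Label a frame 'S' (silence), 'U' (unvoiced) or 'V' (voiced).

    energy_floor and zcr_limit are assumed thresholds for illustration only.
    """
    energy, zcr = frame_features(frame)
    if energy < energy_floor:
        return "S"
    return "U" if zcr > zcr_limit else "V"

fs = 8000            # assumed sampling rate in Hz
n = 240              # 30 ms frame
rng = random.Random(0)

silence = [0.001 * rng.uniform(-1, 1) for _ in range(n)]   # very low-level noise
unvoiced = [0.1 * rng.uniform(-1, 1) for _ in range(n)]    # noise-like frame
voiced = [0.5 * math.sin(2 * math.pi * 120 * i / fs) for i in range(n)]  # periodic frame

labels = [classify_frame(f) for f in (silence, unvoiced, voiced)]
print(labels)
```

On these clean synthetic frames the labels come out as S, U, V. The two issues listed above are exactly where fixed thresholds of this kind break down: a weak unvoiced sound can fall below the energy floor, and a weakly periodic voiced sound can have a zero-crossing rate close to that of noise.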
Spectrograms: Narrow-band & Wide-band
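The two spectrogram types differ only in the length of the analysis window: a long window gives fine frequency resolution (narrow-band, individual pitch harmonics become visible), while a short window gives fine time resolution (wide-band, formant structure and individual glottal pulses become visible). The following is a minimal sketch of that trade-off, not code from the slides. The function names `spectrogram` and `peak_hz`, the 8 kHz sampling rate, the 32 ms and 4 ms window lengths and the test tone are all illustrative assumptions, and a direct DFT from the standard library is used only to keep the example self-contained; an FFT routine would normally be used.

```python
import cmath
import math

def spectrogram(signal, frame_len, hop):
    """Magnitude STFT: one list of frame_len // 2 + 1 magnitudes per frame."""
    # Hamming window to reduce spectral leakage at the frame edges.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peak_hz(mags, frame_len, fs):
    """Frequency (Hz) of the largest-magnitude bin in one frame."""
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * fs / frame_len

fs = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]  # 0.1 s test tone

# Narrow-band: 32 ms window (256 samples), 50 % overlap.
narrow = spectrogram(tone, frame_len=256, hop=128)
# Wide-band: 4 ms window (32 samples), 50 % overlap.
wide = spectrogram(tone, frame_len=32, hop=16)

narrow_bin_hz = fs / 256  # spacing between frequency bins, narrow-band
wide_bin_hz = fs / 32     # spacing between frequency bins, wide-band

print(f"narrow-band: {len(narrow)} frames, {narrow_bin_hz} Hz per bin")
print(f"wide-band:   {len(wide)} frames, {wide_bin_hz} Hz per bin")
```

With these settings the narrow-band analysis produces few frames with closely spaced frequency bins, and the wide-band analysis produces many frames with coarse bins, which is the resolution trade-off the two spectrogram types represent.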
Spectral Envelope from a Long Segment of Speech
Figure: Spectral envelope plotted over Frame Index (0 to 30), Frequency (0 to 4000 Hz) and Magnitude (0 to 40)
Classification of sound units
Phonemes
  Vowels (Front, Mid, Back): i (eve), I (it), e (hate), ε (met), U (book), u (boot), Λ (up), a (father), o (obey), c (all)
  Diphthongs: ay (buy), aw (down), ey (bait), O (boy)
  Semi-Vowels
    Liquids: l (large), r (run)
    Glides: w (wit), y (you)
  Consonants
    Nasals: m (met), n (net), ng (sing)
    Plosives
      Voiced: b (ball), d (debt), g (get)
      Unvoiced: k (kit), p (pen), t (ten)
    Fricatives
      Voiced: v (vat), dh (that), z (zoo)
      Unvoiced: f (fun), th (thing), s (sat), sh (should)
    Whispers: h (he)
    Affricates: tz (sports), jh (judge), ch (church)
Representation of sound units in speech
Sounds are classified into vowels and consonants
Vowels: Produced by exciting a fixed vocal tract shape with quasi-periodic glottal pulses
Vowels are classified into front, mid and back based on the tongue hump position
Front vowels: /i/ ("eve"), /I/ ("it"), /æ/ ("at"), /e/ ("hate")
Mid vowels: /a/ ("father"), /Λ/ ("up")
Back vowels: /U/ ("foot"), /u/ ("boot"), /o/ ("obey")
Another classification is based on the length of the vowels: long and short
Diphthongs: Combination of two vowels
/ay/ as in "buy", /aw/ as in "down", /ey/ as in "bait", /o/ as in "boat", /cy/ as in "boy", etc.
Front Vowel
Figure: Speech signal and spectrogram for the front vowels I ("it"), e ("hate") and i ("eve")
Vowel Analysis
Front vowels show high-frequency resonances
Front vowels are distinguished from one another by the tongue height during vowel production
Mid vowels show a well separated and balanced distribution of resonant frequencies
Back vowels show almost no energy beyond the low-frequency region
Diphthongs
Semivowels
Group of sounds consisting of /w/, /r/, /l/, /y/
Difficult to characterize because they are vowel-like in nature
Characterized by a gliding transition in the vocal tract area function between adjacent phonemes
Best described as transitional, vowel-like sounds
Nasal Consonants
Group of sounds consisting of /m/, /n/, /ŋ/
Produced with glottal excitation and the vocal tract totally constricted at some point along the oral passageway
The velum is lowered, so air flows through the nasal cavity while the oral passage stays blocked
Due to the acoustic coupling of the oral cavity to the pharynx, anti-resonances are created
/m/, /n/ and /ŋ/ are produced by constriction at the lips, behind the teeth and at the velum, respectively.
Nasalized Vowels
Unvoiced Fricatives
Produced by exciting the vocal tract with a turbulent airflow through a narrow constriction
/f/ ("four"), /θ/ ("thing"), /s/ ("sat") and /sh/ ("shut") form the class of unvoiced fricatives
/f/: Constriction at the teeth
/s/: Constriction near the middle of the oral cavity
/sh/: Constriction at the end of the oral tract
Voiced Fricatives
/v/ ("vat"), /ð/ ("that"), /z/ ("zoo") and /zh/ ("azure") form the class of voiced fricatives
/v/: Constriction at the teeth
/z/: Constriction near the middle of the oral cavity
/zh/: Constriction at the end of the oral tract
Apart from the glottal vibration, the place of articulation is the same as for the corresponding unvoiced fricatives
Govind CEN, Amrita Vishwa Vidyapeetham

Mais conteúdo relacionado

Semelhante a SPEECH PROCESSING PREREQUISITES

Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySrijanKumar18
 
Gujarati Text-to-Speech Presentation
Gujarati Text-to-Speech PresentationGujarati Text-to-Speech Presentation
Gujarati Text-to-Speech Presentationsamyakbhuta
 
Speech recognition and digital image processing.pptx
Speech recognition and digital image processing.pptxSpeech recognition and digital image processing.pptx
Speech recognition and digital image processing.pptxRakeshR458516
 
Enterprise Voice Technology Solutions: A Primer
Enterprise Voice Technology Solutions: A PrimerEnterprise Voice Technology Solutions: A Primer
Enterprise Voice Technology Solutions: A PrimerCognizant
 
Effective Presentation: Multimedia
Effective Presentation: MultimediaEffective Presentation: Multimedia
Effective Presentation: MultimediaAlaa Sadik
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...csandit
 
Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...csandit
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...cscpconf
 
[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION
[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION
[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITIONDeep Learning JP
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech RecognitionAhmed Moawad
 
Speechbird AI Review – Unleashing the Power of Speech Recognition.pdf
Speechbird AI Review – Unleashing the Power of Speech Recognition.pdfSpeechbird AI Review – Unleashing the Power of Speech Recognition.pdf
Speechbird AI Review – Unleashing the Power of Speech Recognition.pdfAMB-Review
 
Automated Voice And Audio Quality Test Measurement
Automated Voice And Audio Quality Test MeasurementAutomated Voice And Audio Quality Test Measurement
Automated Voice And Audio Quality Test Measurementguest5a90cfc
 

Semelhante a SPEECH PROCESSING PREREQUISITES (20)

Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Gujarati Text-to-Speech Presentation
Gujarati Text-to-Speech PresentationGujarati Text-to-Speech Presentation
Gujarati Text-to-Speech Presentation
 
Dy36749754
Dy36749754Dy36749754
Dy36749754
 
Speech recognition and digital image processing.pptx
Speech recognition and digital image processing.pptxSpeech recognition and digital image processing.pptx
Speech recognition and digital image processing.pptx
 
Enterprise Voice Technology Solutions: A Primer
Enterprise Voice Technology Solutions: A PrimerEnterprise Voice Technology Solutions: A Primer
Enterprise Voice Technology Solutions: A Primer
 
Web AI.pptx
Web AI.pptxWeb AI.pptx
Web AI.pptx
 
Effective Presentation: Multimedia
Effective Presentation: MultimediaEffective Presentation: Multimedia
Effective Presentation: Multimedia
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Voiceover sevices
Voiceover sevicesVoiceover sevices
Voiceover sevices
 
Instructional Design - Unit 2
Instructional Design - Unit 2Instructional Design - Unit 2
Instructional Design - Unit 2
 
visH (fin).pptx
visH (fin).pptxvisH (fin).pptx
visH (fin).pptx
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
 
Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
 
[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION
[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION
[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION
 
Seminar
SeminarSeminar
Seminar
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Video conferencing
Video conferencingVideo conferencing
Video conferencing
 
Speechbird AI Review – Unleashing the Power of Speech Recognition.pdf
Speechbird AI Review – Unleashing the Power of Speech Recognition.pdfSpeechbird AI Review – Unleashing the Power of Speech Recognition.pdf
Speechbird AI Review – Unleashing the Power of Speech Recognition.pdf
 
Automated Voice And Audio Quality Test Measurement
Automated Voice And Audio Quality Test MeasurementAutomated Voice And Audio Quality Test Measurement
Automated Voice And Audio Quality Test Measurement
 

Último

FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTSneha Padhiar
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSneha Padhiar
 
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptNoman khan
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSsandhya757531
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptxmohitesoham12
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical trainingGladiatorsKasper
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosVictor Morales
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionSneha Padhiar
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfManish Kumar
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewsandhya757531
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfisabel213075
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfDrew Moseley
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESkarthi keyan
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 

Último (20)

FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
 
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).ppt
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptx
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitos
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based question
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overview
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdf
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdf
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 

SPEECH PROCESSING PREREQUISITES

  • 1. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Speech Processing Govind Center for Computational Engineering & Networking Amrita Vishwa Vidyapeetham Govind CEN, Amrita Vishwa Vidyapeetham
  • 2. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Outline Introduction Human Speech Production and Perception Systems Representation of Speech in the Time and Frequency Domains Speech Sounds and Features Signal Processing Methods for Estimating Speech Features Speech Processing Applications Speech Recognition Speech Synthesis Govind CEN, Amrita Vishwa Vidyapeetham
  • 3. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP Prior Knowledge Required: Signals and Systems Digital signal Processing Advanced DSP Govind CEN, Amrita Vishwa Vidyapeetham
  • 4. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP Signals and Systems Classification of Signals LTI systems Correlation/Convolution Operations Fourier Representation: FS, DTFS, DTFT,DFT,FFT, Z-transform Concepts of Impulse Response, Frequency Response etc. Govind CEN, Amrita Vishwa Vidyapeetham
  • 5. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP Digital signal Processing Sampling: Nyquist, Aliasing FFT implementation of DFT Design of FIR and IIR filters Structures for realization of Filters Multirate signal processing: Filter banks Govind CEN, Amrita Vishwa Vidyapeetham
  • 6. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP Advanced DSP Time-Frequency Analysis TFA by STFT TFA by wigner Distribututions TFA by Wavelets Govind CEN, Amrita Vishwa Vidyapeetham
  • 7. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP References L. Rabiner, Biing-Hwang Juang and B. Yegnanarayana,"Fundamentals of Speech Recognition",Pearson Education Inc.2009 Douglas O’Shaughnessy,"Speech Communication",University Press,2001 Thomas F Quatieri,"Discrete Time Speech Signal Processing", Pearson Education Inc.,2004 Govind CEN, Amrita Vishwa Vidyapeetham
  • 8. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Introduction Information in Speech Message Language Accent Speaker Emotions/Stress Applications Recognition Speech recognition Speaker Recognition/Verification Emotion Recognition etc.. Synthesis Text to Speech Synthesis Speech Enhancement Voice Conversion Govind CEN, Amrita Vishwa Vidyapeetham
  • 9. Applications: Recognition. Table with columns Speech, Objective, Information Extracted. Objective: message; information extracted: "Author of the danger...". Objective: speaker; information extracted: "It's Govind speaking". Objective: speaker claim has to be verified; information extracted: "Hi Govind, your claim is accepted".
  • 10. Applications: Synthesis. Table with columns Input, Objective, Output. Text-to-speech synthesis: input is text ("Epochs Occur..."); objective is to synthesize the text as speech. Speech enhancement: remove noise; remove reverberation; enhance the desired speaker's speech. Voice conversion: convert the source speaker's speech into the target speaker's speech.
  • 11. What makes automatic processing of speech complicated? It is an interdisciplinary area. (1) Signal processing: the process of extracting relevant information from the speech signal. (2) Physics: the science of understanding the relationship between the physical speech signal and the physiological mechanisms that produced it. (3) Pattern recognition: grouping or classifying patterns of various events in speech. (4) Communication and information theory: deals with efficient ways of encoding or decoding the parameters of speech, and with efficient search for patterns of interest in speech (dynamic programming, Viterbi search, stack algorithms, etc.). (5) Linguistics: the relationship between the sounds of a language (phonology), its syntax and semantics, and the sense derived from the meaning (pragmatics). (6) Computer science: the study of different algorithms for implementation in software or hardware. (7) Psychology: understanding the psychological state of the speaker or listener helps in tasks such as emotion analysis.
  • 12. Speaker-Listener Schematic Diagram in Speech Communication. Figure: Schematic diagram of speech communication (figure courtesy: Rabiner et al.).
  • 13. Production-Perception Block Diagram. Production blocks: Message Formulation (text); Language Code (phonemes, prosody); Neuro-Muscular Controls (articulatory motion); Vocal Tract System (acoustic waveform). Transmission Channel. Perception blocks: acoustic waveform; Basilar Membrane Motion (spectrum analysis); Neural Transduction (feature extraction, coding); Language Translation (phonemes, words, sentences); Message Understanding (semantics). Figure: Speech production block diagram (figure courtesy: Rabiner et al.).
  • 14. Speech Production. Figure: Speech production mechanism (figure courtesy: Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Chapter 3, p. 58, Pearson Edu., Delhi).
  • 15. Mechanical Equivalent of Speech Production System. Figure: Speech production mechanism (figure courtesy: Rabiner et al.).
  • 16. Representation of Speech Signal. Figure: Speech signal in the time domain (amplitude axis from −1 to 1).
  • 17. Glottal Air Flow During Speech Production. Figure: Glottal air flow (figure courtesy: Rabiner et al.).
  • 18. Glottal Air Flow: Graphical Illustration. Figure: two panels plotting amplitude against time (samples): the speech waveform, and the glottal flow as captured by the EGG (electroglottograph) signal showing the glottis vibration.
  • 19. Classification of Speech Sounds. Silence (S): no speech is produced. Unvoiced (U): the vocal folds are not vibrating. Voiced (V): periodic vibration of the vocal folds. Figure: Speech signal in the time domain with its segments labelled U, S and V.
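A common first attempt at labelling frames as silence, unvoiced or voiced uses short-time energy and zero-crossing rate. The sketch below is only an illustration under stated assumptions: the thresholds `energy_floor` and `zcr_split`, the 30 ms frame length and the synthetic test signal are all hypothetical and would need tuning on real speech.

```python
import numpy as np

def frame_features(x, frame_len, hop):
    """Short-time energy and zero-crossing rate for each frame of x."""
    num_frames = 1 + (len(x) - frame_len) // hop
    energy = np.empty(num_frames)
    zcr = np.empty(num_frames)
    for i in range(num_frames):
        frame = x[i * hop: i * hop + frame_len]
        energy[i] = np.mean(frame ** 2)
        # Fraction of adjacent sample pairs whose signs differ.
        zcr[i] = np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))
    return energy, zcr

def label_frames(energy, zcr, energy_floor=1e-4, zcr_split=0.25):
    """Label frames 'S', 'U' or 'V' using two hypothetical thresholds."""
    labels = np.full(energy.shape, "V", dtype="<U1")
    labels[zcr > zcr_split] = "U"        # noise-like frames cross zero often
    labels[energy < energy_floor] = "S"  # very low energy overrides everything
    return labels

fs = 8000                                # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
seg = 2400                               # 0.3 s per synthetic segment
silence = 0.001 * rng.standard_normal(seg)                   # faint background noise
unvoiced = 0.1 * rng.standard_normal(seg)                    # noise-like excitation
voiced = 0.5 * np.sin(2 * np.pi * 120 * np.arange(seg) / fs) # periodic, 120 Hz
x = np.concatenate([silence, unvoiced, voiced])

energy, zcr = frame_features(x, frame_len=240, hop=240)      # 30 ms frames, no overlap
labels = label_frames(energy, zcr)
print("".join(labels))
```

On clean synthetic data the two features separate the classes easily; real recordings are far less clear-cut.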
  • 20. Classification of Speech Sounds. Separating voiced sounds from unvoiced sounds and silence is known as voiced/non-voiced detection. Issues in voiced/non-voiced detection: it is difficult to distinguish weak unvoiced sounds from silence; it is difficult to distinguish weakly periodic voiced sounds from unvoiced sounds.
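One way to quantify how periodic a frame is, and hence why weakly periodic frames are hard to call, is the peak of the normalized autocorrelation within a plausible pitch-lag range. This is a sketch under assumptions: the 60 to 400 Hz search range, the 40 ms frame and the synthetic frames are illustrative choices, not values from the slides.

```python
import numpy as np

def periodicity(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return (peak normalized autocorrelation, pitch estimate in Hz)."""
    frame = frame - np.mean(frame)
    # Keep non-negative lags only: ac[k] is the autocorrelation at lag k.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0, 0.0                   # all-zero frame: no periodicity
    ac = ac / ac[0]
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return float(ac[lag]), fs / lag

fs = 8000                                 # assumed sampling rate (Hz)
t = np.arange(320) / fs                   # one 40 ms frame
voiced_frame = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
noise_frame = np.random.default_rng(1).standard_normal(320)

peak_v, f0_v = periodicity(voiced_frame, fs)
peak_n, _ = periodicity(noise_frame, fs)
print(peak_v, f0_v, peak_n)
```

A strongly voiced frame gives a peak well above that of noise; a weakly periodic frame falls in between, which is where a fixed threshold starts to fail.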
  • 21. Spectrograms: Narrow-band & Wide-band.
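The difference between narrow-band and wide-band spectrograms comes from the analysis window length: a long window resolves individual pitch harmonics, while a short window merges them but follows rapid changes in time. The sketch below uses two tones 100 Hz apart as a stand-in for adjacent harmonics; the 40 ms and 4 ms window lengths and the helper `midpoint_dip` are assumptions for illustration.

```python
import numpy as np

def midpoint_dip(window_ms, fs=8000, f1=1000.0, f2=1100.0, nfft=320):
    """Ratio of spectral magnitude midway between two tones to that at the first tone."""
    n_win = int(round(window_ms * 1e-3 * fs))
    t = np.arange(n_win) / fs
    x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)
    # Zero-pad short windows to nfft so both cases share one frequency grid.
    spectrum = np.abs(np.fft.rfft(x * np.hamming(n_win), n=nfft))
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)

    def magnitude_at(f_hz):
        return spectrum[np.argmin(np.abs(freqs - f_hz))]

    return magnitude_at(0.5 * (f1 + f2)) / magnitude_at(f1)

ratio_long = midpoint_dip(40.0)    # long window, narrow-band analysis
ratio_short = midpoint_dip(4.0)    # short window, wide-band analysis
print(ratio_long, ratio_short)
```

A ratio near zero means a clear dip between the tones (they are resolved); a ratio near or above one means they have merged into a single broad peak.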
  • 22. Spectral Envelope from a Long Segment of Speech. Figure: three-dimensional plot of spectral magnitude against frame index and frequency (Hz, 0 to 4000).
  • 23. Classification of sound units. Phonemes divide into vowels, diphthongs, semi-vowels and consonants. Vowels (front, mid, back): i (eve), I (it), e (hate), E (met), U (…k), u (…t), Λ (up), a (father), o (obey), c (all). Diphthongs: ay (buy), aw (down), ey (bait), O (boy). Semi-vowels: liquids l (large), r (run); glides w (wit), y (you). Consonants: nasals m (met), n (net), ng (sing); plosives, voiced b (ball), d (debt), g (get) and unvoiced k (kit), p (pen), t (ten); fricatives, voiced v (vat), dh (that), z (zoo) and unvoiced f (fun), th (thing), s (sat), sh (should); whispers h (he); affricates tz (sports), jh (judge), ch (church).
  • 24. Representation of sound units in speech. Sounds are classified into vowels and consonants. Vowels are produced by exciting a fixed vocal tract shape with quasi-periodic glottal pulses. Vowels are classified as front, mid or back according to the position of the tongue hump. Front vowels: /i/ ("eve"), /I/ ("it"), /æ/ ("at"), /e/ ("hate"). Mid vowels: /a/ ("father"), /Λ/ ("up"). Back vowels: /U/ ("foot"), /u/ ("boot"), /o/ ("obey"). Another classification is based on vowel length: long and short. Diphthongs are combinations of two vowels: /ay/ as in "buy", /aw/ as in "down", /ey/ as in "bait", /o/ as in "boat", /cy/ as in "boy", etc.
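The statement that vowels arise from exciting a fixed vocal tract shape with quasi-periodic glottal pulses can be sketched as a source-filter model: an impulse train passed through a cascade of two-pole resonators. The formant frequencies and bandwidths below are hypothetical round numbers chosen for illustration, not measurements of any particular vowel, and `resonator` is a helper written for this sketch.

```python
import numpy as np

def resonator(x, freq_hz, bw_hz, fs):
    """Two-pole digital resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = np.exp(-np.pi * bw_hz / fs)              # pole radius set by the bandwidth
    a1 = 2.0 * r * np.cos(2.0 * np.pi * freq_hz / fs)
    a2 = -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

fs = 8000                                        # assumed sampling rate (Hz)
f0 = 100                                         # assumed pitch (Hz)
excitation = np.zeros(fs)                        # one second of samples
excitation[:: fs // f0] = 1.0                    # idealized glottal pulses every 10 ms

# Hypothetical (formant frequency, bandwidth) pairs in Hz.
formants = [(700.0, 80.0), (1100.0, 90.0), (2500.0, 120.0)]
vowel = excitation
for freq_hz, bw_hz in formants:
    vowel = resonator(vowel, freq_hz, bw_hz, fs)  # fixed vocal tract shape

steady = vowel[4000:4800]                        # ten pitch periods after the transient
spectrum = np.abs(np.fft.rfft(steady))           # bins are 10 Hz apart
print(spectrum[70] > spectrum[30])               # harmonic at 700 Hz vs harmonic at 300 Hz
```

The output repeats every pitch period, and harmonics near a resonance are stronger than those far from one; changing the resonator frequencies changes the vowel quality while the excitation stays the same.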
  • 25. Front Vowel. Figure: speech signal and spectrogram for the front vowels /I/ ("it"), /e/ ("hate") and /i/ ("eve"); the spectrogram frequency axis extends to 7000 Hz.
  • 26. Vowel Analysis. Front vowels show high-frequency resonances. Front vowels are distinguished from one another by the tongue height during production. Mid vowels show well-separated, evenly distributed resonant frequencies. Back vowels show almost no energy beyond the low-frequency region.
  • 27. Diphthongs.
  • 28. Semivowels. The group of sounds consisting of /w/, /r/, /l/ and /y/. They are difficult to characterize because they are vowel-like in nature. They are characterized by a gliding transition in the vocal tract area function between adjacent phonemes, and are best described as transitional, vowel-like sounds.
  • 29. Nasal Consonants. The group of sounds consisting of /m/, /n/ and /ŋ/. They are produced with glottal excitation and with the vocal tract totally constricted at some point along the oral passageway. The velum is lowered so that air flows through the nasal cavity while the oral passage is blocked. Because the closed oral cavity remains acoustically coupled to the pharynx, anti-resonances are created. /m/, /n/ and /ŋ/ are produced by constrictions at the lips, behind the teeth and at the velum, respectively.
  • 30. Nasalized Vowels.
  • 31. Unvoiced Fricatives. Produced by exciting the vocal tract with a turbulent airflow through a narrow constriction. /f/ ("four"), /θ/ ("thing"), /s/ ("sat") and /sh/ ("shut") form the class of unvoiced fricatives. /f/: constriction at the teeth. /s/: constriction near the middle of the oral cavity. /sh/: constriction at the end of the oral tract.
  • 32. Voiced Fricatives. /v/ ("vat"), /ð/ ("that"), /z/ ("zoo") and /zh/ ("azure") form the class of voiced fricatives. /v/: constriction at the teeth. /z/: constriction near the middle of the oral cavity. /zh/: constriction at the end of the oral tract. Apart from the added glottal vibration, the place of articulation is the same as for the corresponding unvoiced fricatives.