Unit 6 Speech Signal
DR MINAKSHI PRADEEP ATRE
PVG’S COET & GKPIM PUNE
References
Book: Speech and Audio Processing by Dr. Shaila Apte
PDF document: http://cs.haifa.ac.il/~nimrod/Compression/Speech/S1Basics2010.pdf
For speech samples:
https://www.signalogic.com/index.pl?page=speech_codec_wav_samples
Contents
Speech:
1. Basics of speech signal and its features
2. LTI representation of speech signal
3. LTV representation of speech signal
4. Estimation of fundamental frequency
5. Identification of voiced and unvoiced speech
6. Noise removal
Speech
Speech signal is generated by nature
Naturally occurring, so random in nature
Necessary to understand the generalized human speech production
Simple linear time invariant (LTI) model for speech production
Inherently time varying nature of speech
Introduction to linear time-varying (LTV) model of speech
Speech type: consonants, fricatives
Voiced and unvoiced (V/UV) speech
Speech Production Mechanism: Pipelines Model
Vocal Tract
Vocal Tract
• Vocal tract is the cavity between the vocal cords and the
lips, and acts as a resonator that spectrally shapes the
periodic input, much like the cavity of a musical wind
instrument.
• Simple model of a steady-state vowel regards the vocal
tract as a linear time-invariant (LTI) filter with a periodic
impulse-like input.
What is Speech signal?
• Created at the vocal cords, travels through the vocal tract, and is produced at the speaker's mouth
• Reaches the listener's ear as a pressure wave
• Non-stationary, but can be divided into sound segments that share some common acoustic properties over a short time interval
• Two major classes of phonemes: vowels and consonants
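The short-time view of a non-stationary signal described in the list above can be sketched in Python. This is only an illustration and not part of the original slides: the function names `split_into_frames` and `short_time_energy`, the 20 ms frame length, and the 10 ms hop are assumptions chosen for the example.

```python
def split_into_frames(samples, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Cut a signal into short, overlapping frames.

    Within one frame the speech is treated as approximately stationary.
    frame_ms and hop_ms are illustrative defaults, not values from the slides.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop durations must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def short_time_energy(frame):
    """Mean squared amplitude of one frame, a simple per-frame acoustic property."""
    return sum(x * x for x in frame) / len(frame)
```

Calling `split_into_frames(signal, 8000)` on one second of audio sampled at 8 kHz yields frames of 160 samples each, and any per-frame property (energy here) can then be tracked over time.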
Phonemes
The basic sounds of a language (e.g. "a" in the word "father") are
called phonemes
A typical speech utterance consists of a string of vowel and
consonant phonemes whose temporal and spectral characteristics
change with time
In addition, the time-varying source and system can also
interact nonlinearly in a complex way: our simple model is correct for
a steady vowel, but the sounds of speech are not always well
represented by linear time-invariant systems!
Vowel Production
In vowel production, air is forced from the lungs by contraction of
the muscles around the lung cavity
Air flows through the vocal cords, which are two masses of flesh,
causing periodic vibration of the cords whose rate gives the pitch of
the sound
Resulting periodic puffs of air act as an excitation input, or source,
to the vocal tract
Typical Vowels
Speech Production
A sound source excites a (vocal tract) filter
◦ Voiced: Periodic source, created by vocal cords
◦ Unvoiced: Aperiodic and noisy source
Pitch (also called F0) is the fundamental frequency of vocal cord vibration; above it lie 4-5
formants (F1 - F5) at higher frequencies
For a uniform vocal tract about 17 cm long (a neutral vowel), the natural frequencies occur near
odd multiples of 500 Hz.
These resonant frequencies
are called formants.
Typical formant frequencies for selected vowels in Hz

Vowel   Adult Male            Adult Female
        F1    F2    F3        F1    F2    F3
(i)     255   2330  3000      340   2610  3210
(u)     290   940   2180      390   995   2585
(ae)    735   1625  2465      950   1955  2900

This table shows the three values (F1, F2, F3)
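The "odd multiples of 500 Hz" rule can be checked with a short calculation. The sketch below assumes the usual textbook idealization, which the slides do not spell out: the vocal tract is a uniform tube closed at the glottis and open at the lips, about 0.17 m long, with a speed of sound of 340 m/s. Under that assumption the resonances are F_n = (2n - 1) c / (4 L). The function name `uniform_tube_formants` is hypothetical.

```python
def uniform_tube_formants(length_m=0.17, speed_of_sound=340.0, count=3):
    """Resonances of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L), for n = 1, 2, ...
    length_m and speed_of_sound are assumed values, not taken from the slides.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# First three resonances of the idealized tube, in Hz
neutral_formants = uniform_tube_formants()
```

With the assumed values the first three resonances fall at roughly 500, 1500, and 2500 Hz. Real vowels depart from this because the tract is not uniform, which is what the table of measured formants reflects.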
LTI Model for speech production
[Block diagram: Impulse Train Generator (Glottis) or Random Signal Generator → Impulse Response of Vocal Tract → Generated Speech]
Periodic excitation: the impulse train generator is used as the excitation signal when a voiced segment is produced (vowel, e.g. “a”)
Basic assumption: the source of excitation and the vocal tract system are independent
LTI Model for speech production
[Block diagram: Impulse Train Generator (Glottis) or Random Signal Generator → Impulse Response of Vocal Tract → Generated Speech]
Random excitation: the random signal generator is used as the excitation signal when an unvoiced segment is produced (consonants, e.g. “s”)
The LTI model is used for a short segment of speech, about 10 ms, for which we can assume the parameters of the vocal tract remain constant
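The two-source LTI model can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method from the slides or the referenced book: the vocal tract impulse response is approximated as a sum of exponentially decaying sinusoids at the formant frequencies, the 60 Hz bandwidth and 200-sample response length are arbitrary, and all function names are hypothetical. The default formants are the adult male (ae) values from the table earlier in the deck, and the segment is made longer than 10 ms only so that several pitch periods appear.

```python
import math
import random


def impulse_train(num_samples, period_samples):
    """Voiced excitation: one unit impulse every pitch period."""
    return [1.0 if n % period_samples == 0 else 0.0 for n in range(num_samples)]


def random_signal(num_samples, seed=0):
    """Unvoiced excitation: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def vocal_tract_impulse_response(formants_hz, sample_rate, length=200,
                                 bandwidth_hz=60.0):
    """Toy impulse response: one exponentially decaying sinusoid per formant."""
    response = []
    for n in range(length):
        t = n / sample_rate
        envelope = math.exp(-math.pi * bandwidth_hz * t)
        response.append(
            envelope * sum(math.sin(2.0 * math.pi * f * t) for f in formants_hz))
    return response


def convolve(excitation, response):
    """Direct linear convolution y[n] = sum_k x[k] h[n - k]."""
    output = [0.0] * (len(excitation) + len(response) - 1)
    for i, x in enumerate(excitation):
        if x == 0.0:
            continue  # skip the silent samples of an impulse train
        for j, h in enumerate(response):
            output[i + j] += x * h
    return output


def synthesize_segment(voiced, sample_rate=8000, num_samples=240, pitch_hz=100.0,
                       formants_hz=(735.0, 1625.0, 2465.0)):
    """Select the excitation (V/UV switch), then filter it with a fixed vocal tract."""
    if voiced:
        excitation = impulse_train(num_samples, round(sample_rate / pitch_hz))
    else:
        excitation = random_signal(num_samples)
    response = vocal_tract_impulse_response(formants_hz, sample_rate)
    return convolve(excitation, response)[:num_samples]
```

Because the source and the filter are independent in this model, switching `voiced` changes only the excitation; the vocal tract response stays the same for the whole segment.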
Nature of Speech Signal
• Speech is generated by components such as the vocal cords and the vocal tract
• It’s not possible to generate a speech signal on its own
Speech is a random signal
• Speech has, or can have, infinitely many features (recall the story of the blind people touching an
elephant to identify and describe what the elephant looks like)
So it’s a complex problem
• Uttering different words is possible because humans can change the resonant modes of
the vocal cavity and can also stretch the vocal cords to some extent, modifying the pitch
period for different vowels
And that’s why we have the linear time-varying (LTV) model
Linear Time-varying Model: Speech production
[Block diagram labels: Impulse Train Generator, Random Signal Generator, Amplitude, Impulse Response of Vocal Tract, Generated Speech]
Pitch period is variable
Impulse response is variable
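One way to picture the time-varying model is to let the pitch, amplitude, and vocal tract response change from frame to frame. The Python sketch below is an assumption-laden illustration, not the formulation in the slides: parameters are held constant within each 10 ms frame, every excitation sample is filtered with the response of the frame in which it occurs, and the toy response is again a sum of decaying sinusoids. The names `impulse_response` and `synthesize_ltv` are hypothetical.

```python
import math
import random


def impulse_response(formants_hz, sample_rate, length=120, bandwidth_hz=60.0):
    """Toy vocal tract response: one decaying sinusoid per formant."""
    return [
        math.exp(-math.pi * bandwidth_hz * n / sample_rate)
        * sum(math.sin(2.0 * math.pi * f * n / sample_rate) for f in formants_hz)
        for n in range(length)
    ]


def synthesize_ltv(frame_params, sample_rate=8000, frame_len=80, seed=0):
    """Frame-by-frame synthesis with variable pitch, amplitude, and response.

    frame_params: list of (voiced, pitch_hz, amplitude, formants_hz), one per frame.
    Returns (excitation, speech), each frame_len * len(frame_params) samples long.
    """
    ir_len = 120
    total = frame_len * len(frame_params)
    excitation = [0.0] * total
    speech = [0.0] * (total + ir_len)
    rng = random.Random(seed)
    next_pulse = 0  # sample index of the next glottal impulse
    for k, (voiced, pitch_hz, amplitude, formants_hz) in enumerate(frame_params):
        response = impulse_response(formants_hz, sample_rate, ir_len)
        for n in range(k * frame_len, (k + 1) * frame_len):
            if voiced:
                if n >= next_pulse:
                    excitation[n] = amplitude
                    # the pitch period may differ in every frame
                    next_pulse = n + max(1, round(sample_rate / pitch_hz))
            else:
                excitation[n] = amplitude * rng.gauss(0.0, 1.0)
            if excitation[n] != 0.0:
                for j, h in enumerate(response):
                    speech[n + j] += excitation[n] * h
    return excitation, speech[:total]
```

For example, `synthesize_ltv([(True, 100.0, 1.0, (735.0, 1625.0)), (True, 200.0, 0.5, (255.0, 2330.0))])` halves the pitch period and changes the formants between two consecutive frames.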
Speech Sound Categories
Periodic (Sonorants, Voiced)
Noisy (Fricatives, Unvoiced)
Impulsive (Plosives)
Example:
In the word “shop,” the “sh,” “o,” and “p” are generated from a
noisy, periodic, and impulsive source, respectively
Frequency Range
Speech:
Pitch frequency:
◦ male ~ 85-155 Hz;
◦ female ~ 165-255 Hz;
Singer’s vocal range, from bass to soprano: 80 Hz-1100 Hz
Pitch
Pitch period: The time duration of one glottal cycle
Pitch (fundamental frequency): The reciprocal of the pitch period.
Remember: we will
calculate the pitch only
for voiced segments
Pitch Detection
The pitch period and V/UV
decisions are fundamental
to many speech coders
Many methods exist for the
calculation, for example:
◦ Autocorrelation function
◦ Zero-crossing rate (ZCR)
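Both methods listed above can be sketched briefly in Python. This is a minimal illustration under assumptions that are not in the slides: the pitch search range of 80 to 400 Hz, the ZCR threshold of 0.25 for the V/UV decision, and the synthetic test signals are all arbitrary choices, and the function names are hypothetical. A practical detector would usually also check frame energy.

```python
import math
import random


def autocorrelation(frame, lag):
    """Short-time autocorrelation r(lag) = sum_n x[n] * x[n + lag]."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_pitch_hz(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Pick the lag with the largest autocorrelation inside the allowed pitch range."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    # pitch (F0) is the reciprocal of the pitch period
    return sample_rate / best_lag


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(frame, zcr_threshold=0.25):
    """Crude V/UV decision: voiced speech tends to have a low ZCR (assumed threshold)."""
    return zero_crossing_rate(frame) < zcr_threshold


# Synthetic 40 ms frames at 8 kHz, for illustration only
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 125.0 * n / SAMPLE_RATE) for n in range(320)]
_rng = random.Random(0)
noise = [_rng.gauss(0.0, 1.0) for _ in range(320)]
```

Here `estimate_pitch_hz(tone, SAMPLE_RATE)` recovers a value close to the 125 Hz of the test tone, and `is_voiced` separates the periodic frame from the noisy one. The pitch estimate is only meaningful for frames judged to be voiced.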
Features or categorization of speech sound
Speech sounds are studied and classified from the following
perspectives:
1) The nature of the source: periodic, noisy, or impulsive, and
combinations of the three
2) The shape of the vocal tract
3) The time-domain waveform, which gives the pressure change with
time at the output of the lips
4) The time-varying spectral characteristics revealed through the
spectrogram
Spectrogram
Time-varying spectral characteristics of the speech signal can be displayed graphically
as a two-dimensional pattern
Vertical axis: frequency, Horizontal axis: time
The pseudo-color of the pattern is proportional to signal
energy (red: high energy)
The resonance frequencies of the vocal tract show up as “energy bands”
Voiced intervals are characterized by a striated appearance (due to the periodicity of the
signal)
Unvoiced intervals are more solidly filled in
The yellow bands are formants
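The time-frequency pattern described above comes from a short-time Fourier analysis. The Python sketch below shows one simple way to compute it; the frame length of 64 samples, the hop of 32 samples, the Hann window, and the function names are assumptions made for this illustration, and the direct DFT is written for clarity rather than speed (an FFT would normally be used).

```python
import cmath
import math


def hann_window(n_points):
    """Hann window, used to reduce spectral leakage at the frame edges."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def dft_power(frame):
    """Power of the DFT bins from 0 up to the Nyquist bin."""
    n_points = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                for n in range(n_points))) ** 2
        for k in range(n_points // 2 + 1)
    ]


def spectrogram(samples, frame_len=64, hop_len=32):
    """One power spectrum per frame: spec[time_index][frequency_bin]."""
    window = hann_window(frame_len)
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        columns.append(dft_power(frame))
    return columns


# Synthetic 1 kHz tone at 8 kHz sampling, for illustration only
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(256)]
spec = spectrogram(tone)
```

Bin `k` of each column corresponds to the frequency `k * SAMPLE_RATE / frame_len`, so with these settings the bins are 125 Hz apart and the 1 kHz tone shows up as a constant energy band in bin 8. Mapping the power values to colors gives the picture described on the slide.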
Most common Manner of articulation
Plosive, or oral stop, where there is complete occlusion (blockage) of both the oral and nasal
cavities of the vocal tract, and therefore no air flow. Examples include English /p t k/ (voiceless)
and /b d g/ (voiced)
Nasal stop, where there is complete occlusion of the oral cavity, and the air passes instead
through the nose. The shape and position of the tongue determine the resonant cavity that
gives different nasal stops their characteristic sounds. Examples include English /m, n/
Fricative, sometimes called spirant, where there is continuous frication (turbulent and noisy
airflow) at the place of articulation. Examples include English /f, s/ (voiceless), /v, z/ (voiced), etc.
Most common Manner of articulation
Sibilants are a type of fricative where the airflow is guided by a groove in the tongue toward the
teeth, creating a high-pitched and very distinctive sound. These are by far the most common
fricatives. English sibilants include /s/ and /z/.
Affricate, which begins like a plosive, but this releases into a fricative rather than having a
separate release of its own. The English letters "ch" and "j" represent affricates
Trill, in which the articulator (usually the tip of the tongue) is held in place, and the airstream
causes it to vibrate. The double "r" of Spanish "perro" is a trill.
Approximant, where there is very little obstruction. Examples include English /w/ and /r/. Lateral
approximants, usually shortened to lateral, are a type of approximant pronounced with the side
of the tongue. English /l/ is a lateral.
Time for MATLAB Program
THANK YOU

More Related Content

What's hot

Physiology of Language and Speech
Physiology of Language and SpeechPhysiology of Language and Speech
Physiology of Language and SpeechABHILASHA MISHRA
 
Physiology of speech
Physiology of speechPhysiology of speech
Physiology of speechhariom gour
 
Csd 210 anatomy & physiology of the speech mechanism ii
Csd 210 anatomy & physiology of the speech mechanism iiCsd 210 anatomy & physiology of the speech mechanism ii
Csd 210 anatomy & physiology of the speech mechanism iiJake Probst
 
How to do with subtitling?
How to do with subtitling? How to do with subtitling?
How to do with subtitling? Paulina Malicka
 
Speech and Language Processing
Speech and Language ProcessingSpeech and Language Processing
Speech and Language ProcessingVikalp Mahendra
 
Acoustic phonetics
Acoustic phoneticsAcoustic phonetics
Acoustic phoneticsJunaid Amjed
 
Introduction to phonetics
Introduction to phoneticsIntroduction to phonetics
Introduction to phoneticsAmir Alm Eldin
 
Investigations of Formant Extraction of Male and Female Speech Signal Via Cep...
Investigations of Formant Extraction of Male and Female Speech Signal Via Cep...Investigations of Formant Extraction of Male and Female Speech Signal Via Cep...
Investigations of Formant Extraction of Male and Female Speech Signal Via Cep...IRJET Journal
 
English phonetics redouane boulguid ensa_safi_morocco
English phonetics redouane boulguid ensa_safi_moroccoEnglish phonetics redouane boulguid ensa_safi_morocco
English phonetics redouane boulguid ensa_safi_moroccoRednef68 Rednef68
 
Acoustics of Speech: The Voice Mechanism
Acoustics of Speech: The Voice MechanismAcoustics of Speech: The Voice Mechanism
Acoustics of Speech: The Voice MechanismFarhat Surve
 
Presentation language and the brain
Presentation language and the brainPresentation language and the brain
Presentation language and the brainAhmad Murtaqi
 
Laryngeal dystonia introduction
Laryngeal dystonia introductionLaryngeal dystonia introduction
Laryngeal dystonia introductionMd Roohia
 
HIS 120 Air Flow and the Speech Mechanism
HIS 120 Air Flow and the Speech MechanismHIS 120 Air Flow and the Speech Mechanism
HIS 120 Air Flow and the Speech MechanismRebecca Krouse
 
Audiovisual translation (avt)
Audiovisual translation (avt)Audiovisual translation (avt)
Audiovisual translation (avt)Mileyvi Paredes
 

What's hot (20)

Physiology of Language and Speech
Physiology of Language and SpeechPhysiology of Language and Speech
Physiology of Language and Speech
 
Speech
SpeechSpeech
Speech
 
Physiology of speech
Physiology of speechPhysiology of speech
Physiology of speech
 
Csd 210 anatomy & physiology of the speech mechanism ii
Csd 210 anatomy & physiology of the speech mechanism iiCsd 210 anatomy & physiology of the speech mechanism ii
Csd 210 anatomy & physiology of the speech mechanism ii
 
physiology of speech
physiology of speechphysiology of speech
physiology of speech
 
How to do with subtitling?
How to do with subtitling? How to do with subtitling?
How to do with subtitling?
 
Speech and Language Processing
Speech and Language ProcessingSpeech and Language Processing
Speech and Language Processing
 
Acoustic phonetics
Acoustic phoneticsAcoustic phonetics
Acoustic phonetics
 
Introduction to phonetics
Introduction to phoneticsIntroduction to phonetics
Introduction to phonetics
 
Amity NLP Notes
Amity NLP NotesAmity NLP Notes
Amity NLP Notes
 
Investigations of Formant Extraction of Male and Female Speech Signal Via Cep...
Investigations of Formant Extraction of Male and Female Speech Signal Via Cep...Investigations of Formant Extraction of Male and Female Speech Signal Via Cep...
Investigations of Formant Extraction of Male and Female Speech Signal Via Cep...
 
Speech disorders
Speech disordersSpeech disorders
Speech disorders
 
English phonetics redouane boulguid ensa_safi_morocco
English phonetics redouane boulguid ensa_safi_moroccoEnglish phonetics redouane boulguid ensa_safi_morocco
English phonetics redouane boulguid ensa_safi_morocco
 
Acoustics of Speech: The Voice Mechanism
Acoustics of Speech: The Voice MechanismAcoustics of Speech: The Voice Mechanism
Acoustics of Speech: The Voice Mechanism
 
Subtitling (English)
Subtitling (English)Subtitling (English)
Subtitling (English)
 
Presentation language and the brain
Presentation language and the brainPresentation language and the brain
Presentation language and the brain
 
Laryngeal dystonia introduction
Laryngeal dystonia introductionLaryngeal dystonia introduction
Laryngeal dystonia introduction
 
HIS 120 Air Flow and the Speech Mechanism
HIS 120 Air Flow and the Speech MechanismHIS 120 Air Flow and the Speech Mechanism
HIS 120 Air Flow and the Speech Mechanism
 
ppt on phonology
 ppt on phonology ppt on phonology
ppt on phonology
 
Audiovisual translation (avt)
Audiovisual translation (avt)Audiovisual translation (avt)
Audiovisual translation (avt)
 

Similar to Part1 speech basics

Speech signal processing lizy
Speech signal processing lizySpeech signal processing lizy
Speech signal processing lizyLizy Abraham
 
Principal characteristics of speech
Principal characteristics of speechPrincipal characteristics of speech
Principal characteristics of speechNikolay Karpov
 
SodaBottles-licensing Copyright-Fix.pdf
SodaBottles-licensing Copyright-Fix.pdfSodaBottles-licensing Copyright-Fix.pdf
SodaBottles-licensing Copyright-Fix.pdfNga Trinh
 
Phonetics & Phonology Mine.pptx
Phonetics & Phonology Mine.pptxPhonetics & Phonology Mine.pptx
Phonetics & Phonology Mine.pptxKoukabKhan
 
Phoneticsphonology lecture 2
Phoneticsphonology  lecture 2Phoneticsphonology  lecture 2
Phoneticsphonology lecture 2Raj Wali Khan
 
Principal characteristics of speech
Principal characteristics of speechPrincipal characteristics of speech
Principal characteristics of speechNikolay Karpov
 
Phonetic and phonology pp2
Phonetic and phonology pp2Phonetic and phonology pp2
Phonetic and phonology pp2zhian fadhil
 
Phonetics ( Introduction to Linguistics )
Phonetics ( Introduction to Linguistics )Phonetics ( Introduction to Linguistics )
Phonetics ( Introduction to Linguistics )Romulo Mulianto
 
Class 09 emerson_phonetics_fall2014_phonemes_allophones_vot_epg
Class 09 emerson_phonetics_fall2014_phonemes_allophones_vot_epgClass 09 emerson_phonetics_fall2014_phonemes_allophones_vot_epg
Class 09 emerson_phonetics_fall2014_phonemes_allophones_vot_epgLisa Lavoie
 
Cube model Theory of acoustic phonetics
Cube model Theory of acoustic phonetics Cube model Theory of acoustic phonetics
Cube model Theory of acoustic phonetics KarloHammer
 
Acoustic phonetics
Acoustic phoneticsAcoustic phonetics
Acoustic phoneticsVivaAs
 
1 ESO Música - El so
1 ESO Música - El so1 ESO Música - El so
1 ESO Música - El soJoan Sèculi
 
Speech organ and manner of articulation
Speech organ and manner of articulationSpeech organ and manner of articulation
Speech organ and manner of articulationYanti95
 

Similar to Part1 speech basics (20)

Speech signal processing lizy
Speech signal processing lizySpeech signal processing lizy
Speech signal processing lizy
 
Phonetics
PhoneticsPhonetics
Phonetics
 
Linguistics
LinguisticsLinguistics
Linguistics
 
Principal characteristics of speech
Principal characteristics of speechPrincipal characteristics of speech
Principal characteristics of speech
 
English Mystery 2
English Mystery 2English Mystery 2
English Mystery 2
 
Phonetics
PhoneticsPhonetics
Phonetics
 
4455355.ppt
4455355.ppt4455355.ppt
4455355.ppt
 
SodaBottles-licensing Copyright-Fix.pdf
SodaBottles-licensing Copyright-Fix.pdfSodaBottles-licensing Copyright-Fix.pdf
SodaBottles-licensing Copyright-Fix.pdf
 
Phonetics & Phonology Mine.pptx
Phonetics & Phonology Mine.pptxPhonetics & Phonology Mine.pptx
Phonetics & Phonology Mine.pptx
 
Phoneticsphonology lecture 2
Phoneticsphonology  lecture 2Phoneticsphonology  lecture 2
Phoneticsphonology lecture 2
 
Principal characteristics of speech
Principal characteristics of speechPrincipal characteristics of speech
Principal characteristics of speech
 
Phonetic and phonology pp2
Phonetic and phonology pp2Phonetic and phonology pp2
Phonetic and phonology pp2
 
Phonetics ( Introduction to Linguistics )
Phonetics ( Introduction to Linguistics )Phonetics ( Introduction to Linguistics )
Phonetics ( Introduction to Linguistics )
 
Class 09 emerson_phonetics_fall2014_phonemes_allophones_vot_epg
Class 09 emerson_phonetics_fall2014_phonemes_allophones_vot_epgClass 09 emerson_phonetics_fall2014_phonemes_allophones_vot_epg
Class 09 emerson_phonetics_fall2014_phonemes_allophones_vot_epg
 
Cube model Theory of acoustic phonetics
Cube model Theory of acoustic phonetics Cube model Theory of acoustic phonetics
Cube model Theory of acoustic phonetics
 
Acoustic phonetics
Acoustic phoneticsAcoustic phonetics
Acoustic phonetics
 
An Introduction To Speech Recognition
An Introduction To Speech RecognitionAn Introduction To Speech Recognition
An Introduction To Speech Recognition
 
Class 4
Class 4Class 4
Class 4
 
1 ESO Música - El so
1 ESO Música - El so1 ESO Música - El so
1 ESO Música - El so
 
Speech organ and manner of articulation
Speech organ and manner of articulationSpeech organ and manner of articulation
Speech organ and manner of articulation
 

More from Minakshi Atre

Signals&Systems: Quick pointers to Fundamentals
Signals&Systems: Quick pointers to FundamentalsSignals&Systems: Quick pointers to Fundamentals
Signals&Systems: Quick pointers to FundamentalsMinakshi Atre
 
Unit 4 Statistical Learning Methods: EM algorithm
Unit 4 Statistical Learning Methods: EM algorithmUnit 4 Statistical Learning Methods: EM algorithm
Unit 4 Statistical Learning Methods: EM algorithmMinakshi Atre
 
Inference in HMM and Bayesian Models
Inference in HMM and Bayesian ModelsInference in HMM and Bayesian Models
Inference in HMM and Bayesian ModelsMinakshi Atre
 
Artificial Intelligence: Basic Terminologies
Artificial Intelligence: Basic TerminologiesArtificial Intelligence: Basic Terminologies
Artificial Intelligence: Basic TerminologiesMinakshi Atre
 
2)local search algorithms
2)local search algorithms2)local search algorithms
2)local search algorithmsMinakshi Atre
 
Performance appraisal/ assessment in higher educational institutes (HEI)
Performance appraisal/ assessment in higher educational institutes (HEI)Performance appraisal/ assessment in higher educational institutes (HEI)
Performance appraisal/ assessment in higher educational institutes (HEI)Minakshi Atre
 
Artificial intelligence agents and environment
Artificial intelligence agents and environmentArtificial intelligence agents and environment
Artificial intelligence agents and environmentMinakshi Atre
 
Unit 6: DSP applications
Unit 6: DSP applications Unit 6: DSP applications
Unit 6: DSP applications Minakshi Atre
 
Unit 6: DSP applications
Unit 6: DSP applicationsUnit 6: DSP applications
Unit 6: DSP applicationsMinakshi Atre
 
Learning occam razor
Learning occam razorLearning occam razor
Learning occam razorMinakshi Atre
 
Waltz algorithm in artificial intelligence
Waltz algorithm in artificial intelligenceWaltz algorithm in artificial intelligence
Waltz algorithm in artificial intelligenceMinakshi Atre
 
Perception in artificial intelligence
Perception in artificial intelligencePerception in artificial intelligence
Perception in artificial intelligenceMinakshi Atre
 
Popular search algorithms
Popular search algorithmsPopular search algorithms
Popular search algorithmsMinakshi Atre
 
Artificial Intelligence Terminologies
Artificial Intelligence TerminologiesArtificial Intelligence Terminologies
Artificial Intelligence TerminologiesMinakshi Atre
 
composite video signal
composite video signalcomposite video signal
composite video signalMinakshi Atre
 
Basic terminologies of television
Basic terminologies of televisionBasic terminologies of television
Basic terminologies of televisionMinakshi Atre
 

More from Minakshi Atre (20)

Signals&Systems: Quick pointers to Fundamentals
Signals&Systems: Quick pointers to FundamentalsSignals&Systems: Quick pointers to Fundamentals
Signals&Systems: Quick pointers to Fundamentals
 
Unit 4 Statistical Learning Methods: EM algorithm
Unit 4 Statistical Learning Methods: EM algorithmUnit 4 Statistical Learning Methods: EM algorithm
Unit 4 Statistical Learning Methods: EM algorithm
 
Inference in HMM and Bayesian Models
Inference in HMM and Bayesian ModelsInference in HMM and Bayesian Models
Inference in HMM and Bayesian Models
 
Artificial Intelligence: Basic Terminologies
Artificial Intelligence: Basic TerminologiesArtificial Intelligence: Basic Terminologies
Artificial Intelligence: Basic Terminologies
 
2)local search algorithms
2)local search algorithms2)local search algorithms
2)local search algorithms
 
Performance appraisal/ assessment in higher educational institutes (HEI)
Performance appraisal/ assessment in higher educational institutes (HEI)Performance appraisal/ assessment in higher educational institutes (HEI)
Performance appraisal/ assessment in higher educational institutes (HEI)
 
DSP preliminaries
DSP preliminariesDSP preliminaries
DSP preliminaries
 
Artificial intelligence agents and environment
Artificial intelligence agents and environmentArtificial intelligence agents and environment
Artificial intelligence agents and environment
 
Unit 6: DSP applications
Unit 6: DSP applications Unit 6: DSP applications
Unit 6: DSP applications
 
Unit 6: DSP applications
Unit 6: DSP applicationsUnit 6: DSP applications
Unit 6: DSP applications
 
Learning occam razor
Learning occam razorLearning occam razor
Learning occam razor
 
Learning in AI
Learning in AILearning in AI
Learning in AI
 
Waltz algorithm in artificial intelligence
Waltz algorithm in artificial intelligenceWaltz algorithm in artificial intelligence
Waltz algorithm in artificial intelligence
 
Perception in artificial intelligence
Perception in artificial intelligencePerception in artificial intelligence
Perception in artificial intelligence
 
Popular search algorithms
Popular search algorithmsPopular search algorithms
Popular search algorithms
 
Artificial Intelligence Terminologies
Artificial Intelligence TerminologiesArtificial Intelligence Terminologies
Artificial Intelligence Terminologies
 
composite video signal
composite video signalcomposite video signal
composite video signal
 
Basic terminologies of television
Basic terminologies of televisionBasic terminologies of television
Basic terminologies of television
 
Mpeg 2
Mpeg 2Mpeg 2
Mpeg 2
 
Beginning of dtv
Beginning of dtvBeginning of dtv
Beginning of dtv
 

Recently uploaded

11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSsandhya757531
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Communityprachaibot
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmDeepika Walanjkar
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptJohnWilliam111370
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
signals in triangulation .. ...Surveying
signals in triangulation .. ...Surveyingsignals in triangulation .. ...Surveying
signals in triangulation .. ...Surveyingsapna80328
 
Engineering Drawing section of solid
Engineering Drawing     section of solidEngineering Drawing     section of solid
Engineering Drawing section of solidnamansinghjarodiya
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewsandhya757531
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfManish Kumar
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdfDEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdfAkritiPradhan2
 

Recently uploaded (20)

11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Community
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
signals in triangulation .. ...Surveying
signals in triangulation .. ...Surveyingsignals in triangulation .. ...Surveying
signals in triangulation .. ...Surveying
 
Engineering Drawing section of solid
Engineering Drawing     section of solidEngineering Drawing     section of solid
Engineering Drawing section of solid
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overview
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdfDEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
 

Part1 speech basics

  • 1. Unit 6 Speech Signal DR MINAKSHI PRADEEP ATRE PVG’S COET & GKPIM PUNE
  • 2. References. Book: Speech and Audio Processing by Dr. Shaila Apte. PDF document: http://cs.haifa.ac.il/~nimrod/Compression/Speech/S1Basics2010.pdf Speech samples: https://www.signalogic.com/index.pl?page=speech_codec_wav_samples
  • 3. Contents. Speech: 1. Basics of the speech signal and its features 2. LTI representation of the speech signal 3. LTV representation of the speech signal 4. Estimation of fundamental frequency 5. Identification of voiced and unvoiced speech 6. Noise removal
  • 4. Speech. The speech signal is generated by nature; because it occurs naturally, it is random in nature. It is necessary to understand generalized human speech production. Topics: a simple linear time-invariant (LTI) model for speech production; the inherently time-varying nature of speech; an introduction to the linear time-varying (LTV) model of speech; speech types (consonants, fricatives); voiced and unvoiced (V/UV) speech.
  • 5. Speech Production Mechanism: Pipelines Model Vocal Tract
  • 6. Vocal Tract. The vocal tract is the cavity between the vocal cords and the lips. It acts as a resonator that spectrally shapes the periodic input, much like the cavity of a musical wind instrument. A simple model of a steady-state vowel regards the vocal tract as a linear time-invariant (LTI) filter with a periodic impulse-like input.
  • 7. What is Speech signal? It is created at the vocal cords, travels through the vocal tract, and is produced at the speaker's mouth. It reaches the listener's ear as a pressure wave. It is non-stationary, but can be divided into sound segments that share some common acoustic properties over a short time interval. Two major classes of phonemes: vowels and consonants.
  • 8. Phonemes. The basic sounds of a language (e.g. "a" in the word "father") are called phonemes. A typical speech utterance consists of a string of vowel and consonant phonemes whose temporal and spectral characteristics change with time. In addition, the time-varying source and system can interact nonlinearly in a complex way: our simple model is correct for a steady vowel, but the sounds of speech are not always well represented by linear time-invariant systems.
  • 9. Vowel Production. In vowel production, air is forced from the lungs by contraction of the muscles around the lung cavity. Air flows through the vocal cords, which are two masses of flesh, causing periodic vibration of the cords; the rate of vibration gives the pitch of the sound. The resulting periodic puffs of air act as an excitation input, or source, to the vocal tract.
  • 11. Speech Production. A sound source excites a (vocal tract) filter ◦ Voiced: periodic source, created by the vocal cords ◦ Unvoiced: aperiodic and noisy source. Pitch is the fundamental frequency of vocal cord vibration (also called F0); it is followed by 4-5 formants (F1 to F5) at higher frequencies. In the idealized uniform-tube model of the vocal tract (roughly 17 cm long in an adult male), natural frequencies occur at odd multiples of about 500 Hz. These resonant frequencies are called formants.
    Typical formant frequencies for selected vowels in Hz (three values, F1 to F3, per speaker group):
    Vowel | Adult Male         | Adult Female
          | F1     F2     F3   | F1     F2     F3
    (i)   | 255    2330   3000 | 340    2610   3210
    (u)   | 290    940    2180 | 390    995    2585
    (ae)  | 735    1625   2465 | 950    1955   2900
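The "odd multiples of 500 Hz" rule can be checked with a short sketch. It assumes the standard idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose k-th resonance is (2k - 1)·c / (4L). The tube length of 17.5 cm and the speed of sound of 350 m/s are assumed values, not taken from the slides, and the function name is hypothetical.

```python
def tube_formants(length_m=0.175, speed_of_sound=350.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    length_m and speed_of_sound are assumed illustrative values.
    """
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, count + 1)]


# With the assumed defaults the resonances land near 500, 1500 and 2500 Hz.
print([round(f) for f in tube_formants()])
```

Measured formants, such as those in the table above, deviate from these values because the real vocal tract is not a uniform tube and changes shape for each vowel.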
  • 12. LTI Model for speech production. [Block diagram labels: Impulse Train Generator (Glottis), Random Signal Generator, Impulse Response of Vocal Tract, Generated Speech; excitation: Periodic] The impulse train generator is used as the excitation signal when a voiced segment is produced (vowel, e.g. “a”). Basic assumption: the source of excitation and the vocal tract system are independent.
  • 13. LTI Model for speech production. [Block diagram labels: Impulse Train Generator (Glottis), Random Signal Generator, Impulse Response of Vocal Tract, Generated Speech; excitation: Random] The random signal generator is used as the excitation signal when an unvoiced segment is produced (consonants, e.g. “s”). The LTI model is applied to a short segment of speech, about 10 ms, over which the parameters of the vocal tract can be assumed to remain constant.
  • 14. Nature of Speech Signal. Speech is generated by components such as the vocal cords and the vocal tract. A speech signal cannot be generated on its own. Speech is a random signal. Speech can have infinitely many features (recall the story of the blind people who each touch a different part of an elephant and try to describe what the whole elephant looks like), so analyzing it is a complex problem. Uttering different words is possible because humans can change the resonant modes of the vocal cavity and can also stretch the vocal cords to some extent, modifying the pitch period for different vowels. That is why we need the linear time-varying (LTV) model.
  • 15. Linear Time-varying Model: Speech production. [Block diagram labels: Impulse Train Generator, Random Signal Generator, Amplitude, Impulse Response of Vocal Tract, Generated Speech] The pitch period is variable. The impulse response is variable.
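A minimal sketch of the LTI source-filter idea in slides 12 and 13: an impulse train (voiced) or random noise (unvoiced) excites a fixed vocal tract filter. Several things here are assumptions for illustration and do not come from the slides: the vocal tract is approximated by a cascade of two-pole resonators, the formant frequencies are the adult male /ae/ values from the table on slide 11, the bandwidths are guesses, and all function names are hypothetical.

```python
import math
import random


def impulse_train(n_samples, pitch_hz, fs):
    """Voiced excitation: one unit impulse per pitch period."""
    period = max(1, int(round(fs / pitch_hz)))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Unvoiced excitation: uniformly distributed random samples."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(x, formant_hz, bandwidth_hz, fs):
    """Two-pole filter y[n] = x[n] + a1*y[n-1] + a2*y[n-2] with one resonance."""
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius, below 1 so stable
    a1 = 2.0 * r * math.cos(2.0 * math.pi * formant_hz / fs)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


# Assumed values: /ae/ formants (adult male) from the table; bandwidths are guesses.
FORMANTS = ((735.0, 80.0), (1625.0, 100.0), (2465.0, 120.0))


def synthesize(voiced, n_samples=240, fs=8000, pitch_hz=100.0):
    """Pick the excitation, then pass it through the fixed (time-invariant) filter."""
    if voiced:
        out = impulse_train(n_samples, pitch_hz, fs)
    else:
        out = white_noise(n_samples)
    for formant_hz, bandwidth_hz in FORMANTS:
        out = resonator(out, formant_hz, bandwidth_hz, fs)
    return out


vowel_like = synthesize(voiced=True)    # periodic excitation
hiss_like = synthesize(voiced=False)    # random excitation
```

The filter coefficients never change inside `synthesize`, which is exactly the time-invariance assumption; it is only reasonable over short segments such as the 10 ms mentioned on slide 13.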
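To illustrate "pitch period is variable", the sketch below builds a voiced excitation whose impulse spacing follows a per-frame pitch track. The frame length of 80 samples (10 ms at an assumed 8 kHz sampling rate), the pitch values, and the function name are all hypothetical. A full LTV model would also update the vocal tract filter per frame, which is not shown here.

```python
def varying_impulse_train(pitch_track_hz, fs=8000, frame_len=80):
    """Impulse train whose period is re-read from the pitch track at each impulse."""
    n_samples = len(pitch_track_hz) * frame_len
    out = [0.0] * n_samples
    pos = 0
    while pos < n_samples:
        out[pos] = 1.0
        frame = pos // frame_len                      # frame containing this impulse
        period = max(1, int(round(fs / pitch_track_hz[frame])))
        pos += period                                 # next impulse one period later
    return out


# Assumed pitch track: two frames at 100 Hz, then two frames at 200 Hz.
excitation = varying_impulse_train([100.0, 100.0, 200.0, 200.0])
impulse_positions = [n for n, v in enumerate(excitation) if v == 1.0]
print(impulse_positions)
```

The impulses are 80 samples apart while the track reads 100 Hz and 40 samples apart once it reads 200 Hz.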
  • 16. Speech Sound Categories. Periodic (sonorants, voiced); Noisy (fricatives, unvoiced); Impulsive (plosives). Example: in the word “shop,” the “sh,” “o,” and “p” are generated from a noisy, periodic, and impulsive source, respectively.
  • 17. Frequency Range Speech: Pitch frequency: ◦ male ~ 85-155 Hz; ◦ female ~ 165-255 Hz; Singer’s vocal range: from bass to soprano: 80 Hz-1100 Hz
  • 18. Pitch. Pitch period: the time duration of one glottal cycle. Pitch (fundamental frequency): the reciprocal of the pitch period. Remember: pitch is calculated only for voiced segments.
  • 19. Pitch Detection. The pitch period and the V/UV decision are fundamental to many speech coders. Common methods for calculating them: ◦ Autocorrelation function ◦ Zero-crossing rate (ZCR)
  • 20. Features or categorization of speech sound. Speech sounds are studied and classified from the following perspectives: 1) The nature of the source: periodic, noisy, or impulsive, and combinations of the three 2) The shape of the vocal tract 3) The time-domain waveform, which gives the pressure change with time at the lips 4) The time-varying spectral characteristics revealed through the spectrogram
  • 21. Spectrogram. The time-varying spectral characteristics of the speech signal can be displayed graphically as a two-dimensional pattern. Vertical axis: frequency. Horizontal axis: time. The pseudo-color of the pattern is proportional to signal energy (red: high energy). The resonance frequencies of the vocal tract show up as “energy bands”. Voiced intervals are characterized by a striated appearance (due to the periodicity of the signal). Unvoiced intervals are more solidly filled in.
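The two methods named on slide 19 can be sketched as follows. This is a minimal illustration, not the deck's own program. The search range of 80 to 400 Hz, the ZCR threshold of 0.25, the 8 kHz sampling rate, and all function names are assumptions. The autocorrelation is divided by the number of overlapping samples so that shorter lags are not favored.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(frame, zcr_threshold=0.25):
    """Crude V/UV decision: voiced speech tends to have a low ZCR (assumed threshold)."""
    return zero_crossing_rate(frame) < zcr_threshold


def autocorr_pitch(frame, fs, fmin=80.0, fmax=400.0):
    """Estimate pitch (Hz) as fs divided by the lag with the largest autocorrelation."""
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        overlap = len(frame) - lag
        r = sum(frame[n] * frame[n + lag] for n in range(overlap)) / overlap
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag if best_lag else None


fs = 8000
# Test signals (assumed): a 125 Hz tone stands in for voiced speech, noise for unvoiced.
tone = [math.sin(2.0 * math.pi * 125.0 * n / fs) for n in range(240)]
rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(240)]

print(is_voiced(tone), autocorr_pitch(tone, fs))
print(is_voiced(noise))
```

In line with slide 18, the pitch estimate is only meaningful for frames that the V/UV decision marks as voiced.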
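A spectrogram is a sequence of short-time spectra. The sketch below computes one with a Hamming window and a direct DFT, using only the standard library so that every step is visible; in practice an FFT routine would be used instead. The frame length, hop size, sampling rate, and function name are assumed values for illustration.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return rows of energy per frequency bin: result[frame][bin]."""
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):          # bins from 0 Hz up to fs/2
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc) ** 2)                # energy in bin k
        rows.append(row)
    return rows


fs = 8000
# Assumed test signal: a steady 1000 Hz tone, which falls in bin 1000 / fs * 64 = 8.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(256)]
spec = spectrogram(tone)
peak_bins = [row.index(max(row)) for row in spec]
print(len(spec), len(spec[0]), peak_bins)
```

Plotting `spec` with time on the horizontal axis and bin index on the vertical axis gives the pattern described on slide 21; for this steady tone it is a single horizontal energy band.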
  • 23. Most common Manner of articulation. Plosive, or oral stop, where there is complete occlusion (blockage) of both the oral and nasal cavities of the vocal tract, and therefore no air flow. Examples include English /p, t, k/ (voiceless) and /b, d, g/ (voiced). Nasal stop, where there is complete occlusion of the oral cavity, and the air passes instead through the nose. The shape and position of the tongue determine the resonant cavity that gives different nasal stops their characteristic sounds. Examples include English /m, n/. Fricative, sometimes called spirant, where there is continuous frication (turbulent and noisy airflow) at the place of articulation. Examples include English /f, s/ (voiceless) and /v, z/ (voiced).
  • 24. Most common Manner of articulation. Sibilants are a type of fricative where the airflow is guided by a groove in the tongue toward the teeth, creating a high-pitched and very distinctive sound. These are by far the most common fricatives. English sibilants include /s/ and /z/. Affricate, which begins like a plosive but releases into a fricative rather than having a separate release of its own. The English letters "ch" and "j" represent affricates. Trill, in which the articulator (usually the tip of the tongue) is held in place, and the airstream causes it to vibrate. The double "r" of Spanish "perro" is a trill. Approximant, where there is very little obstruction. Examples include English /w/ and /r/. Lateral approximants, usually shortened to lateral, are a type of approximant pronounced with the side of the tongue. English /l/ is a lateral.
  • 25. Time for MATLAB Program