Speech Technology-
Basics
Presenter: Eshwari.G
What is DSP?
• Digital signal processing is the processing of signals in a digital
form.
SIGNAL
Continuous signals x(t)
A description of how one parameter varies with
another parameter
Discrete signals x[n]
DIGITAL SIGNAL
DIGITAL signals x[n]
Discrete signals x[n]
Analog-to-digital conversion is an electronic process in which a
continuously variable (analog) signal is changed, without
altering its essential content, into a multi-level (digital) signal.
The input to an analog-to-digital converter (ADC) consists of a
voltage that varies among a theoretically infinite number of
values.
Examples are sine waves, the waveforms representing human
speech etc.
The output of the ADC, in contrast, has defined levels or states.
The simplest digital signals have only two states, and are called
binary.
ANALOG TO DIGITAL CONVERSION
Advantages of digital signals
• First, digital signals can be stored easily.
• Second, digital signals can be reproduced exactly.
All you have to do is be sure that a zero doesn't
get turned into a one or vice versa.
• Third, digital signals can be manipulated easily.
Since the signal is just a sequence of zeros and
ones, and since a computer can do anything
specifiable to such a sequence, you can do a great
many things with digital signals. And what you
are doing is called digital signal processing.
BASIC STRUCTURE OF A DIGITAL SIGNAL PROCESSING
SYSTEM
[Block diagram: ANALOG input signal → Pre-amplifier → amplified ANALOG signal → Analog-Digital Converter (A/D) → digitized signal (001101 101010 010110 110101) → Digital Signal Processor running Software (Algorithm) → processed digital signal → Digital-Analog Converter (D/A) → processed ANALOG signal → Final-amplifier → ANALOG output signal]
DIGITAL TO ANALOG CONVERSION
The process of combining signals is called
synthesis.
Decomposition is the inverse operation of
synthesis, where a single signal is broken into
two or more additive components.
Synthesis & Decomposition
2041×4 = ?
The number 2041 can be decomposed into:
2000+40+1
Each of these components can be multiplied by 4
Then synthesized to find the final answer
8000 + 160 + 4 = 8164
The goal of this method is to replace a complicated
problem with several easy ones.
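The arithmetic above can be traced in a few lines of Python. This is only an illustrative sketch; the variable names are not part of the original slides.

```python
# Decompose 2041 into additive components, process each one, then synthesize.
components = [2000, 40, 1]                # decomposition: 2000 + 40 + 1 = 2041
processed = [c * 4 for c in components]   # each easy sub-problem
result = sum(processed)                   # synthesis of the partial results
print(processed, result)
```

The same pattern (split, process, add) is what the later slides apply to signals instead of numbers.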
Synthesis & Decomposition
• There are infinite possible decompositions for any
given signal, but only one synthesis
• For example, the numbers 15 and 25 can only be
synthesized (added) into the number 40
• In comparison, the number 40 can be decomposed
into:1+39, 2+38 & 30+10 etc.
Synthesis & Decomposition
Divide & conquer strategy
Signal being processed is broken into
single components
Each component is processed individually
Results are reunited
SUPERPOSITION
DECOMPOSITION
There are two main ways to
decompose signals in signal processing:
Impulse decomposition and
Fourier decomposition.
Impulse DECOMPOSITION
Impulse decomposition breaks an N
samples signal into N component signals,
each containing N samples.
Each of the component signals contains
one point from the original signal, with the
remainder of the values being zero.
A single nonzero point in a string of zeros
is called an impulse.
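A minimal sketch of impulse decomposition, assuming NumPy is available. The function name `impulse_decompose` and the sample values are illustrative, not taken from the slides.

```python
import numpy as np

def impulse_decompose(x):
    """Split an N-sample signal into N component signals of N samples each.

    Row k of the result is zero everywhere except at position k,
    where it holds x[k], i.e. row k is a single impulse.
    """
    x = np.asarray(x, dtype=float)
    return np.diag(x)

x = np.array([3.0, -1.0, 0.5, 2.0])   # arbitrary example signal
parts = impulse_decompose(x)
resynth = parts.sum(axis=0)           # adding the components restores x
print(parts)
print(resynth)
```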
IMPORTANCE OF IMPULSE DECOMPOSITION
Impulse Decomposition
Impulse decomposition is important because it
allows signals to be examined one sample at a
time.
Similarly, systems are characterized by how
they respond to impulses.
By knowing how a system responds to an impulse,
the system's output can be calculated for any
given input. This approach is called convolution
Fourier Decomposition
Any N point signal can be
decomposed into N+2 signals,
half of them sine waves and half
of them cosine waves.
The lowest frequency cosine
wave (called xC0[n] in this
illustration), makes zero complete
cycles over the N samples, i.e., it
is a DC signal.
Fourier Decomposition
The next cosine components: xC1[n], xC2[n],
and xC3[n], make 1, 2, and 3
complete cycles over the
N samples, respectively.
Since the frequency of each
component is fixed, the only
thing that changes for different
signals being decomposed is the
amplitude of each of the sine and
cosine waves.
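The decomposition can be sketched with NumPy's real FFT. This assumes an even N, for which there are N/2 + 1 cosine components and N/2 + 1 sine components (the first and last sine components are identically zero). The function name and the random test signal are illustrative choices, not from the slides.

```python
import numpy as np

def fourier_decompose(x):
    """Return (cos_parts, sin_parts); row k makes k cycles over the N samples."""
    x = np.asarray(x, dtype=float)
    N = len(x)                              # assumed even in this sketch
    X = np.fft.rfft(x)                      # N/2 + 1 complex coefficients
    k = np.arange(N // 2 + 1)
    scale = np.full(N // 2 + 1, 2.0)
    scale[0] = scale[-1] = 1.0              # DC and Nyquist terms appear once
    a = scale * X.real / N                  # cosine amplitudes
    b = -scale * X.imag / N                 # sine amplitudes
    angle = 2 * np.pi * np.outer(k, np.arange(N)) / N
    return a[:, None] * np.cos(angle), b[:, None] * np.sin(angle)

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
cos_parts, sin_parts = fourier_decompose(x)
resynth = cos_parts.sum(axis=0) + sin_parts.sum(axis=0)
print(np.allclose(resynth, x))
```

Only the amplitudes `a` and `b` depend on the signal; the frequencies of the rows are fixed by N.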
CONVOLUTION & FOURIER ANALYSIS
The two main techniques of signal processing:
Convolution and Fourier analysis.
Strategy
Decompose signals into simple additive components,
Process the components in some useful manner,
Synthesize the components into a final result.
This is DSP.
CONVOLUTION
Convolution is a mathematical way of combining two
signals to form a third signal.
Using the strategy of impulse decomposition,
systems are described by a signal called the
impulse response.
Convolution relates the three signals of interest: the
input signal, the output signal, and the impulse
response.
Convolution provides the mathematical
framework for DSP
IMPULSE RESPONSE
The delta function is a
normalized impulse, that is,
sample number zero has a
value of one, while all other
samples have a value of
zero.
Delta function is frequently
called the unit impulse.
IMPULSE RESPONSE
Impulse response is the signal
that exits a system when a
delta function (unit impulse)
is the input.
If two systems are different in
any way, they will have
different impulse
responses.
Just as the input and
output signals are often
called x[n] and y[n], the
impulse response is
usually given the name
h[n]
IMPULSE RESPONSE
• Any impulse can be
represented as a shifted and
scaled delta function.
• Consider a signal, a[n], composed
of all zeros except sample
number 8, which has a
value of -3.
• This is the same as a delta
function shifted to the right by 8
samples, and multiplied by -3.
• In equation form: a[n] = -3δ[n-8]
IMPULSE RESPONSE
• If the input to a system is
an impulse, such as -3δ[n-8],
what is the system's
output?
• For a linear, shift-invariant system,
scaling and shifting the
input results in an identical
scaling and shifting of the
output.
IMPULSE RESPONSE
• If δ[n] results in h[n], it
follows that -3δ[n-8] results in
-3h[n-8].
• In words, the output is a
version of the impulse
response that has been
shifted and scaled by the
same amount as the delta
function on the input.
• If you know a system's
impulse response, you
immediately know how it will
react to any impulse.
How a system changes an input signal into
an output signal
• First, the input signal can be decomposed into a set of
impulses, each of which can be viewed as a scaled
and shifted delta function.
• Second, the output resulting from each impulse is a
scaled and shifted version of the impulse response.
• Third, the overall output signal can be found by adding
these scaled and shifted impulse responses.
• In other words, if we know a system's impulse
response, then we can calculate what the output will
be for any possible input signal.
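The three steps can be sketched directly in code. This is a minimal illustration assuming NumPy; the function name `convolve_by_superposition` and the example values of x[n] and h[n] are made up for the demonstration. The result is compared against NumPy's own `np.convolve`.

```python
import numpy as np

def convolve_by_superposition(x, h):
    """Output of a linear, shift-invariant system with impulse response h."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    y = np.zeros(len(x) + len(h) - 1)
    for k, amplitude in enumerate(x):
        # The impulse amplitude * delta[n - k] produces amplitude * h[n - k];
        # adding all of these scaled, shifted responses gives the output.
        y[k:k + len(h)] += amplitude * h
    return y

x = np.array([1.0, 0.0, -3.0, 2.0])   # example input (arbitrary values)
h = np.array([0.5, 0.25, 0.125])      # example impulse response (arbitrary)
y = convolve_by_superposition(x, h)
print(y)
print(np.allclose(y, np.convolve(x, h)))
```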
• It is able to provide far better levels of signal processing
than is possible with analogue hardware alone.
• It is able to perform mathematical operations that enable
many of the spurious effects of the analogue components
to be overcome.
• In addition to this, it is possible to easily update a digital
signal processor by downloading new software.
• Once a basic DSP card has been developed, it is possible to
use this hardware design to operate in several different
environments, performing different functions, purely by
downloading different software.
• It is also able to provide functions that would not be
possible using analogue techniques.
Advantages over analogue processing
• It is not able to provide perfect filtering,
demodulation and other functions because of
mathematical limitations.
• In addition to this the processing power of the DSP
card may impose some processing limitations.
• It is also more expensive than many analogue
solutions, and thus it may not be cost effective in
some applications.
Limitations
SPEECH ANALYSIS
Extraction of properties or features from a speech
signal
Involves a transformation of s(n) into
another signal,
a set of signals,
or a set of parameters
Objectives
Simplification
Data reduction
Signal
• Continuous Signal
(both parameters can assume
a continuous range of values)
Vertical Axis (y axis)– Amplitude
Horizontal Axis (x axis) – Time
The parameter on the y-axis
(the dependent variable)
is said to be a function of the
parameter on the x-axis
(the independent variable)
Speech Wave form
In this, the time axis is the horizontal axis from left to
right and the curve shows how the pressure increases and
decreases in the signal
Time domain representation.
Frequency domain (spectral)
f(ω)
Spectrum for a 1-ms
pulse
f(t)
Time domain vs Frequency domain
(Temporal) vs (Spectral)
Spectrum at
0.15 seconds
into the
utterance, in the
beginning of the
"o" vowel.
SHORT TIME ANALYSIS
• Short segments of speech signal are isolated
and processed as if they were short segments
from a sustained sound
• This is repeated as often as desired
• Each short segment is called an analysis frame
• Result – a single number or set of numbers
SHORT TIME ANALYSIS
• ASSUMPTION
• Properties of the speech signal change relatively
slowly with time
• This assumption leads to a variety of speech
processing methods
TYPES OF SHORT TIME ANALYSIS
• Short Time Energy (Average Magnitude)
• Short Time Average Zero crossing rate
• Short Time Auto-correlation
Short Time Energy
(Average Magnitude)
Amplitude of the speech signal varies appreciably with time
Amplitude of unvoiced segments is much lower than the
amplitude of voiced segments
Short time energy provides a convenient representation that
reflects these amplitude variations
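A minimal sketch of short-time energy, assuming NumPy. The 8 kHz sampling rate, the non-overlapping 5 ms frames and the synthetic loud/quiet test signal are assumptions made for illustration; real speech would replace the test signal.

```python
import numpy as np

def short_time_energy(s, frame_len, hop):
    """Sum of squared samples in each analysis frame."""
    s = np.asarray(s, dtype=float)
    starts = range(0, len(s) - frame_len + 1, hop)
    return np.array([np.sum(s[i:i + frame_len] ** 2) for i in starts])

fs = 8000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs                      # one second of samples
loud = np.sin(2 * np.pi * 150 * t[:fs // 2])           # stand-in for a voiced segment
quiet = 0.05 * np.sin(2 * np.pi * 150 * t[fs // 2:])   # stand-in for a low-amplitude segment
s = np.concatenate([loud, quiet])

energy = short_time_energy(s, frame_len=40, hop=40)    # 40 samples = 5 ms at 8 kHz
print(energy[:3], energy[-3:])
```

The energy contour drops sharply at the midpoint, which is the kind of amplitude variation the measure is meant to reflect.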
Short Time Energy
(Average Magnitude)
50ms of a vowel
Squared version of (a)
Energy for a window length = 5 ms
Short Time Average Zero crossing rate
A zero crossing occurs when
s(n) = 0, for a continuous
signal
A zero crossing occurs if
successive samples have
different algebraic signs, for a
discrete signal
Short Time Average Zero crossing rate
For sinusoids F0 = ZCR/2
For speech signals
calculation of F0 from
ZCR is less precise
High ZCR – Unvoiced speech
Low ZCR – Voiced speech
Draw back – Highly sensitive to
noise.
ZCR is a simple measure of frequency content of the signal
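The sign-change definition for discrete signals can be sketched as follows, assuming NumPy. Treating an exact zero as positive is a convention chosen here, and the 100 Hz test tone and 8 kHz sampling rate are illustrative.

```python
import numpy as np

def zero_crossing_count(s):
    """Number of times successive samples have different algebraic signs."""
    signs = np.sign(np.asarray(s, dtype=float))
    signs[signs == 0] = 1          # assumption: count an exact zero as positive
    return int(np.sum(signs[1:] != signs[:-1]))

fs = 8000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs                      # one second, so the count is per second
s = np.sin(2 * np.pi * 100 * t + 0.1)       # small phase offset avoids exact zeros
zcr = zero_crossing_count(s)
f0_estimate = zcr / 2                       # for a sinusoid, F0 = ZCR / 2
print(zcr, f0_estimate)
```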
Short Time Autocorrelation
Speech signal: s(n)
Fourier transform of s(n): $S(e^{j\omega})$
Energy spectrum: $|S(e^{j\omega})|^2$
The inverse Fourier transform of $|S(e^{j\omega})|^2$
is the autocorrelation of s(n)
This preserves information about
harmonic and formant amplitudes in s(n)
Autocorrelation - Significance
The autocorrelation function at lag zero equals the
energy of the signal
The period can be estimated by finding the
location of the first maximum after lag zero in the
autocorrelation function.
Auto correlation function contains much
more information about the detailed
structure of the signal.
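A hedged sketch of autocorrelation-based F0 estimation, assuming NumPy. The autocorrelation is computed as the inverse transform of the energy spectrum; the search range of 60 to 400 Hz, the 40 ms frame and the synthetic three-harmonic test signal are assumptions for illustration.

```python
import numpy as np

def autocorrelation(s):
    """Autocorrelation via the inverse FFT of the energy spectrum."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    spectrum = np.fft.rfft(s, 2 * n)        # zero-padding avoids circular wrap-around
    return np.fft.irfft(np.abs(spectrum) ** 2)[:n]

def estimate_f0(s, fs, f_min=60.0, f_max=400.0):
    """Pick the largest autocorrelation peak inside an assumed pitch range."""
    r = autocorrelation(s)
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

fs = 8000
t = np.arange(int(0.04 * fs)) / fs          # one 40 ms frame
s = (np.sin(2 * np.pi * 125 * t)
     + 0.5 * np.sin(2 * np.pi * 250 * t)
     + 0.25 * np.sin(2 * np.pi * 375 * t))  # harmonics of 125 Hz
r = autocorrelation(s)
f0 = estimate_f0(s, fs)
print(f0)
```

Restricting the lag range keeps the search away from the lag-zero maximum, which is always the largest value.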
Autocorrelation - Application
Applications
1. F0 estimation
2. Voiced /unvoiced determination
3. Linear prediction.
Cepstrum
[Block diagram: s(n) → DFT → S(e^{jω}) → LOG MAGNITUDE → log|S(e^{jω})| → IDFT]
Cepstrum was derived by reversing the first four letters of
"spectrum"
Cepstrum was introduced by Bogert, Healy and Tukey in 1963
for characterizing the seismic echoes resulting from
earthquakes
A cepstrum is the result of taking the Inverse Fourier transform
(IFT) of the log spectrum as if it were a signal.
Originally it was defined as ‘spectrum of spectrum’.
Operations on cepstra are labelled as quefrency analysis,
liftering, or cepstral analysis
Why Cepstrum?
• The cepstrum can be seen as information about rate of
change in the different spectrum bands.
• It has been used to determine the fundamental frequency
of human speech.
• Cepstrum pitch determination is particularly effective
because the effects of the vocal excitation (pitch) and
vocal tract (formants) are additive in the logarithm of the
power spectrum and thus clearly separate.
• The cepstrum is often used as a feature vector for
representing the human voice and musical signals.
Cepstral concepts - Quefrency
The independent variable of a cepstral graph is called the quefrency.
The quefrency is a measure of time, though not in the sense of a signal in the
time domain.
For example, if the sampling rate of an audio signal is 44100 Hz and there is a
large peak in the cepstrum whose quefrency is 100 samples, the peak indicates
the presence of a pitch that is 44100/100 = 441 Hz.
This peak occurs in the cepstrum because the harmonics in the spectrum are
periodic, and the period corresponds to the pitch.
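The 44100 Hz example can be reproduced with a small sketch, assuming NumPy. A Hamming-windowed impulse train is used as a crude stand-in for voiced excitation; the frame length of 2048 samples, the 80 to 1000 Hz search range and the small offset inside the logarithm are assumptions, not values from the slides.

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)   # offset avoids log(0)
    return np.fft.ifft(log_magnitude).real

fs = 44100
period = 100                          # samples between pulses: 44100 / 100 = 441 Hz
frame = np.zeros(2048)
frame[::period] = 1.0                 # impulse train (illustrative excitation)
frame *= np.hamming(len(frame))

cepstrum = real_cepstrum(frame)
q_min, q_max = fs // 1000, fs // 80   # assumed pitch search range, in samples
peak_quefrency = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
pitch_hz = fs / peak_quefrency
print(peak_quefrency, pitch_hz)
```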
Cepstral concepts - Rahmonics
• The x-axis of the cepstrum has units of quefrency, and
peaks in the cepstrum (which relate to periodicities in the
spectrum) are called rahmonics.
• To obtain an estimate of the fundamental frequency from
the cepstrum we look for a peak in the quefrency region
Cepstral concepts - Liftering
A filter that operates on a cepstrum might be called a lifter.
A low pass lifter is similar to a low pass filter in the frequency
domain.
It can be implemented by multiplying by a window in the
cepstral domain; converting the result back to the frequency domain
yields a smoother spectrum.
Cepstral Analysis
• Low quefrency components or samples
predominantly correspond to spectral
envelope. (Up to about 3 to 4 msec).
These are also called cepstral
coefficients.
• High quefrency components
predominantly correspond to periodic
excitation or source. (Beyond 4 msec)
• If signal is periodic, a strong peak is
seen over the high quefrency region at
T0, the pitch period.
• If signal is unvoiced, components are
distributed over all quefrencies.
The cepstral coefficients
• Cepstral coefficients can be derived both from the filter-
bank and linear predictive analyses.
• By keeping only the first few cepstral coefficients and
setting the remaining coefficients to zero, it is possible to
smooth the harmonic structure of the spectrum.
• Cepstral coefficients are therefore very convenient
coefficients to represent the speech spectral envelope.
• Cepstral coefficients have rather different dynamics, the
higher coefficients showing the smallest variances.
Cepstrum
Formant can be estimated by locating
the peaks in the log spectra
For voiced speech there is a peak in the
cepstrum
For unvoiced speech there is no such
peak in the cepstrum
Position of the peak is a good estimate
of the Pitch Period
Linear Predictive Coding
• Linear Predictive Coding (LPC) is
one of the most powerful speech
analysis techniques
• It is one of the most useful
methods for encoding good quality
speech at a low bit rate.
• It provides extremely accurate
estimates of speech parameters,
and is relatively efficient for
computation.
Linear Predictive Coding
[Diagram: Source (excitation signal) → Transfer function → Speech]
We can use the LPC coefficients to separate a
speech signal into two parts: the transfer function
(which contains the vocal quality-formants) and the
excitation (which contains the pitch and the
loudness)
• LPC analyzes the speech signal by
• estimating the formants,
• removing their effects from the speech
signal,
• and estimating the intensity and
frequency of the remaining buzz.
• The process of removing the formants is
called inverse filtering, and the remaining
signal is called the residue.
• The numbers which describe the formants and the residue can be stored or
transmitted somewhere else. LPC synthesizes the speech signal by reversing
the process: use the residue to create a source signal, use the formants to
create a filter (which represents the tube), and run the source through the
filter, resulting in speech.
• Because speech signals vary slowly with time, this process is done on short
chunks of the speech signal, which are called frames. Usually 30 to 50
frames per second give intelligible speech with good compression.
Basic Principle
A speech sample can be approximated as a
linear combination of past speech samples
By minimizing the sum of the squared
differences between the actual speech
samples and the predicted ones, a unique
set of predictor coefficients can be determined
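The least-squares principle can be sketched as below, assuming NumPy. This solves the squared-error minimization directly with `np.linalg.lstsq`; practical LPC coders more often use the autocorrelation method with the Levinson-Durbin recursion, so treat this as an illustration of the principle rather than a production method. The second-order synthetic signal and its coefficients (1.3 and -0.6) are made up for the test.

```python
import numpy as np

def lpc_coefficients(s, order):
    """Least-squares predictor a with s[n] ~ sum_k a[k] * s[n - 1 - k]."""
    s = np.asarray(s, dtype=float)
    # Column k holds the sample k + 1 steps before the sample being predicted.
    past = np.column_stack(
        [s[order - 1 - k: len(s) - 1 - k] for k in range(order)]
    )
    target = s[order:]
    a, *_ = np.linalg.lstsq(past, target, rcond=None)
    return a

# Synthetic signal that really is a linear combination of its past plus noise.
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
s = np.zeros_like(e)
for n in range(2, len(s)):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2] + e[n]

a = lpc_coefficients(s, order=2)
residual = s[2:] - (a[0] * s[1:-1] + a[1] * s[:-2])   # prediction error
print(a)
```

The prediction error plays the role of the residue described earlier: it is what remains after the predictable part of the signal has been removed.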
Linear Predictive Coding
Applications
1. F0 estimation
2. Pitch
3. Vocal tract area functions
4. For representing speech for low
bit rate transmission or storage
Linear Predictive Coding
Highlights
1. Extremely accurate estimation of
Speech Parameters
2. High speed of Computation
3. Robust, reliable & accurate
method
Linear Predictive Coding
Ways in which the basic models of analysis
and the associated parameters from them
are used in an integrated system
• Diagnostic Applications (CSL & VAGMI)
• Digital transmission of voice communication
• Man-machine communication by voice
a. Voice Response systems
b. Speaker recognition systems
c. Speech recognition systems
Pre-emphasis
[Figure panels: before pre-emphasis; after pre-emphasis]
Boost the amount of energy in the high frequencies.
For voiced segments like vowels, there is more energy at the lower
frequencies than at the higher frequencies - spectral tilt.
Boosting the high frequency energy makes information from these
higher formants more available to the acoustic model and improves
phone detection accuracy.
This pre-emphasis is done with a filter
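The slides do not specify the filter. A common choice, assumed here, is the first-order difference y[n] = x[n] - α·x[n-1] with α close to 1; the value 0.97 and the test tones below are illustrative assumptions.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n - 1] (alpha is assumed)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))

fs = 8000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 100 * t)           # low-frequency tone
high = np.sin(2 * np.pi * 3000 * t)         # high-frequency tone

print(np.std(pre_emphasis(low)) / np.std(low))     # well below 1: attenuated
print(np.std(pre_emphasis(high)) / np.std(high))   # above 1: boosted
```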
Windowing
Goal of feature extraction is to provide spectral features.
Speech is a non-stationary signal: its spectrum changes very
quickly, so spectral features extracted from an entire
utterance or conversation are not meaningful.
Instead, we want to extract spectral features from a small
window of speech that characterizes a particular subphone
(its statistical properties are assumed roughly constant within this region).
Windowing determines the portion of the speech signal
that is to be analyzed by zeroing out the signal outside the
region of interest.
[Pipeline: Pre-emphasis → Window → DFT → Mel filter bank → log → IDFT → deltas]
Windowing techniques
• Rectangular
• Bartlett
• Hamming
• Hanning
• Blackman
• Kaiser
The most commonly used are the
Rectangular and the Hamming methods
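A minimal framing and windowing sketch, assuming NumPy. The 25 ms frame length and 10 ms hop at 16 kHz are common choices assumed here, not values given in the slides; `np.hamming` is NumPy's Hamming window and `np.ones` serves as the rectangular window.

```python
import numpy as np

def frame_signal(x, frame_len, hop, window=np.hamming):
    """Cut x into overlapping frames and multiply each by a window."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    w = window(frame_len)
    return np.stack([x[i * hop: i * hop + frame_len] * w for i in range(n_frames)])

fs = 16000                                   # assumed sampling rate in Hz
x = np.ones(fs)                              # constant signal makes the window shape visible
frames = frame_signal(x, frame_len=400, hop=160)               # Hamming
rect_frames = frame_signal(x, frame_len=400, hop=160, window=np.ones)  # rectangular
print(frames.shape, frames[0, 0], frames[0, 200])
```

The Hamming window tapers each frame toward its edges, whereas the rectangular window simply cuts the signal off.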
[Figures: Bartlett, Rectangular, Hanning, Hamming, Kaiser and Blackman windows]
DFT
Spectrum at
0.15 seconds
into the
utterance, in the
beginning of the
"o" vowel.
The Mel frequency
Human hearing is not equally sensitive at all frequency bands.
Modeling this property of human hearing during feature extraction improves
speaker recognition performance.
The form of the model used in MFCCs is to warp the frequencies output by
the DFT onto the mel scale.
A mel (Stevens et al, 1937; Stevens and Volkmann, 1940) is a unit of pitch.
Pairs of sounds that are perceptually equidistant in pitch are separated by an
equal number of mels.
The mapping between frequency in Hz and the mel scale is linear below 1000
Hz and logarithmic above 1000 Hz.
The mel frequency can be computed from the raw acoustic frequency f (in Hz) as
follows:
$$\mathrm{mel}(f) = 1127 \ln\left(1 + \frac{f}{700}\right)$$
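The formula translates directly into code. The inverse function `mel_to_hz` is not in the slides; it is obtained here by solving the same formula for f.

```python
import math

def hz_to_mel(f_hz):
    """mel(f) = 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel (derived by solving the formula for f)."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)

for f in (100, 500, 1000, 4000, 8000):
    print(f, round(hz_to_mel(f), 1))
```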
Mel filter Bank
During MFCC computation, we implement this intuition by creating a
bank of filters that collect energy from each frequency band, with 10
filters spaced linearly below 1000 Hz and the remaining filters spread
logarithmically above 1000 Hz .
Finally, we take the log of each of the mel spectrum values.
In general, the human response to signal level is logarithmic - humans
are less sensitive to slight differences in amplitude at high amplitudes
than at low amplitudes.
In addition, using a log makes the feature estimates less sensitive to
variations in input such as power variations due to the speaker’s mouth
moving closer or further from the microphone.
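A hedged sketch of a mel filter bank followed by the log, assuming NumPy. Note the swap: instead of the layout described above (10 linearly spaced filters below 1000 Hz, the rest logarithmic), this sketch spaces triangular filters uniformly on the mel scale, which is a common simplification. The 26 filters, 512-point FFT, 16 kHz rate and the 1000 Hz test tone are all assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 1127.0 * np.log(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (np.exp(m / 1127.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filters with edges equally spaced in mel from 0 to fs/2."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_filters + 2))
    bin_hz = np.arange(n_fft // 2 + 1) * fs / n_fft
    bank = np.zeros((n_filters, len(bin_hz)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

fs, n_fft = 16000, 512
bank = mel_filter_bank(26, n_fft, fs)
frame = np.hamming(n_fft) * np.sin(2 * np.pi * 1000 * np.arange(n_fft) / fs)
power = np.abs(np.fft.rfft(frame)) ** 2
log_mel = np.log(bank @ power + 1e-10)      # offset avoids log(0) in empty bands
print(int(np.argmax(log_mel)))
```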
Log magnitude spectrum
Magnitude
spectrum
Log magnitude
spectrum
Replace each amplitude value in the magnitude
spectrum with its log
Visualize the log spectrum as if it were itself a waveform
Cepstrum is the spectrum of the log of the spectrum.
By taking the spectrum of the log spectrum, we have left
the frequency domain of the spectrum and gone back to
the time domain
IDFT
There is a large peak around 120, corresponding to the F0
There are other various components at lower values on the x-axis.
These represent the vocal tract filter (the position of the tongue and
the other articulators).
Thus, if we are interested in detecting phones, we can make use of
just the lower cepstral values.
If we are interested in detecting pitch, we can use the higher cepstral
values
Cepstrum
MFCC
12 co-efficients
For MFCC extraction, we generally just take the first 12
cepstral values.
These 12 coefficients will represent information solely about
the vocal tract filter, cleanly separated from information
about the glottal source.
It turns out that cepstral coefficients have the extremely
useful property that the variance of the different coefficients
tends to be uncorrelated.
This is not true for the spectrum, where spectral coefficients
at different frequency bands are correlated.
MFCC
The extraction of the cepstrum with the inverse DFT results in 12
cepstral coefficients for each frame.
We next add a 13th feature: the energy from the frame.
Energy correlates with phone identity and so is a useful cue for phone
detection (vowels and sibilants have more energy than stops, etc.).
The energy in a frame is the sum over time of the power of the
samples in the frame; thus, for a signal x in a window from time
sample t1 to time sample t2, the energy is
$$\text{Energy} = \sum_{t=t_1}^{t_2} x^2[t]$$
Energy
Deltas
Speech signal is not constant from frame to frame.
This change, such as the slope of a formant at its transitions,
or the nature of the change from a stop closure to stop
burst, can provide a useful cue for phone identity.
For this reason, we also add features related to the change in
cepstral features over time.
We do this by adding for each of the 13 features (12 cepstral
features plus energy) a delta or velocity feature and a
double delta or acceleration feature.
Each of the 13 delta features represents the change between
frames in the corresponding cepstral or energy feature, and
each of the 13 double delta features represents the change
between frames in the corresponding delta features.
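The slides do not give a formula for the deltas. One simple choice, assumed here, is the central difference d[t] = (c[t+1] - c[t-1]) / 2 with the first and last frames repeated at the edges. The random matrix stands in for real features.

```python
import numpy as np

def delta(features):
    """Frame-to-frame change; features has shape (n_frames, n_features)."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")  # repeat edge frames
    return (padded[2:] - padded[:-2]) / 2.0                   # assumed central difference

rng = np.random.default_rng(0)
static = rng.standard_normal((50, 13))   # placeholder for 12 cepstra + energy per frame
d1 = delta(static)                       # delta (velocity) features
d2 = delta(d1)                           # double delta (acceleration) features
full = np.hstack([static, d1, d2])
print(full.shape)
```

Stacking the three blocks gives 39 features per frame: 13 static, 13 delta and 13 double delta.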
SPEECH SPECTROGRAPH
• A speech spectrograph is a laboratory instrument
that displays a graphical representation of the
amplitudes of the various component frequencies of
speech on a time based plot.
• A tool for analyzing vocal output.
• It is used for identifying the formants, and for real-
time biofeedback in voice training and therapy
SPEECH SPECTROGRAPH (Analog)
Speech Spectrograph (Digital)
[Block diagram: Pre-emphasis → Window → DFT → Plot amplitude vs. frequency → Plot spectrogram over time]
SPEECH SPECTROGRAPH
• There are two main kinds of analysis performed by
the spectrograph, wideband (with a bandwidth of
300-500 Hz) and narrowband (with a bandwidth of
45-50 Hz).
WIDEBAND SPECTROGRAPH
• When used for normal speech
with a fundamental frequency of
around 100-200 Hz, will pick up
energy from several harmonics at
once and add them together.
• The Fo (fundamental frequency)
can be determined from the
graphic
• Also, the frequencies and relative
strengths of the first two formants
(F1 and F2) are visible as dark,
rather blurry concentrations of
energy.

Mais conteúdo relacionado

Mais procurados

Combating fading channels (1) (3)
Combating fading channels (1) (3)Combating fading channels (1) (3)
Combating fading channels (1) (3)liril sharma
 
Mobile Radio Propagation
Mobile Radio PropagationMobile Radio Propagation
Mobile Radio PropagationIzah Asmadi
 
Large scale path loss 1
Large scale path loss 1Large scale path loss 1
Large scale path loss 1Vrince Vimal
 
4.4 diversity combining techniques
4.4   diversity combining techniques4.4   diversity combining techniques
4.4 diversity combining techniquesJAIGANESH SEKAR
 
Wireless Channels Capacity
Wireless Channels CapacityWireless Channels Capacity
Wireless Channels CapacityOka Danil
 
Companding & Pulse Code Modulation
Companding & Pulse Code ModulationCompanding & Pulse Code Modulation
Companding & Pulse Code ModulationYeshudas Muttu
 
Orthogonal frequency division multiplexing (ofdm)
Orthogonal frequency division multiplexing (ofdm)Orthogonal frequency division multiplexing (ofdm)
Orthogonal frequency division multiplexing (ofdm)Dilip Mathuria
 
Noise in Communication System
Noise in Communication SystemNoise in Communication System
Noise in Communication SystemIzah Asmadi
 
ASK, FSK, PSK Modulation Techniques in Detail
ASK, FSK, PSK Modulation Techniques in DetailASK, FSK, PSK Modulation Techniques in Detail
ASK, FSK, PSK Modulation Techniques in Detailnomanbarki
 
Channel capacity
Channel capacityChannel capacity
Channel capacityPALLAB DAS
 
Digital Modulation Techniques ppt
Digital Modulation Techniques pptDigital Modulation Techniques ppt
Digital Modulation Techniques pptPankaj Singh
 
Chapter10 switching
Chapter10 switchingChapter10 switching
Chapter10 switchingSuneel Varma
 

Mais procurados (20)

Fading Seminar
Fading SeminarFading Seminar
Fading Seminar
 
Ofdm for wireless
Ofdm for wirelessOfdm for wireless
Ofdm for wireless
 
Combating fading channels (1) (3)
Combating fading channels (1) (3)Combating fading channels (1) (3)
Combating fading channels (1) (3)
 
Mobile Radio Propagation
Mobile Radio PropagationMobile Radio Propagation
Mobile Radio Propagation
 
Large scale path loss 1
Large scale path loss 1Large scale path loss 1
Large scale path loss 1
 
Mobile Radio Propagations
Mobile Radio PropagationsMobile Radio Propagations
Mobile Radio Propagations
 
17 SONET/SDH
17 SONET/SDH17 SONET/SDH
17 SONET/SDH
 
4.4 diversity combining techniques
4.4   diversity combining techniques4.4   diversity combining techniques
4.4 diversity combining techniques
 
Wireless Channels Capacity
Wireless Channels CapacityWireless Channels Capacity
Wireless Channels Capacity
 
Companding & Pulse Code Modulation
Companding & Pulse Code ModulationCompanding & Pulse Code Modulation
Companding & Pulse Code Modulation
 
Line Coding in OFC
Line Coding in OFCLine Coding in OFC
Line Coding in OFC
 
Orthogonal frequency division multiplexing (ofdm)
Orthogonal frequency division multiplexing (ofdm)Orthogonal frequency division multiplexing (ofdm)
Orthogonal frequency division multiplexing (ofdm)
 
Noise in Communication System
Noise in Communication SystemNoise in Communication System
Noise in Communication System
 
ASK, FSK, PSK Modulation Techniques in Detail
ASK, FSK, PSK Modulation Techniques in DetailASK, FSK, PSK Modulation Techniques in Detail
ASK, FSK, PSK Modulation Techniques in Detail
 
Multiplexing
MultiplexingMultiplexing
Multiplexing
 
Channel capacity
Channel capacityChannel capacity
Channel capacity
 
Digital Modulation Techniques ppt
Digital Modulation Techniques pptDigital Modulation Techniques ppt
Digital Modulation Techniques ppt
 
(Ofdm)
(Ofdm)(Ofdm)
(Ofdm)
 
Chapter10 switching
Chapter10 switchingChapter10 switching
Chapter10 switching
 
Multiplexing
MultiplexingMultiplexing
Multiplexing
 

Destaque

Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionRichie
 
Aspects of connected speech1
Aspects of connected speech1Aspects of connected speech1
Aspects of connected speech1Imana amini
 
Assessment calms
Assessment   calmsAssessment   calms
Assessment calmsjsbartecchi
 
Articulatory dynamics in sttg
Articulatory dynamics in sttgArticulatory dynamics in sttg
Articulatory dynamics in sttgHemaraja Nayaka S
 
Physiological basis of fluency disorders
Physiological basis of fluency disordersPhysiological basis of fluency disorders
Physiological basis of fluency disordersHemaraja Nayaka S
 
Unit 1 Fluency, Disfluency, and Stuttering
Unit 1 Fluency, Disfluency, and StutteringUnit 1 Fluency, Disfluency, and Stuttering
Unit 1 Fluency, Disfluency, and Stutteringsahughes
 
Assessment stuttering predictioninstrument
Assessment   stuttering predictioninstrumentAssessment   stuttering predictioninstrument
Assessment stuttering predictioninstrumentjsbartecchi
 
Chapter 9 Fluency Assessment Ppt
Chapter 9 Fluency Assessment PptChapter 9 Fluency Assessment Ppt
Chapter 9 Fluency Assessment PptGayle Underwood
 
Text-To-Speech Technology: Enriching the VLE, Enhancing the Learning Experience
Text-To-Speech Technology: Enriching the VLE, Enhancing the Learning ExperienceText-To-Speech Technology: Enriching the VLE, Enhancing the Learning Experience
Text-To-Speech Technology: Enriching the VLE, Enhancing the Learning ExperienceBlackboardEMEA
 
Speech Technology Overview
Speech Technology OverviewSpeech Technology Overview
Speech Technology Overviewamr0mt
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologyAamir-sheriff
 
Fluency vs. accuracy
Fluency vs. accuracyFluency vs. accuracy
Fluency vs. accuracyMarinazx
 

Destaque (20)

Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Management of articulation
Management of articulationManagement of articulation
Management of articulation
 
Aspects of connected speech1
Aspects of connected speech1Aspects of connected speech1
Aspects of connected speech1
 
Assessment calms
Assessment   calmsAssessment   calms
Assessment calms
 
Articulatory dynamics in sttg
Articulatory dynamics in sttgArticulatory dynamics in sttg
Articulatory dynamics in sttg
 
Rhythm of speech
Rhythm of speech Rhythm of speech
Rhythm of speech
 
Physiological basis of fluency disorders
Physiological basis of fluency disordersPhysiological basis of fluency disorders
Physiological basis of fluency disorders
 
1. fluency introduction
1. fluency introduction1. fluency introduction
1. fluency introduction
 
Unit 1 Fluency, Disfluency, and Stuttering
Unit 1 Fluency, Disfluency, and StutteringUnit 1 Fluency, Disfluency, and Stuttering
Unit 1 Fluency, Disfluency, and Stuttering
 
Assessment stuttering predictioninstrument
Assessment   stuttering predictioninstrumentAssessment   stuttering predictioninstrument
Assessment stuttering predictioninstrument
 
Stuttering
StutteringStuttering
Stuttering
 
Chapter 9 Fluency Assessment Ppt
Chapter 9 Fluency Assessment PptChapter 9 Fluency Assessment Ppt
Chapter 9 Fluency Assessment Ppt
 
Text-To-Speech Technology: Enriching the VLE, Enhancing the Learning Experience
Text-To-Speech Technology: Enriching the VLE, Enhancing the Learning ExperienceText-To-Speech Technology: Enriching the VLE, Enhancing the Learning Experience
Text-To-Speech Technology: Enriching the VLE, Enhancing the Learning Experience
 
Stuttering
StutteringStuttering
Stuttering
 
Repetition Presentation
Repetition PresentationRepetition Presentation
Repetition Presentation
 
Speech Technology Overview
Speech Technology OverviewSpeech Technology Overview
Speech Technology Overview
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Accuracy vs fluency
Accuracy vs fluencyAccuracy vs fluency
Accuracy vs fluency
 
Stuttering
StutteringStuttering
Stuttering
 
Fluency vs. accuracy
Fluency vs. accuracyFluency vs. accuracy
Fluency vs. accuracy
 

Speech technology basics

  • 3. What is DSP? • Digital signal processing is the processing of signals in a digital form.
  • 4. SIGNAL A description of how one parameter varies with another parameter. Continuous signals: x(t). Discrete signals: x[n].
  • 5. DIGITAL SIGNAL Discrete signals x[n]; digital signals x[n].
  • 8. ANALOG TO DIGITAL CONVERSION Analog-to-digital conversion is an electronic process in which a continuously variable (analog) signal is changed, without altering its essential content, into a multi-level (digital) signal. The input to an analog-to-digital converter (ADC) consists of a voltage that varies among a theoretically infinite number of values. Examples are sine waves and the waveforms representing human speech. The output of the ADC, in contrast, has defined levels or states. The simplest digital signals have only two states, and are called binary.
  • 9. Advantages of digital signals • First, digital signals can be stored easily. • Second, digital signals can be reproduced exactly. All you have to do is be sure that a zero doesn't get turned into a one or vice versa. • Third, digital signals can be manipulated easily. Since the signal is just a sequence of zeros and ones, and since a computer can do anything specifiable to such a sequence, you can do a great many things with digital signals. And what you are doing is called digital signal processing.
  • 10. BASIC STRUCTURE OF A DIGITAL SIGNAL PROCESSING SYSTEM [Block diagram: analog input signal → pre-amplifier → amplified analog signal → analog-digital converter (A/D) → digitized signal (001101 101010 010110 110101) → digital signal processor running software (algorithm) → processed digital signal → digital-analog converter (D/A) → processed analog signal → final amplifier → analog output signal]
  • 11. DIGITAL TO ANALOG CONVERSION
  • 13. Synthesis & Decomposition The process of combining signals is called synthesis. Decomposition is the inverse operation of synthesis, where a single signal is broken into two or more additive components.
  • 14. Synthesis & Decomposition 2041 × 4 = ? The number 2041 can be decomposed into 2000 + 40 + 1. Each of these components can be multiplied by 4, and the results then synthesized to find the final answer: 8000 + 160 + 4 = 8164. The goal of this method is to replace a complicated problem with several easy ones.
  • 15. Synthesis & Decomposition • There are infinitely many possible decompositions for any given signal, but only one synthesis. • For example, the numbers 15 and 25 can only be synthesized (added) into the number 40. • In comparison, the number 40 can be decomposed into 1 + 39, 2 + 38, 30 + 10, etc.
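The arithmetic example can be sketched in a few lines of Python. The function names `place_value_parts` and `multiply_by_parts` are hypothetical, chosen only for this illustration; the sketch assumes a non-negative integer input.

```python
def place_value_parts(number):
    """Decompose a non-negative integer into additive place-value parts."""
    parts = []
    place = 1
    while number > 0:
        digit = number % 10
        if digit:
            parts.append(digit * place)
        number //= 10
        place *= 10
    return parts[::-1]


def multiply_by_parts(number, factor):
    """Process each component separately, then synthesize (add) the results."""
    return sum(part * factor for part in place_value_parts(number))


parts = place_value_parts(2041)       # decomposition: 2000, 40, 1
product = multiply_by_parts(2041, 4)  # synthesis: 8000 + 160 + 4
```

Place-value splitting is only one of the many possible decompositions; it is used here because each part is easy to multiply.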
  • 16. SUPERPOSITION Divide & conquer strategy: the signal being processed is broken into simple components, each component is processed individually, and the results are reunited.
  • 18. DECOMPOSITION There are two main ways to decompose signals in signal processing: Impulse decomposition and Fourier decomposition.
  • 19. Impulse DECOMPOSITION Impulse decomposition breaks an N-sample signal into N component signals, each containing N samples. Each of the component signals contains one point from the original signal, with the remainder of the values being zero. A single nonzero point in a string of zeros is called an impulse.
  • 20. IMPORTANCE OF IMPULSE DECOMPOSITION Impulse decomposition is important because it allows signals to be examined one sample at a time. Similarly, systems are characterized by how they respond to impulses. By knowing how a system responds to an impulse, the system's output can be calculated for any given input. This approach is called convolution.
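A minimal sketch of impulse decomposition, assuming the signal is a plain Python list of numbers. The helper names `impulse_decompose` and `synthesize` are hypothetical and exist only for this illustration.

```python
def impulse_decompose(x):
    """Break an N-sample signal into N signals, each holding one original sample."""
    components = []
    for index, value in enumerate(x):
        component = [0.0] * len(x)   # all zeros...
        component[index] = value     # ...except one point from the original
        components.append(component)
    return components


def synthesize(components):
    """Add the component signals sample by sample."""
    return [sum(samples) for samples in zip(*components)]


signal = [2.0, -1.0, 0.5, 3.0]
components = impulse_decompose(signal)
rebuilt = synthesize(components)
```

Adding the N components sample by sample gives back the original signal, which is why each impulse can be processed on its own and the results reunited afterwards.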
  • 21. Fourier Decomposition Any N-point signal can be decomposed into N + 2 signals, half of them sine waves and half of them cosine waves. The lowest-frequency cosine wave (called xC0[n] in this illustration) makes zero complete cycles over the N samples, i.e., it is a DC signal.
  • 22. Fourier Decomposition The next cosine components, xC1[n], xC2[n], and xC3[n], make 1, 2, and 3 complete cycles over the N samples, respectively. Since the frequency of each component is fixed, the only thing that changes for different signals being decomposed is the amplitude of each of the sine and cosine waves.
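The following sketch shows one way to compute such a decomposition with a direct (slow) real DFT, assuming N is even. For an N-sample input it returns N/2 + 1 cosine components and N/2 + 1 sine components, each N samples long. The function name `fourier_decompose` and the test signal are assumptions made for this illustration.

```python
import math


def fourier_decompose(x):
    """Split an even-length signal into cosine and sine component signals."""
    n_samples = len(x)
    half = n_samples // 2
    cos_parts, sin_parts = [], []
    for k in range(half + 1):  # k = number of complete cycles over N samples
        angles = [2 * math.pi * k * n / n_samples for n in range(n_samples)]
        # Correlate the signal with the k-cycle cosine and sine waves.
        re = sum(value * math.cos(a) for value, a in zip(x, angles))
        im = sum(value * math.sin(a) for value, a in zip(x, angles))
        # The DC (k = 0) and highest-frequency (k = N/2) terms use 1/N scaling.
        scale = 1.0 / n_samples if k in (0, half) else 2.0 / n_samples
        cos_parts.append([scale * re * math.cos(a) for a in angles])
        sin_parts.append([scale * im * math.sin(a) for a in angles])
    return cos_parts, sin_parts


x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, 5.0]
cos_parts, sin_parts = fourier_decompose(x)
# Synthesis: adding every component back together recovers the signal.
rebuilt = [sum(part[n] for part in cos_parts + sin_parts) for n in range(len(x))]
```

The frequencies of the components are fixed by N; only the amplitudes (`scale * re` and `scale * im`) depend on the signal being decomposed. The first cosine component is constant and equals the mean of the signal.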
  • 23. CONVOLUTION & FOURIER ANALYSIS The two main techniques of signal processing are convolution and Fourier analysis. Strategy: decompose signals into simple additive components, process the components in some useful manner, and synthesize the components into a final result. This is DSP.
  • 24. CONVOLUTION Convolution is a mathematical way of combining two signals to form a third signal. Using the strategy of impulse decomposition, systems are described by a signal called the impulse response. Convolution relates the three signals of interest: the input signal, the output signal, and the impulse response. Convolution provides the mathematical framework for DSP.
  • 25. IMPULSE RESPONSE The delta function is a normalized impulse; that is, sample number zero has a value of one, while all other samples have a value of zero. The delta function is frequently called the unit impulse.
  • 26. IMPULSE RESPONSE The impulse response is the signal that exits a system when a delta function (unit impulse) is the input. If two systems are different in any way, they will have different impulse responses. Just as the input and output signals are often called x[n] and y[n], the impulse response is usually given the name h[n].
  • 27. IMPULSE RESPONSE • Any impulse can be represented as a shifted and scaled delta function. • Consider a signal, a[n], composed of all zeros except sample number 8, which has a value of -3. • This is the same as a delta function shifted to the right by 8 samples, and multiplied by -3. • In equation form: a[n] = -3δ[n-8]
  • 28. IMPULSE RESPONSE • If the input to a system is an impulse, such as -3δ[n-8], what is the system's output? • In a linear, time-invariant system, scaling and shifting the input results in an identical scaling and shifting of the output.
  • 29. IMPULSE RESPONSE • If δ[n] results in h[n], it follows that -3δ[n-8] results in -3h[n-8]. • In words, the output is a version of the impulse response that has been shifted and scaled by the same amount as the delta function on the input. • If you know a system's impulse response, you immediately know how it will react to any impulse.
  • 30. How a system changes an input signal into an output signal • First, the input signal can be decomposed into a set of impulses, each of which can be viewed as a scaled and shifted delta function. • Second, the output resulting from each impulse is a scaled and shifted version of the impulse response. • Third, the overall output signal can be found by adding these scaled and shifted impulse responses. • In other words, if we know a system's impulse response, then we can calculate what the output will be for any possible input signal.
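The three steps can be written directly as code. This is a minimal sketch for finite-length signals held in Python lists; the function name `convolve` and the three-sample impulse response `h` are assumptions made for illustration, not values from the slides.

```python
def convolve(x, h):
    """Add one scaled, shifted copy of the impulse response h per input sample."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, sample in enumerate(x):         # each input sample is an impulse
        for k, response in enumerate(h):   # scaled by `sample`, shifted by n
            y[n + k] += sample * response
    return y


h = [1.0, 0.5, 0.25]   # hypothetical impulse response

# The impulse from the slides: a[n] = -3δ[n-8]
a = [0.0] * 10
a[8] = -3.0
y = convolve(a, h)      # equals -3h[n-8]: zeros, then -3.0, -1.5, -0.75
```

Because the input here is a single impulse, the output is just the impulse response shifted by 8 samples and multiplied by -3. For a general input, the same loop adds one such copy per sample.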
  • 31. Advantages over analogue processing • DSP is able to provide far better levels of signal processing than is possible with analogue hardware alone. • It is able to perform mathematical operations that enable many of the spurious effects of the analogue components to be overcome. • In addition, it is possible to easily update a digital signal processor by downloading new software. • Once a basic DSP card has been developed, it is possible to use this hardware design to operate in several different environments, performing different functions, purely by downloading different software. • It is also able to provide functions that would not be possible using analogue techniques.
  • 32. Limitations • DSP is not able to provide perfect filtering, demodulation and other functions because of mathematical limitations. • In addition, the processing power of the DSP card may impose some processing limitations. • It is also more expensive than many analogue solutions, and thus it may not be cost effective in some applications.
  • 33. SPEECH ANALYSIS Extraction of properties or features from a speech signal. Involves a transformation of s(n) into another signal, a set of signals, or a set of parameters. Objectives: simplification and data reduction.
  • 34. Signal • Continuous signal (both parameters can assume a continuous range of values). Vertical axis (y-axis): amplitude. Horizontal axis (x-axis): time. The parameter on the y-axis (the dependent variable) is said to be a function of the parameter on the x-axis (the independent variable).
  • 35. Speech Waveform Time domain representation: the time axis is the horizontal axis, running from left to right, and the curve shows how the pressure increases and decreases in the signal.
  • 37. Time domain (temporal) vs frequency domain (spectral) [Figure: spectrum at 0.15 seconds into the utterance, at the beginning of the "o" vowel]
  • 38. SHORT TIME ANALYSIS • Short segments of the speech signal are isolated and processed as if they were short segments from a sustained sound. • This is repeated as often as desired. • Each short segment is called an analysis frame. • Result: a single number or a set of numbers.
  • 39. SHORT TIME ANALYSIS: ASSUMPTION • Properties of the speech signal change relatively slowly with time. • This assumption leads to a variety of speech processing methods.
  • 40. TYPES OF SHORT TIME ANALYSIS • Short Time Energy (Average Magnitude) • Short Time Average Zero crossing rate • Short Time Auto-correlation
  • 41. Short Time Energy (Average Magnitude) • The amplitude of the speech signal varies appreciably with time. • The amplitude of unvoiced segments is much lower than the amplitude of voiced segments. • Short time energy provides a convenient representation that reflects these amplitude variations.
  • 42. Short Time Energy (Average Magnitude) [Figure panels: (a) 50 ms of a vowel; (b) squared version of (a); (c) energy for a window length of 5 ms]
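A minimal sketch of short-time energy, assuming a rectangular window and non-overlapping frames. The 8 kHz sampling rate, the 200 Hz test tone, and the name `short_time_energy` are assumptions made for illustration; at 8 kHz, the 5 ms window length mentioned above corresponds to 40 samples.

```python
import math


def short_time_energy(signal, frame_length, hop):
    """Sum of squared samples in each analysis frame (rectangular window)."""
    energies = []
    for start in range(0, len(signal) - frame_length + 1, hop):
        frame = signal[start:start + frame_length]
        energies.append(sum(sample * sample for sample in frame))
    return energies


fs = 8000                 # assumed sampling rate in Hz
frame_length = 40         # 5 ms at 8 kHz
loud = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
quiet = [0.1 * sample for sample in loud]   # same tone, 10x smaller amplitude
energies = short_time_energy(loud + quiet, frame_length, hop=frame_length)
```

The frames taken from the low-amplitude half have 100 times less energy than those from the high-amplitude half, which is the kind of contrast the measure is meant to expose between voiced and unvoiced segments.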
  • 43. Short Time Average Zero crossing rate • For a continuous signal, a zero crossing occurs when s(t) = 0. • For a discrete signal, a zero crossing occurs if successive samples have different algebraic signs.
  • 44. Short Time Average Zero crossing rate • ZCR is a simple measure of the frequency content of the signal. • For sinusoids, F0 = ZCR/2. • For speech signals, calculation of F0 from ZCR is less precise. • High ZCR: unvoiced speech. • Low ZCR: voiced speech. • Drawback: highly sensitive to noise.
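The relation F0 = ZCR/2 for sinusoids can be checked with a short sketch. The sampling rate, the 100 Hz test tone, and the small phase offset (added so that no sample falls exactly on zero) are assumptions made for illustration; ZCR is expressed here in crossings per second.

```python
import math


def zero_crossings(signal):
    """Count pairs of successive samples with different algebraic signs."""
    count = 0
    for previous, current in zip(signal, signal[1:]):
        if (previous >= 0) != (current >= 0):
            count += 1
    return count


fs = 8000   # assumed sampling rate in Hz
# One second of a 100 Hz sinusoid with a small phase offset.
tone = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(fs)]
zcr = zero_crossings(tone) / (len(tone) / fs)   # crossings per second
f0_estimate = zcr / 2                           # two crossings per cycle
```

For real speech the same count is only a rough indicator of frequency content, and added noise can change it substantially.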
  • 45. Short Time Autocorrelation • Speech signal: s(n). • Fourier transform of s(n): S(e^{jω}). • Energy spectrum: |S(e^{jω})|^2. • The inverse Fourier transform of |S(e^{jω})|^2 is the autocorrelation of s(n). • This preserves information about harmonic and formant amplitudes in s(n).
  • 46. Autocorrelation - Significance • The autocorrelation function contains the energy of the signal (its value at zero lag). • The period can be estimated by finding the location of the first maximum after lag zero in the autocorrelation function. • The autocorrelation function contains much more information about the detailed structure of the signal.
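A minimal sketch of period estimation from the autocorrelation function. It computes the autocorrelation directly in the time domain and, as a simplification, picks the largest value within an assumed pitch search range (80 to 400 Hz) rather than literally the first maximum. The function names, the search range, and the test tone are assumptions made for illustration.

```python
import math


def autocorrelation(signal, max_lag):
    """r[lag] = sum over n of s[n] * s[n + lag], for lag = 0 .. max_lag."""
    length = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(length - lag))
        for lag in range(max_lag + 1)
    ]


def estimate_f0(signal, fs, f0_min=80.0, f0_max=400.0):
    """Return fs divided by the lag of the largest autocorrelation peak."""
    min_lag = int(fs / f0_max)
    max_lag = int(fs / f0_min)
    r = autocorrelation(signal, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    return fs / best_lag


fs = 8000   # assumed sampling rate in Hz
voiced = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
energy = autocorrelation(voiced, 0)[0]   # zero-lag value is the signal energy
f0 = estimate_f0(voiced, fs)
```

Restricting the search to a plausible pitch range keeps the zero-lag peak, which is always the largest, out of the comparison.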
  • 47. Autocorrelation - Applications 1. F0 estimation 2. Voiced/unvoiced determination 3. Linear prediction
  • 48. Cepstrum [Block diagram: s(n) → DFT → S(e^{jω}) → log magnitude → log|S(e^{jω})| → IDFT → cepstrum] The name "cepstrum" was derived by reversing the first four letters of "spectrum". The cepstrum was introduced by Bogert, Healy and Tukey in 1963 for characterizing the seismic echoes resulting from earthquakes. A cepstrum is the result of taking the inverse Fourier transform (IFT) of the log spectrum as if it were a signal. Originally it was defined as the 'spectrum of a spectrum'. Operations on cepstra are labelled quefrency analysis, liftering, or cepstral analysis.
  • 49. Why Cepstrum? • The cepstrum can be seen as information about rate of change in the different spectrum bands. • It has been used to determine the fundamental frequency of human speech. • Cepstrum pitch determination is particularly effective because the effects of the vocal excitation (pitch) and vocal tract (formants) are additive in the logarithm of the power spectrum and thus clearly separate. • The cepstrum is often used as a feature vector for representing the human voice and musical signals.
  • 50. Cepstral concepts - Quefrency The independent variable of a cepstral graph is called the quefrency. The quefrency is a measure of time, though not in the sense of a signal in the time domain. For example, if the sampling rate of an audio signal is 44100 Hz and there is a large peak in the cepstrum whose quefrency is 100 samples, the peak indicates the presence of a pitch that is 44100/100 = 441 Hz. This peak occurs in the cepstrum because the harmonics in the spectrum are periodic, and the period corresponds to the pitch.
  • 51. Cepstral concepts - Rahmonics • The x-axis of the cepstrum has units of quefrency, and peaks in the cepstrum (which relate to periodicities in the spectrum) are called rahmonics. • To obtain an estimate of the fundamental frequency from the cepstrum, we look for a peak in the quefrency region corresponding to typical speech fundamental frequencies.
  • 52. Cepstral concepts - Liftering A filter that operates on a cepstrum might be called a lifter. A low pass lifter is similar to a low pass filter in the frequency domain. It can be implemented by multiplying by a window in the cepstral domain; converting the result back to the time domain yields a smoother signal.
  • 53. Cepstral Analysis • Low quefrency components or samples predominantly correspond to spectral envelope. (Up to about 3 to 4 msec). These are also called cepstral coefficients. • High quefrency components predominantly correspond to periodic excitation or source. (Beyond 4 msec) • If signal is periodic, a strong peak is seen over the high quefrency region at T0, the pitch period. • If signal is unvoiced, components are distributed over all quefrencies.
  • 54. The cepstral coefficients • Cepstral coefficients can be derived both from the filter-bank and linear predictive analyses. • By keeping only the first few cepstral coefficients and setting the remaining coefficients to zero, it is possible to smooth the harmonic structure of the spectrum. • Cepstral coefficients are therefore very convenient coefficients to represent the speech spectral envelope. • Cepstral coefficients have rather different dynamics, the higher coefficients showing the smallest variances.
  • 55. Cepstrum • Formants can be estimated by locating the peaks in the log spectra. • For voiced speech there is a peak in the cepstrum. • For unvoiced speech there is no such peak in the cepstrum. • The position of the peak is a good estimate of the pitch period.
  • 56. Linear Predictive Coding • Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques • It is one of the most useful methods for encoding good quality speech at a low bit rate. • It provides extremely accurate estimates of speech parameters, and is relatively efficient for computation.
  • 57. Linear Predictive Coding [Diagram: source (excitation signal) → transfer function → speech] We can use the LPC coefficients to separate a speech signal into two parts: the transfer function (which carries the vocal quality, i.e. the formants) and the excitation (which carries the pitch and the loudness).
  • 58. • LPC analyzes the speech signal by • estimating the formants, • removing their effects from the speech signal, • and estimating the intensity and frequency of the remaining buzz. • The process of removing the formants is called inverse filtering, and the remaining signal is called the residue.
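Inverse filtering can be sketched as follows, assuming the predictor coefficients are already known and that samples before the start of the signal are zero. The function name and the toy signal are illustrative.

```python
def lpc_residual(signal, coeffs):
    """Inverse filter: e[n] = s[n] - sum_k coeffs[k-1] * s[n-k].

    Samples before the start of the signal are treated as zero.
    """
    residual = []
    for n, sample in enumerate(signal):
        predicted = sum(
            c * signal[n - k]
            for k, c in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        residual.append(sample - predicted)
    return residual


# A signal that halves at every step is fully predicted by one coefficient
# of 0.5, so only the first sample is left in the residue.
print(lpc_residual([1.0, 0.5, 0.25, 0.125], [0.5]))  # [1.0, 0.0, 0.0, 0.0]
```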
  • 59. • The numbers which describe the formants and the residue can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech. • Because speech signals vary slowly with time, this process is done on short chunks of the speech signal, which are called frames. Usually 30 to 50 frames per second give intelligible speech with good compression.
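The synthesis step (running a source through the formant filter) can be sketched as the mirror image of inverse filtering. This assumes the residue and the predictor coefficients for one frame are given; the names are illustrative.

```python
def lpc_synthesize(residual, coeffs):
    """All-pole synthesis: s[n] = e[n] + sum_k coeffs[k-1] * s[n-k]."""
    signal = []
    for n, excitation in enumerate(residual):
        predicted = sum(
            c * signal[n - k]
            for k, c in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        signal.append(excitation + predicted)
    return signal


# A single impulse through a one-coefficient filter gives a decaying signal.
print(lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```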
  • 60. Linear Predictive Coding - Basic Principle • A speech sample can be approximated as a linear combination of past speech samples. • By minimizing the sum of the squared differences between the actual speech samples and the predicted ones, a unique set of predictor coefficients can be determined.
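One common way to carry out this least-squares minimization is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method was used, so the sketch below is an assumption, and it also assumes a non-silent frame (non-zero energy).

```python
def autocorrelation(signal, max_lag):
    """r[lag] = sum over t of s[t] * s[t - lag], for lag = 0..max_lag."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t - lag] for t in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, error), where the prediction is
    s_hat[n] = sum_{k=1..order} a[k-1] * s[n-k]
    and error is the remaining squared prediction error.
    """
    a = [0.0] * order
    error = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        error *= 1.0 - k * k
    return a, error


# Toy signal in which every sample is 0.9 times the previous one.
signal = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorrelation(signal, 2), 2)
print(coeffs, err)
```

For this toy signal the first coefficient comes out close to 0.9 and the second close to 0, since one past sample already predicts the next one.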
  • 61. Linear Predictive Coding - Applications 1. F0 estimation 2. Pitch 3. Vocal tract area functions 4. Representing speech for low bit-rate transmission or storage
  • 62. Linear Predictive Coding - Highlights 1. Extremely accurate estimation of speech parameters 2. High speed of computation 3. Robust, reliable and accurate method
  • 63. Ways in which the basic models of analysis and the associated parameters from them are used in an integrated system • Diagnostic applications (CSL & VAGMI) • Digital transmission of voice communication • Man-machine communication by voice a. Voice response systems b. Speaker recognition systems c. Speech recognition systems
  • 64. Pre-emphasis [Figure panels: before pre-emphasis; after pre-emphasis] Boost the amount of energy in the high frequencies. For voiced segments like vowels, there is more energy at the lower frequencies than at the higher frequencies; this is called spectral tilt. Boosting the high frequency energy makes information from these higher formants more available to the acoustic model and improves phone detection accuracy. This pre-emphasis is done with a filter.
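The slide does not specify the filter. A frequently used choice is the first-order high-pass filter y[n] = x[n] - alpha * x[n-1]; the sketch below assumes that form, and the default alpha = 0.97 is a typical value rather than one taken from the slides.

```python
def pre_emphasize(signal, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


# A constant (low frequency) input is attenuated,
# an alternating (high frequency) input is boosted.
print(pre_emphasize([1.0, 1.0, 1.0], alpha=0.5))   # [1.0, 0.5, 0.5]
print(pre_emphasize([1.0, -1.0, 1.0], alpha=0.5))  # [1.0, -1.5, 1.5]
```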
  • 65. Windowing The goal of feature extraction is to provide spectral features. Speech is a non-stationary signal: its spectrum changes very quickly, so we cannot extract spectral features from an entire utterance or conversation. Instead, we want to extract spectral features from a small window of speech that characterizes a particular subphone (its statistical properties are roughly constant within this region). Windowing determines the portion of the speech signal that is to be analyzed by zeroing out the signal outside the region of interest. [Pipeline: pre-emphasis → window → DFT → mel filter bank → log → IDFT → deltas]
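Cutting a signal into short, overlapping analysis windows can be sketched as below. The frame and hop lengths are given in samples; the values in the example are illustrative and not from the slides.

```python
def split_into_frames(signal, frame_length, hop_length):
    """Return consecutive frames of frame_length samples, hop_length apart.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = []
    start = 0
    while start + frame_length <= len(signal):
        frames.append(signal[start:start + frame_length])
        start += hop_length
    return frames


frames = split_into_frames(list(range(10)), frame_length=4, hop_length=2)
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```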
  • 66. Windowing techniques • Rectangular • Bartlett • Hamming • Hanning • Blackman • Kaiser The most commonly used are the Rectangular and the Hamming methods
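As an illustration of one of the listed windows, the Hamming window can be computed from its standard definition w[n] = 0.54 - 0.46 cos(2πn / (N - 1)). The helper names below are illustrative.

```python
import math


def hamming_window(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def apply_window(frame, window):
    """Multiply a frame by a window of the same length, sample by sample."""
    return [x * w for x, w in zip(frame, window)]


window = hamming_window(5)
print(window)  # tapers from about 0.08 at the edges to 1.0 in the middle
print(apply_window([1.0] * 5, window))
```

A rectangular window corresponds to multiplying by 1.0 everywhere, which is the same as just cutting the frame out of the signal.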
  • 71. DFT [Figure: spectrum at 0.15 seconds into the utterance, at the beginning of the "o" vowel.]
  • 72. The Mel frequency Human hearing is not equally sensitive at all frequency bands. Modeling this property of human hearing during feature extraction improves speaker recognition performance. The form of the model used in MFCCs is to warp the frequencies output by the DFT onto the mel scale. A mel (Stevens et al., 1937; Stevens and Volkmann, 1940) is a unit of pitch. Pairs of sounds that are perceptually equidistant in pitch are separated by an equal number of mels. The mapping between frequency in Hz and the mel scale is linear below 1000 Hz and logarithmic above 1000 Hz. The mel frequency can be computed from the raw acoustic frequency as follows: $\mathrm{mel}(f) = 1127 \ln\left(1 + \frac{f}{700}\right)$
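The mel formula translates directly into code. The inverse mapping is obtained by solving the same formula for f; the function names are illustrative.

```python
import math


def hz_to_mel(frequency_hz):
    """mel(f) = 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: f = 700 * (exp(mel / 1127) - 1)."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


print(hz_to_mel(1000.0))             # close to 1000 mels
print(mel_to_hz(hz_to_mel(440.0)))   # back to about 440 Hz
```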
  • 73. Mel filter Bank During MFCC computation, we implement this intuition by creating a bank of filters that collect energy from each frequency band, with 10 filters spaced linearly below 1000 Hz and the remaining filters spread logarithmically above 1000 Hz. Finally, we take the log of each of the mel spectrum values. In general, the human response to signal level is logarithmic: humans are less sensitive to slight differences in amplitude at high amplitudes than at low amplitudes. In addition, using a log makes the feature estimates less sensitive to variations in input such as power variations due to the speaker’s mouth moving closer to or further from the microphone.
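A rough sketch of how filter positions might be laid out. Note the assumption: instead of the explicit "10 linear filters, then logarithmic" layout described on the slide, this version spaces the filter centres evenly on the mel scale, which gives a similar near-linear spacing at low frequencies and wider spacing at high frequencies. The triangular filter shape, the function names, and the 20 filters up to 8000 Hz are illustrative choices.

```python
import math


def hz_to_mel(frequency_hz):
    return 1127.0 * math.log(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_center_frequencies(n_filters, low_hz, high_hz):
    """Centre frequencies (Hz) of n_filters filters evenly spaced in mels."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_filters)]


def triangular_weight(frequency_hz, left, center, right):
    """Weight of one triangular filter at a given frequency (0 outside it)."""
    if left < frequency_hz <= center:
        return (frequency_hz - left) / (center - left)
    if center < frequency_hz < right:
        return (right - frequency_hz) / (right - center)
    return 0.0


centers = mel_center_frequencies(20, 0.0, 8000.0)
print([round(c) for c in centers])
```

Each filter's output would be the weighted sum of the DFT energies under its triangle, followed by the log described on the slide.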
  • 74. Log magnitude spectrum [Figure panels: magnitude spectrum; log magnitude spectrum] • Replace each amplitude value in the magnitude spectrum with its log. • Visualize the log spectrum as if it were itself a waveform.
  • 75. Cepstrum is the spectrum of the log of the spectrum. By taking the spectrum of the log spectrum, we have left the frequency domain of the spectrum and gone back to the time domain.
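The definition can be sketched with a naive DFT from the standard library. This is a toy illustration rather than a production routine: it uses an O(N²) transform, a small floor to avoid taking the log of zero, and a made-up test frame consisting of an impulse and a weaker copy 8 samples later.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    log_magnitude = [math.log(max(abs(x), floor)) for x in dft(frame)]
    return [c.real for c in dft(log_magnitude, inverse=True)]


# Toy frame: an impulse plus a half-amplitude copy 8 samples later.
frame = [0.0] * 64
frame[0] = 1.0
frame[8] = 0.5
cepstrum = real_cepstrum(frame)
peak = max(range(1, 33), key=lambda n: cepstrum[n])
print(peak)  # 8, the spacing of the repetition
```

The repetition in the frame produces a ripple in the log spectrum, and that ripple shows up as a peak at the corresponding position in the cepstrum.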
  • 76. Cepstrum There is a large peak around 120, corresponding to the F0. There are various other components at lower values on the x-axis. These represent the vocal tract filter (the position of the tongue and the other articulators). Thus, if we are interested in detecting phones, we can make use of just the lower cepstral values. If we are interested in detecting pitch, we can use the higher cepstral values.
  • 77. MFCC: 12 coefficients For MFCC extraction, we generally just take the first 12 cepstral values. These 12 coefficients represent information mainly about the vocal tract filter, largely separated from information about the glottal source. It turns out that cepstral coefficients have the extremely useful property that the different coefficients tend to be uncorrelated. This is not true for the spectrum, where spectral coefficients at different frequency bands are correlated.
  • 78. Energy The extraction of the cepstrum with the inverse DFT results in 12 cepstral coefficients for each frame. We next add a 13th feature: the energy from the frame. Energy correlates with phone identity and so is a useful cue for phone detection (vowels and sibilants have more energy than stops, etc.). The energy in a frame is the sum over time of the power of the samples in the frame; thus, for a signal x in a window from time sample t1 to time sample t2, the energy is $\mathrm{Energy} = \sum_{t=t_1}^{t_2} x^2[t]$
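The frame energy is simply the sum of the squared samples in the frame. A minimal sketch, with an illustrative function name:

```python
def frame_energy(frame):
    """Sum of squared samples over one frame."""
    return sum(x * x for x in frame)


print(frame_energy([1.0, -2.0, 3.0]))  # 14.0
```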
  • 79. Deltas The speech signal is not constant from frame to frame. This change, such as the slope of a formant at its transitions, or the nature of the change from a stop closure to stop burst, can provide a useful cue for phone identity. For this reason, we also add features related to the change in cepstral features over time. We do this by adding for each of the 13 features (12 cepstral features plus energy) a delta or velocity feature and a double delta or acceleration feature. Each of the 13 delta features represents the change between frames in the corresponding cepstral or energy feature, and each of the 13 double delta features represents the change between frames in the corresponding delta features.
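One simple way to estimate these changes is a central difference between the neighbouring frames, d[t] = (c[t+1] - c[t-1]) / 2. The slides do not state which estimator is used, so this is an assumption; practical systems often fit a slope over a wider window. Edge frames are handled here by repeating the first and last frame.

```python
def delta(features):
    """Central-difference deltas for a list of per-frame feature vectors."""
    n_frames = len(features)
    deltas = []
    for t in range(n_frames):
        previous = features[max(t - 1, 0)]
        following = features[min(t + 1, n_frames - 1)]
        deltas.append([(b - a) / 2.0 for a, b in zip(previous, following)])
    return deltas


# One feature per frame, to keep the example readable.
features = [[0.0], [1.0], [4.0], [9.0]]
velocity = delta(features)
acceleration = delta(velocity)
print(velocity)      # [[0.5], [2.0], [4.0], [2.5]]
print(acceleration)  # [[0.75], [1.75], [0.25], [-0.75]]
```

With 13 features per frame, concatenating the original, delta, and double delta vectors gives 39 values per frame.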
  • 81. SPEECH SPECTROGRAPH • A speech spectrograph is a laboratory instrument that displays a graphical representation of the amplitudes of the various component frequencies of speech on a time-based plot. • A tool for analyzing vocal output. • It is used for identifying the formants, and for real-time biofeedback in voice training and therapy.
  • 83. Speech Spectrograph (Digital) [Block diagram: pre-emphasis → window → DFT → plot amplitude vs. frequency → plot spectrogram over time]
  • 85. SPEECH SPECTROGRAPH • There are two main kinds of analysis performed by the spectrograph, wideband (with a bandwidth of 300-500 Hz) and narrowband (with a bandwidth of 45-50 Hz).
  • 86. WIDEBAND SPECTROGRAPH • When used for normal speech with a fundamental frequency of around 100-200 Hz, the wideband filter picks up energy from several harmonics at once and adds them together. • The F0 (fundamental frequency) can be determined from the graphic. • Also, the frequencies and relative strengths of the first two formants (F1 and F2) are visible as dark, rather blurry concentrations of energy.