CHAPTER 1

INTRODUCTION

Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech
processing for representing the spectral envelope of a digital speech signal in compressed
form, using the information of a linear predictive model. It is one of the most powerful
speech analysis techniques and one of the most useful methods for encoding good-quality
speech at a low bit rate, providing extremely accurate estimates of speech parameters.

A vocoder (/ˈvoʊkoʊdər/, short for voice encoder) is an analysis/synthesis system,
used to reproduce human speech. In the encoder, the input is passed through a multiband
filter, each band is passed through an envelope follower, and the control signals from the
envelope followers are communicated to the decoder. The decoder applies these (amplitude)
control signals to corresponding filters in the synthesizer. Since the control signals change
only slowly compared to the original speech waveform, the bandwidth required to transmit
speech can be reduced. This allows more speech channels to share a radio circuit or
submarine cable. By encoding the control signals, voice transmission can be secured against
interception.

The vocoder was originally developed as a speech coder for telecommunications applications
in the 1930s, the idea being to code speech for transmission. Transmitting the parameters of a
speech model instead of a digitized representation of the speech waveform saves bandwidth
in the communication channel; the parameters of the model change relatively slowly,
compared to the changes in the speech waveform that they describe. Its primary use in this
fashion is for secure radio communication, where voice has to be encrypted and then
transmitted. The advantage of this method of "encryption" is that no 'signal' is sent, but rather
envelopes of the bandpass filters. The receiving unit needs to be set up in the same channel
configuration to resynthesize a version of the original signal spectrum. The vocoder as both
hardware and software has also been used extensively as an electronic musical instrument.

       Whereas the vocoder analyzes speech, transforms it into electronically transmitted
information, and recreates it, The Voder (from Voice Operating Demonstrator) generates
synthesized speech by means of a console with fifteen touch-sensitive keys and a pedal,
basically consisting of the "second half" of the vocoder, but with manual filter controls,
needing a highly trained operator.

Since the late 1970s, most non-musical vocoders have been implemented using linear
prediction, whereby the target signal's spectral envelope (formant) is estimated by an all-pole
IIR filter. In linear predictive coding, the all-pole filter replaces the bandpass filter bank of its
predecessor and is used at the encoder to whiten the signal (i.e., flatten the spectrum) and
again at the decoder to re-apply the spectral shape of the target speech signal.


1.1 Organization of the project:
    Chapter 1: Introduction
    Chapter 2: General Theory
    Chapter 3: Block Diagram Description
    Chapter 4: Software Description
    Chapter 5: Results and Conclusions




CHAPTER2

                                                                    GENERAL THEORY

2.1 Overview

       LPC starts with the assumption that a speech signal is produced by a buzzer at the end
of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and
plosive sounds). Although apparently crude, this model is actually a close approximation of
the reality of speech production. The glottis (the space between the vocal folds) produces the
buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract
(the throat and mouth) forms the tube, which is characterized by its resonances, which give
rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are
generated by the action of the tongue, lips and throat during sibilants and plosives.

           LPC analyzes the speech signal by estimating the formants, removing their effects
from the speech signal, and estimating the intensity and frequency of the remaining buzz. The
process of removing the formants is called inverse filtering, and the remaining signal after the
subtraction of the filtered modeled signal is called the residue.

              The numbers which describe the intensity and frequency of the buzz, the
formants, and the residue signal, can be stored or transmitted somewhere else. LPC
synthesizes the speech signal by reversing the process: use the buzz parameters and the
residue to create a source signal, use the formants to create a filter (which represents the
tube), and run the source through the filter, resulting in speech.

 Because speech signals vary with time, this process is done on short chunks of the speech
signal, which are called frames; generally 30 to 50 frames per second give intelligible speech
with good compression.
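As a rough sketch of this analysis/synthesis loop, the MATLAB fragment below models one frame with a 10th-order predictor, whitens it by inverse filtering, and resynthesizes it from the residue; the speech vector x and the 8 kHz rate are assumptions for illustration, not part of any particular vocoder standard.

fs    = 8000;                     % assumed sampling rate
frame = x(1:240);                 % one 30 ms frame (240 samples at 8 kHz)
p     = 10;                       % prediction order

a = lpc(frame, p);                % all-pole model A(z) of the spectral envelope
residue = filter(a, 1, frame);    % inverse filtering: remove the formants
synth   = filter(1, a, residue);  % run the source back through 1/A(z)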


2.2 LPC coefficient representations

             LPC is frequently used for transmitting spectral envelope information, and as
such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly
(see linear prediction for definition of coefficients) is undesirable, since they are very
sensitive to errors. In other words, a very small error can distort the whole spectrum, or
worse, a small error might make the prediction filter unstable.

There are more advanced representations such as log area ratios (LAR), line
spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP
decomposition in particular has gained popularity, since it ensures stability of the predictor,
and spectral errors are local for small coefficient deviations.


Log area ratios (LAR)

LAR can be used to represent reflection coefficients (another form of
linear prediction coefficients) for transmission over a channel. While not as efficient as line
spectral pairs (LSPs), log area ratios are much simpler to compute. Let r_k be the kth
reflection coefficient of a filter; the kth LAR is:

        LAR_k = log( (1 + r_k) / (1 - r_k) )

Use of log area ratios has now been mostly replaced by line spectral pairs, but older
codecs, such as GSM-FR, use LARs.
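As a small illustration of the definition above, the following MATLAB sketch converts one frame's LPC coefficients to reflection coefficients and then to LARs; the frame variable carries over from the earlier sketch and is an assumption for illustration.

r   = poly2rc(lpc(frame, 10));   % reflection coefficients r_k of the prediction filter
lar = log((1 + r) ./ (1 - r));   % kth LAR from the kth reflection coefficient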


Line spectral pairs

            Line spectral pairs (LSP) or line spectral frequencies (LSF) are used to represent
linear prediction coefficients (LPC) for transmission over a channel. LSPs have several
properties (e.g. smaller sensitivity to quantization noise) that make them superior to direct
quantization of LPCs. For this reason, LSPs are very useful in speech coding.


Mathematical foundation



The LP polynomial

        A(z) = 1 - sum_{k=1..p} a_k z^-k

can be decomposed into:

        P(z) = A(z) + z^-(p+1) A(z^-1)
        Q(z) = A(z) - z^-(p+1) A(z^-1)
where P(z) corresponds to the vocal tract with the glottis closed and Q(z) with the glottis
open. While A(z) has complex roots anywhere within the unit circle (z-transform), P(z) and
Q(z) have the very useful property of only having roots on the unit circle; hence P is a
palindromic polynomial and Q an antipalindromic polynomial. So to find them we take a
test point z = e^(jw) and evaluate P(e^(jw)) and Q(e^(jw)) using a grid of
points between 0 and pi. The zeros (roots) of P(z) and Q(z) also happen to be interspersed,
which is why we swap coefficients as we find roots. So the process of finding the LSP
frequencies is basically finding the roots of two polynomials of order p + 1. The roots of P(z)
and Q(z) occur in symmetrical pairs at ±w, hence the name line spectral pairs (LSPs).
Because all the roots are complex and two roots are found at 0 and pi, only p/2 roots need to
be found for each polynomial. The output of the LSP search thus has p roots, hence the same
number of coefficients as the input LPC filter (not counting the leading coefficient, which is
always 1).


To convert back to LPCs, we need to evaluate A(z) = (P(z) + Q(z)) / 2 by "clocking"
an impulse through it N times (the order of the filter), yielding the original filter, A(z).
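In MATLAB this round trip is available through the Signal Processing Toolbox; a minimal sketch, again assuming a 10th-order prediction polynomial from a speech frame:

a   = lpc(frame, 10);   % prediction polynomial A(z), leading coefficient 1
lsf = poly2lsf(a);      % p line spectral frequencies in (0, pi), monotonically increasing
a2  = lsf2poly(lsf);    % convert back; a2 matches a up to numerical error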


Properties

Line spectral pairs have several interesting and useful properties. When the roots of P(z) and
Q(z) are interleaved, stability of the filter is ensured if and only if the roots are monotonically
increasing. Moreover, the closer two roots are, the more resonant the filter is at the
corresponding frequency. Because LSPs are not overly sensitive to quantization noise and
stability is easily ensured, LSPs are widely used for quantizing LPC filters. Line spectral
frequencies can be interpolated.


Reflection coefficient
        The reflection coefficient is used in physics and electrical engineering when wave
propagation in a medium containing discontinuities is considered. A reflection coefficient
describes either the amplitude or the intensity of a reflected wave relative to an incident
wave. The reflection coefficient is closely related to the transmission coefficient.
2.3 Pitch Period Estimation
    Determining if a segment is a voiced or unvoiced sound is not all of the information that
is needed by the LPC decoder to accurately reproduce a speech signal. In order to produce an
input signal for the LPC filter, the decoder also needs another attribute of the current speech
segment known as the pitch period. The period for any wave, including speech signals, can be
defined as the time required for one wave cycle to completely pass a fixed position. For
speech signals, the pitch period can be thought of as the period of the vocal cord vibration
that occurs during the production of voiced speech. Therefore, the pitch period is only needed
for the decoding of voiced segments and is not required for unvoiced segments since they are
produced by turbulent air flow not vocal cord vibrations.
       It is very computationally intensive to determine the pitch period for a given segment
of speech. There are several different types of algorithms that could be used. One type of
algorithm takes advantage of the fact that the autocorrelation of a periodic function, Rxx(k), will
have a maximum when k is equivalent to the pitch period. These algorithms usually detect a
maximum value by checking the autocorrelation value against a threshold value. One
problem with algorithms that use autocorrelation is that the validity of their results is
susceptible to interference as a result of other resonances in the vocal tract. When
interference occurs the algorithm cannot guarantee accurate results. Another problem with
autocorrelation algorithms occurs because voiced speech is not entirely periodic. This means
that the maximum will be lower than it should be for a true periodic signal.
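A minimal autocorrelation-based pitch estimator along these lines might look as follows in MATLAB; the 50-400 Hz search range, the 0.3 voicing threshold and the frame variable are illustrative assumptions, not values prescribed by the original design.

fs  = 8000;
rxx = xcorr(frame, 'coeff');       % normalized autocorrelation of one frame
mid = (numel(rxx) + 1)/2;          % index of lag 0 (xcorr returns lags -(N-1)..N-1)
rxx = rxx(mid:end);                % rxx(1+k) is now the value at lag k
lo  = round(fs/400);               % shortest candidate pitch period (400 Hz)
hi  = round(fs/50);                % longest candidate pitch period (50 Hz)
[pk, i] = max(rxx(lo+1:hi+1));     % peak of Rxx(k) inside the search range
if pk > 0.3                        % threshold check: treat the frame as voiced
    pitchPeriod = (lo + i - 1)/fs; % pitch period in seconds
end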


2.4 Applications

       LPC is generally used for speech analysis and resynthesis. It is used as a form of
voice compression by phone companies, for example in the GSM standard. It is also used for
secure wireless, where voice must be digitized, encrypted and sent over a narrow voice
channel; an early example of this is the US government's Navajo I.

       LPC synthesis can be used to construct vocoders where musical instruments are used
as the excitation signal to the time-varying filter estimated from a singer's speech. This is
somewhat popular in electronic music. Paul Lansky made the well-known computer music
piece notjustmoreidlechatter using linear predictive coding. A 10th-order LPC was used in
the popular 1980s Speak & Spell educational toy.

       Waveform ROM in some digital sample-based music synthesizers made by Yamaha
Corporation may be compressed using the LPC algorithm.

       LPC predictors are used in Shorten, MPEG-4 ALS, FLAC, and other lossless audio
codecs.




2.4.1 Voice effects in music

                   For musical applications, a source of musical sounds is used as the carrier,
instead of extracting the fundamental frequency. For instance, one could use the sound of a
synthesizer as the input to the filter bank, a technique that became popular in the 1970s.

One of the earliest people to recognize the potential of the vocoder/Voder for electronic
music may have been Werner Meyer-Eppler, a German physicist, experimental acoustician
and phoneticist. In 1949, he published a thesis on electronic music and speech synthesis from
the viewpoint of sound synthesis, and in 1951, he contributed to the successful proposal to
establish the WDR Cologne Studio for Electronic Music.

        One of the first attempts to divert the vocoder to create music may be the "Siemens
        Synthesizer" at the Siemens Studio for Electronic Music, developed between 1956
        and 1959.
        In 1968, Robert Moog developed one of the first solid-state musical vocoders for the
        electronic music studio of the University at Buffalo. In 1969, Bruce Haack built a
        prototype vocoder, named "Farad" after Michael Faraday, and it was featured on his
        rock album The Electric Lucifer released in the same year.
        In 1970 Wendy Carlos and Robert Moog built another musical vocoder, a 10-band
        device inspired by the vocoder designs of Homer Dudley. It was originally called a
        spectrum encoder-decoder, and later referred to simply as a vocoder. The carrier
        signal came from a Moog modular synthesizer, and the modulator from a microphone
        input. The output of the 10-band vocoder was fairly intelligible, but relied on
        specially articulated speech. Later improved vocoders use a high-pass filter to let
        some sibilance through from the microphone; this ruins the device for its original
        speech-coding application, but it makes the "talking synthesizer" effect much more
        intelligible.
        Carlos and Moog's vocoder was featured in several recordings, including the
        soundtrack to Stanley Kubrick's A Clockwork Orange in which the vocoder sang the
        vocal part of Beethoven's "Ninth Symphony". Also featured in the soundtrack was a
        piece called "Timesteps," which featured the vocoder in two sections. "Timesteps"
        was originally intended as merely an introduction to vocoders for the "timid listener",
        but Kubrick chose to include the piece on the soundtrack, much to the surprise of
        Wendy Carlos.
       Kraftwerk's Autobahn (1974) was one of the first successful pop/rock albums to
       feature vocoder vocals. Another of the early songs to feature a vocoder was "The
       Raven" on the 1976 album Tales of Mystery and Imagination by progressive rock
       band The Alan Parsons Project; the vocoder also was used on later albums such as I
       Robot. Following Alan Parsons' example, vocoders began to appear in pop music in
        the late 1970s, for example, on disco recordings. Jeff Lynne of Electric Light
        Orchestra used the vocoder in several albums such as Time (featuring the Roland VP-
        330 Plus MkI). ELO songs such as "Mr. Blue Sky" and "Sweet Talkin' Woman", both
        from Out of the Blue (1977), use the vocoder extensively. Featured on the album are
       the EMS Vocoder 2000W MkI, and the EMS Vocoder (-System) 2000 (W or B, MkI
       or II).


2.4.2 Speaker-dependent word recognition device

The speaker-dependent word recognition device is implemented using the Motorola
DSP56303. First the speaker trains the device by storing 10 different vowel sounds into
memory. Then the same speaker can repeat one of the ten words associated with the vowel
sounds, and the device detects which word was repeated and flags an appropriate output.


        Vowel Sound → Microphone Input → A/D Converter → Calculate LPC coefficients
        → Store coefficients in memory

                               Fig 2.1: Training the Device


        Vowel Sound → Microphone Input → A/D Converter → Calculate LPC coefficients
        → Compare coefficients with the ones in memory → Output

                                  Fig 2.2: Word Recognition


2.5 Modern vocoder implementations

       Even with the need to record several frequencies, and the additional unvoiced sounds,
the compression of the vocoder system is impressive. Standard speech-recording systems
capture frequencies from about 500 Hz to 3400 Hz, where most of the frequencies used in
speech lie, typically using a sampling rate of 8 kHz (slightly greater than the Nyquist rate).
The sampling resolution is typically at least 12 bits per sample (16 is standard), for a final
data rate in the range of 96-128 kbit/s. However, a good vocoder can provide a reasonably
good simulation of voice with as little as 2.4 kbit/s of data.
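For example, 8000 samples/second × 12 bits/sample = 96 kbit/s, and 8000 × 16 = 128 kbit/s, so a 2.4 kbit/s vocoder reduces the voice data rate by a factor of roughly 40 to 53.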

       'Toll Quality' voice coders, such as ITU G.729, are used in many telephone networks.
G.729 in particular has a final data rate of 8 kbit/s with superb voice quality. G.723 achieves
slightly worse quality at data rates of 5.3 kbit/s and 6.4 kbit/s. Many voice systems use even
lower data rates, but below 5 kbit/s voice quality begins to drop rapidly.

Several vocoder systems are used in NSA encryption systems:

       LPC-10, FIPS Pub 137, 2400 bit/s, which uses linear predictive coding
       Code-excited linear prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016,
       used in STU-III
       Continuously variable slope delta modulation (CVSD), 16 kbit/s, used in wideband
       encryptors such as the KY-57.
       Mixed-excitation linear prediction (MELP), MIL STD 3005, 2400 bit/s, used in the
       Future Narrowband Digital Terminal (FNBDT), NSA's 21st-century secure telephone.
       Adaptive differential pulse code modulation (ADPCM), former ITU-T G.721, 32
       kbit/s, used in the STE secure telephone. (ADPCM is not a proper vocoder but rather
       a waveform codec. ITU has gathered G.721 along with some other ADPCM codecs
       into G.726.)


    Vocoders are also currently used in psychophysics, linguistics, computational
    neuroscience and cochlear implant research.
    Modern vocoders that are used in communication equipment and in voice storage
    devices today are based on the following algorithms:

   Algebraic code-excited linear prediction (ACELP 4.7 kbit/s – 24 kbit/s)
   Mixed-excitation linear prediction (MELPe 2400, 1200 and 600 bit/s)
   Multi-band excitation (AMBE 2000 bit/s – 9600 bit/s)
   Sinusoidal-Pulsed Representation (SPR 300 bit/s – 4800 bit/s)
   Tri-Wave Excited Linear Prediction (TWELP 600 bit/s – 9600 bit/s)




CHAPTER 3

BLOCK DIAGRAM

3.1 Block diagram description:

The general block diagram of LPC (as shown in Fig 3.2) consists of the following blocks:

    A/D Converter
    End Point Detection
    Pre-emphasis filter
    Frame blocking
    Hamming window
    Auto-Correlation
    Levinson-Durbin algorithm




                      Fig 3.1: LPC analysis and synthesis of speech


        A/D Converter → End Point Detection → Pre-emphasis Filter → Frame Blocking →
        Hamming Window → Auto-Correlation → Levinson-Durbin Algorithm →
        SSD Comparison → Output

                                      Fig 3.2: General Block Diagram

3.2.1 A/D Converter
        For the Motorola DSP56303, the device converts the analog signals to digital samples
by an ASM file called 'core302.asm'. The samples are input from the CODEC A/D input port
as shown in Fig. 5. The assembly file initializes the necessary peripheral settings for general
I/O purposes. Moreover, the file also contains a macro called waitdata. The macro waits for a
sample and takes the sample in. The sampling rate is set to 8000 samples/second.
3.2.2 End Point Detection
        In the end point detection, each sample taken from the A/D converter is compared to a
volume threshold. If the sample is lower than the threshold, it is considered background
noise and therefore disregarded. Otherwise, the DSP board will output 4 bits high to Port B
to indicate readiness to process speech samples, and the next 2000 samples will be stored into
a buffer before processing.
3.2.3 Pre-emphasis filter
        The pre-emphasis filter is a low-order digital filter with the transfer function shown in
Equation (3.1):
                               H(z) = 1 – 0.9375 z^-1                     (3.1)
        The digitized speech signal goes through the filter to average out transmission
conditions, noise backgrounds, and signal spectra. The filter boosts the high-frequency
components of the voice and attenuates the low-frequency components. Because the human
voice typically has higher power at low frequencies, the filter flattens the spectrum and
renders the speech samples better conditioned for LPC calculation.
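Applying the filter of Equation (3.1) is a one-line operation in MATLAB; a minimal sketch, assuming the digitized speech samples are held in a vector x:

y = filter([1 -0.9375], 1, x);   % H(z) = 1 - 0.9375 z^-1: boost highs, attenuate lows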
3.2.4 Frame blocking
The pre-emphasized speech samples are divided into 30 ms window frames. Each 30 ms
window frame consists of 240 samples, as illustrated in Equations (3.2) and (3.3):

        (Sampling Rate) × (Frame Length) = Number of Samples in a Frame        (3.2)
        (8000 samples/second) × (0.030 second) = 240 samples                   (3.3)



       In addition, adjacent window frames are separated by 80 samples (240 × 1/3), with
160 overlapping samples. The amount of separation and overlap depends on the frame
length. The frame length is chosen according to the sampling rate: the higher the sampling
rate, the larger the frame length needed for accurate analysis.
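A minimal MATLAB sketch of this framing scheme, assuming the pre-emphasized samples are in a vector y, collects the 240-sample frames that start every 80 samples:

N    = 240;                                % samples per 30 ms frame at 8 kHz
step = 80;                                 % frame advance, leaving 160 samples of overlap
numFrames = floor((numel(y) - N)/step) + 1;
frames = zeros(N, numFrames);
for m = 1:numFrames
    frames(:, m) = y((m-1)*step + (1:N));  % one column per window frame
end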


3.2.5 Hamming window

       The windowing method involves multiplying the ideal impulse response with a
window function to generate a corresponding filter, which tapers the ideal impulse response.
Like the frequency sampling method, the windowing method produces a filter whose
frequency response approximates a desired frequency response. The windowing method,
however, tends to produce better results than the frequency sampling method.

       The toolbox provides two functions for window-based filter design, fwind1 and
fwind2. fwind1 designs a two-dimensional filter by using a two-dimensional window that it
creates from one or two one-dimensional windows that you specify. fwind2 designs a two-
dimensional filter by using a specified two-dimensional window directly.

   fwind1 supports two different methods for making the two-dimensional windows it uses:

       Transforming a single one-dimensional window to create a two-dimensional window
       that is nearly circularly symmetric, by using a process similar to rotation
       Creating a rectangular, separable window from two one-dimensional windows, by
       computing their outer product

   The example below uses fwind1 to create an 11-by-11 filter from the desired frequency
response Hd. The example uses the Signal Processing Toolbox hamming function to create a
one-dimensional window, which fwind1 then extends to a two-dimensional window.

Hd = zeros(11,11);            % desired frequency response magnitude matrix
Hd(4:8,4:8) = 1;              % unit-gain passband in the centre of the grid
[f1,f2] = freqspace(11,'meshgrid');
mesh(f1,f2,Hd), axis([-1 1 -1 1 0 1.2]), colormap(jet(64))
h = fwind1(Hd,hamming(11));   % design the 2-D filter with a 1-D Hamming window
figure, freqz2(h,[32 32]), axis([-1 1 -1 1 0 1.2])




The figure below shows the desired two-dimensional frequency response (left) and the actual
two-dimensional frequency response (right).

                      Fig 3.4: Two-Dimensional Frequency Response


Creating the Desired Frequency Response Matrix

       The filter design functions fsamp2, fwind1, and fwind2 all create filters based on a
desired frequency response magnitude matrix. Frequency response is a mathematical function
describing the gain of a filter in response to different input frequencies.


3.2.6 Auto-Correlation

       The Autocorrelation LPC block determines the coefficients of an N-step forward
linear predictor for the time-series in each length-M input channel, u, by minimizing the
prediction error in the least squares sense. A linear predictor is an FIR filter that predicts the
next value in a sequence from the present and past inputs. This technique has applications in
filter design, speech coding, spectral analysis, and system identification.

       The Autocorrelation LPC block can output the prediction error for each channel as
polynomial coefficients, reflection coefficients, or both. It can also output the prediction error
power for each channel. The input u can be a scalar, unoriented vector, column vector,
sample-based row vector, or a matrix. Frame-based row vectors are not valid inputs. The
block treats all M-by-N matrix inputs as N channels of length M.

       When you select Inherit prediction order from input dimensions, the prediction order,
N, is inherited from the input dimensions. Otherwise, you can use the Prediction order
parameter to specify the value of N. Note that N must be a scalar with a value less than the
length of the input channels, or the block produces an error.

          When Output(s) is set to A, port A is enabled. For each channel, port A outputs an
(N+1)-by-1 column vector, a = [1 a2 a3 ... aN+1]^T, containing the coefficients of an Nth-order
moving average (MA) linear process that predicts the next value, û(M+1), in the input time
series:

        û(M+1) = -( a2·u(M) + a3·u(M-1) + ... + aN+1·u(M-N+1) )

          When Output(s) is set to K, port K is enabled. For each channel, port K outputs a
length-N column vector whose elements are the prediction error reflection coefficients. When
Output(s) is set to A and K, both ports A and K are enabled, and each port outputs its
respective set of prediction coefficients for each channel.

          When you select Output prediction error power (P), port P is enabled. The prediction
error power is output at port P as a vector whose length is the number of input channels.


3.2.7 Levinson-Durbin algorithm

          The Levinson-Durbin block solves the nth-order system of linear equations Ra = b in
the cases where:

          R is a Hermitian, positive-definite, Toeplitz matrix.
          b is identical to the first column of R shifted by one element and with the opposite
          sign, i.e. b = -[r(2) r(3) ... r(n+1)]^T.

The input to the block, r = [r(1) r(2) ... r(n+1)], can be a vector or a matrix. If the input is a
matrix, the block treats each column as an independent channel and solves it separately. Each
channel of the input contains lags 0 through n of an autocorrelation sequence, which appear
in the matrix R.



The block can output the polynomial coefficients, A, the reflection coefficients, K,
and the prediction error power, P, in various combinations. The Output(s) parameter allows
you to enable the A and K outputs by selecting one of the following settings:

        A — For each channel, port A outputs A=[1 a(2) a(3) ... a(n+1)], the solution to the
        Levinson-Durbin equation. A has the same dimension as the input. You can also view
        the elements of each output channel as the coefficients of an nth-order autoregressive
        (AR) process.
        K — For each channel, port K outputs K=[k(1) k(2) ... k(n)], which contains n
        reflection coefficients and has the same dimension as the input, less one element. A
        scalar input channel causes an error when you select K. You can use reflection
        coefficients to realize a lattice representation of the AR process described later in this
        page.
        A and K — The block outputs both representations at their respective ports. A scalar
        input channel causes an error when you select A and K.

        Select the Output prediction error power (P) check box to output the prediction error
power for each channel, P. For each channel, P represents the power of the output of an FIR
filter with taps A and input autocorrelation described by r, where A represents a prediction
error filter and r is the input to the block. In this case, A is a whitening filter. P has one
element per input channel.

        When you select the If the value of lag 0 is zero, A=[1 zeros], K=[zeros], P=0 check
box (default), an input channel whose r(1) element is zero generates a zero-valued output.
When you clear this check box, an input with r(1) = 0 generates NaNs in the output. In
general, an input with r(1) = 0 is invalid because it does not construct a positive-definite
matrix R. Often, however, blocks receive zero-valued inputs at the start of a simulation. The
check box allows you to avoid propagating NaNs during this period.
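The same recursion is exposed in MATLAB as the levinson function; a minimal sketch, assuming the lags come from one 240-sample speech frame and a 10th-order model:

r = xcorr(frame, 10, 'biased');   % autocorrelation at lags -10..10 of one frame
r = r(11:end);                    % keep lags 0..10, so r(1) is the zero-lag value
[A, P, K] = levinson(r, 10);      % A = [1 a(2) ... a(11)], P = prediction error power,
                                  % K = reflection coefficients k(1)..k(10)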


Applications

        One application of the Levinson-Durbin formulation implemented by this block is in
the Yule-Walker AR problem, which concerns modeling an unknown system as an
autoregressive process. You would model such a process as the output of an all-pole IIR filter
with white Gaussian noise input. In the Yule-Walker problem, the use of the signal's
autocorrelation sequence to obtain an optimal estimate leads to an Ra = b equation of the type
shown above, which is most efficiently solved by Levinson-Durbin recursion. In this case, the
input to the block represents the autocorrelation sequence, with r(1) being the zero-lag value.
The output at the block's A port then contains the coefficients of the autoregressive process
that optimally models the system. The coefficients are ordered in descending powers of z, and
the AR process is minimum phase. The prediction error, G, defines the gain for the unknown
system, where G = sqrt(P):

        H(z) = G / A(z) = G / ( 1 + a(2) z^-1 + ... + a(n+1) z^-n )
        The output at the block's K port contains the corresponding reflection coefficients,
[k(1) k(2) ... k(n)], for the lattice realization of this IIR filter. The Yule-Walker AR Estimator
block implements this autocorrelation-based method for AR model estimation, while the
Yule-Walker Method block extends the method to spectral estimation.

        Another common application of the Levinson-Durbin algorithm is in linear predictive
coding, which is concerned with finding the coefficients of a moving average (MA) process
(or FIR filter) that predicts the next value of a signal from the current signal sample and a
finite number of past samples. In this case, the input to the block represents the signal's
autocorrelation sequence, with r(1) being the zero-lag value, and the output at the block's A
port contains the coefficients of the predictive MA process (in descending powers of z).




These coefficients solve the following least-squares optimization problem: choose
a(2), ..., a(n+1) to minimize the mean squared prediction error

        E[ | u(m) + a(2) u(m-1) + ... + a(n+1) u(m-n) |^2 ]

        Again, the output at the block's K port contains the corresponding reflection
coefficients, [k(1) k(2) ... k(n)], for the lattice realization of this FIR filter. The
Autocorrelation LPC block in the Linear Prediction library implements this autocorrelation-
based prediction method.


3.2.8 Sum of Square of Difference comparison
       The Sum of Square of Difference (SSD) comparison is a quantitative method to
compare two sets of LPC coefficients. Suppose one set of LPC coefficients in the template is
A'1, A'2, A'3, ..., A'10, and another set of LPC coefficients obtained from a window frame is
A1, A2, A3, ..., A10. Then

         SSD = (A'1 – A1)^2 + (A'2 – A2)^2 + (A'3 – A3)^2 + ... + (A'10 – A10)^2

       Each time the window frame is shifted, the SSD is calculated between the LPC
coefficients from the window frame and every set of LPC coefficients in the template. A
minimum SSD exists between the LPC coefficients from a window frame and one set of LPC
coefficients in the template. The one with the minimum SSD value is the closest match to the
input vowel.
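A direct MATLAB sketch of this matching rule, assuming a hypothetical templates matrix with one stored coefficient set per row and a row vector A from the current window frame:

% templates: 10-by-10 matrix, one row of LPC coefficients per trained vowel
% A: 1-by-10 vector of LPC coefficients from the current window frame
ssd = sum((templates - A).^2, 2);   % SSD against every stored set (implicit expansion)
[minSSD, vowel] = min(ssd);         % row index of the closest-matching vowel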




CHAPTER 4

SOFTWARE DESCRIPTION

4.1 MATLAB INTRODUCTION:

The name MATLAB stands for MATrix LABoratory. MATLAB was written originally to
provide easy access to matrix software developed by the LINPACK (linear system package)
and EISPACK (eigensystem package) projects. MATLAB is a high-performance language
for technical computing. It integrates computation, visualization, and a programming
environment. Furthermore, MATLAB is a modern programming language environment: it
has sophisticated data structures, contains built-in editing and debugging tools, and supports
object-oriented programming. These factors make MATLAB an excellent tool for teaching
and research. MATLAB has many advantages compared to conventional computer languages
(e.g., C, FORTRAN) for solving technical problems. MATLAB is an interactive system
whose basic data element is an array that does not require dimensioning. The software
package has been commercially available since 1984 and is now considered a standard tool
at most universities and industries worldwide. It has powerful built-in routines that enable a
very wide variety of computations. It also has easy-to-use graphics commands that make the
visualization of results immediately available. Specific applications are collected in
packages referred to as toolboxes. There are toolboxes for signal processing, symbolic
computation, control theory, simulation, optimization, and several other fields of applied
science and engineering.
4.2 Mathematical functions:
MATLAB offers many predefined mathematical functions for technical computing. Typing
help elfun and help specfun calls up full lists of elementary and special functions
respectively. There is a long list of mathematical functions that are built into MATLAB;
these functions are called built-ins. Many standard mathematical functions, such as sin(x),
cos(x), tan(x), e^x, ln(x), are evaluated by the functions sin, cos, tan, exp, and log
respectively in MATLAB.
4.3 Basic plotting:
MATLAB has an excellent set of graphic tools. Plotting a given data set or the results
of a computation is possible with very few commands. Users are highly encouraged to plot
mathematical functions and results of analysis as often as possible. Trying to understand
mathematical equations with graphics is an enjoyable and very efficient way of learning
mathematics.
4.4 Matrix generation:
Matrices are the basic elements of the MATLAB environment. A matrix is a two-
dimensional array consisting of m rows and n columns. Special cases are column vectors
(n = 1) and row vectors (m = 1). MATLAB supports two types of operations, known as
matrix operations and array operations.
MATLAB provides functions that generate elementary matrices. The matrix of zeros, the
matrix of ones, and the identity matrix are returned by the functions zeros, ones, and eye,
respectively.
Table 3.1: Elementary matrices

        zeros(m,n)   returns an m-by-n matrix of zeros
        ones(m,n)    returns an m-by-n matrix of ones
        eye(m,n)     returns an m-by-n matrix with ones on the diagonal and zeros elsewhere
4.5 Programming in MATLAB:
4.5.1 M-File scripts:
A script file is an external file that contains a sequence of MATLAB statements. Script files
have a filename extension .m and are often called M-files. M-files can be scripts that simply
execute a series of MATLAB statements, or they can be functions that accept arguments
and produce one or more outputs.
4.5.2 Script side-effects:
All variables created in a script file are added to the workspace. This may have undesirable
effects, because:
      Variables already existing in the workspace may be overwritten.
      The execution of the script can be affected by the state of variables in the workspace.
As a result, because scripts have these undesirable side-effects, it is better to code any
complicated application as a function M-file.
4.5.3 Input to script Files:

When a script file is executed, the variables that are used in the calculations within the file
must have assigned values. The assignment of a value to a variable can be done in three
ways:
1. The variable is defined in the script file.
2. The variable is defined at the command prompt.
3. The variable is entered when the script is executed.
4.5.4 Output Commands:
MATLAB automatically generates a display when commands are executed. In addition to
this automatic display, MATLAB has several commands that can be used to generate
displays or outputs. Two commands that are frequently used to generate output are disp and
fprintf.
                               Table of disp and fprintf commands

        disp       Displays the contents of an array or string.
        fprintf    Performs formatted writes to the screen or to a file.
4.5.5 Saving output to a File:
In addition to displaying output on the screen, the command fprintf can be used for writing
output to a file. The saved data can subsequently be used by MATLAB or other software.
To save the results of some computation to a file in a text format requires the following
steps:
1. Open a file using fopen
2. Write the output using fprintf
3. Close the file using fclose
4.6 Debugging M-Files:
4.6.1 Introduction:
This section introduces general techniques for finding errors in M-files. Debugging is the
process by which you isolate and fix errors in your program or code.
Debugging helps to correct two kinds of errors:
        Syntax errors - for example, omitting a parenthesis or misspelling a function name.
        Run-time errors - run-time errors produce unexpected results and are usually
         apparent but can be difficult to track down.
4.6.2 Debugging process:
We can debug M-files using the Editor/Debugger as well as using debugging functions
from the Command Window. The debugging process consists of:
      Preparing for debugging
      Setting breakpoints
      Running an M-file with breakpoints
      Stepping through an M-file
      Examining values
      Correcting problems
      Ending debugging
4.7 Strengths:
      MATLAB is relatively easy to learn
      MATLAB may behave as a calculator or as a programming language
      MATLAB combines calculation and graphic plotting nicely
      MATLAB is interpreted (not compiled), so errors are easy to fix
      MATLAB is optimized to be relatively fast when performing matrix operations
4.8 Weaknesses:
      MATLAB is not a general-purpose programming language such as C, C++, or
      FORTRAN.
      MATLAB is designed for scientific computing, and is not well suited for other
      applications.
      MATLAB is an interpreted language, slower than a compiled language such as C++.
      MATLAB commands are specific to MATLAB usage. Most of them do not have a
      direct equivalent in other programming languages.




CHAPTER 5

RESULTS AND CONCLUSIONS


5.1 Result

       The implementation of an LPC vocoder is an exciting and challenging task. A lot of
techniques were learned from the literature and from practice during this work. Looking at
the complexity of the voiced/unvoiced decision in the LPC-10e DoD vocoder, it is clear that
a good algorithm must have a lot of intelligence and adaptability in order to get good results.
The main problem is the estimation of the pitch. Secondly, a robust voiced/unvoiced decision
is very important.




                                  Fig 5.1: Screen shot of input




Fig 5.2: Screen shot of output

       It was found that considering the memory of the LPC filter leads to better results.
The median filter was not able to give a smooth pitch contour. Techniques such as avoiding
abrupt changes in the pitch value and avoiding double and half pitches should be
incorporated in order to get better results.




REFERENCES

Partha S Malik, MATLAB and SIMULINK, 3rd edition.
Stephen J Chapman, MATLAB Programming for Engineers, 2nd edition.
www.wikipedia.org
www.mathworks.com




  • 2. basically consisting of the "second half" of the vocoder, but with manual filter controls, needing a highly trained operator. Since the late 1970s, most non-musical vocoders have been implemented using linear prediction, whereby the target signal's spectral envelope (formant) is estimated by an all-pole IIRfilter. In linear prediction coding, the all-pole filter replaces the bandpass filter bank of its predecessor and is used at the encoder to whiten the signal (i.e., flatten the spectrum) and again at the decoder to re-apply the spectral shape of the target speech signal. 1.1 Organization of the project: Chapter 1: Introduction Chapter 2: General theory Chapter 3:Block diagram Description Chapter 4:Software Description Chapter 5:Results and Conclusion 2
of a tube (voiced sounds), with occasional hissing and popping sounds added (sibilants and plosives). Although apparently crude, this model is actually a close approximation of how speech is produced. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these give rise to formants, the enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effect from the signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the signal remaining after subtraction of the filtered modeled signal is called the residue. The numbers describing the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted elsewhere.

LPC synthesizes the speech signal by reversing the process: the buzz parameters and the residue are used to create a source signal, the formants are used to create a filter (which represents the tube), and the source is run through the filter, resulting in speech. Because speech signals vary with time, this process is applied to short chunks of the speech signal, called frames; generally 30 to 50 frames per second give intelligible speech with good compression.
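As a concrete illustration of this analysis-synthesis loop, the following is a minimal MATLAB sketch (assuming the Signal Processing Toolbox for lpc, buffer and hamming; the file name, the 10th-order model and the 30 ms framing are illustrative choices, not values fixed by the method):

% Minimal frame-by-frame LPC analysis/resynthesis.
[x, fs] = audioread('speech.wav');   % hypothetical 8 kHz recording
x = x(:,1);                          % use one channel
p = 10;                              % prediction order
N = round(0.03*fs);                  % 30 ms frames
frames = buffer(x, N);               % non-overlapping frames, one per column
y = zeros(size(frames));
for m = 1:size(frames,2)
    s = frames(:,m) .* hamming(N);   % taper the frame edges
    if ~any(s), continue, end        % skip silent (zero-padded) frames
    a = lpc(s, p);                   % all-pole model of the spectral envelope
    e = filter(a, 1, s);             % inverse filtering: the residue
    y(:,m) = filter(1, a, e);        % drive 1/A(z) with the residue to resynthesize
end
y = y(:);                            % reassemble the frames

With unquantized a and e this round trip reconstructs each frame exactly; a real coder quantizes the coefficients and replaces the residue with a buzz or noise excitation, which is where the compression comes from.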
2.2 LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmitting the filter coefficients directly (see linear prediction for the definition of the coefficients) is undesirable, since they are very sensitive to errors: a very small error can distort the whole spectrum, or worse, make the prediction filter unstable. More robust representations therefore exist, such as log area ratios (LAR), line spectral pair (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it guarantees the stability of the predictor and keeps spectral errors local for small coefficient deviations.

Log area ratios (LAR)

LAR can be used to represent the reflection coefficients (another form of the linear prediction coefficients) for transmission over a channel. While not as efficient as line spectral pairs (LSPs), log area ratios are much simpler to compute. Let rk be the kth reflection coefficient of a filter; the kth LAR is then

LARk = log( (1 + rk) / (1 - rk) )

Log area ratios have now been mostly replaced by line spectral pairs, but older codecs, such as GSM-FR, still use LARs.

Line spectral pairs

Line spectral pairs (LSP) or line spectral frequencies (LSF) are used to represent linear prediction coefficients for transmission over a channel. LSPs have several properties (e.g. smaller sensitivity to quantization noise) that make them superior to direct quantization of LPCs, which makes them very useful in speech coding.

Mathematical foundation

The LP polynomial A(z) can be decomposed into

P(z) = A(z) + z^-(p+1) A(z^-1)
Q(z) = A(z) - z^-(p+1) A(z^-1)

so that A(z) = (P(z) + Q(z)) / 2,
where P(z) corresponds to the vocal tract with the glottis closed and Q(z) with the glottis open. While A(z) has complex roots anywhere within the unit circle (z-transform), P(z) and Q(z) have the very useful property of having roots only on the unit circle: P is a palindromic polynomial and Q an antipalindromic polynomial. To find the roots, the polynomials are evaluated at a grid of test points between 0 and π. The roots of P(z) and Q(z) are interspersed (they alternate around the unit circle), which is why coefficients are swapped as roots are found. Finding the LSP frequencies is therefore a matter of finding the roots of two polynomials of order p + 1. The roots of P(z) and Q(z) occur in symmetrical pairs at ±ω, hence the name line spectral pairs. Because all the roots are complex and two roots always fall at 0 and π, only p/2 roots need to be found for each polynomial. The output of the LSP search thus has p roots, the same number of coefficients as the input LPC filter (not counting the leading 1). To convert back to LPCs, the polynomial is evaluated by "clocking" an impulse through it N times (the order of the filter), yielding the original filter A(z).

Properties

Line spectral pairs have several interesting and useful properties. When the roots of P(z) and Q(z) are interleaved, stability of the filter is ensured if and only if the roots are monotonically increasing. Moreover, the closer two roots are, the more resonant the filter is at the corresponding frequency. Because LSPs are not overly sensitive to quantization noise and stability is easily ensured, they are widely used for quantizing LPC filters. Line spectral frequencies can also be interpolated.

Reflection coefficients

The reflection coefficient is used in physics and electrical engineering when wave propagation in a medium containing discontinuities is considered. A reflection coefficient describes either the amplitude or the intensity of a reflected wave relative to an incident wave, and is closely related to the transmission coefficient.
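These representations are easy to experiment with, since the MATLAB Signal Processing Toolbox ships the conversions directly. A small sketch (the test frame here is random noise, a stand-in for real speech):

% Convert one frame's LPC coefficients to LSFs and LARs, and back.
s = randn(240,1);              % stand-in for a windowed speech frame
a = lpc(s, 10);                % a = [1 a2 ... a11], a stable A(z)
lsf = poly2lsf(a);             % 10 line spectral frequencies, monotonic in (0, pi)
a2  = lsf2poly(lsf);           % back to LPC; matches a up to rounding
k   = poly2rc(a);              % reflection coefficients of the same filter
lar = log((1 + k) ./ (1 - k)); % log area ratios, per the LAR formula above

The monotonicity of lsf is exactly the stability guarantee mentioned above: as long as quantized LSFs stay sorted, lsf2poly returns a stable predictor.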
2.3 Pitch Period Estimation

Knowing whether a segment is a voiced or unvoiced sound is not all of the information the LPC decoder needs to reproduce a speech signal accurately. To produce an input signal for the LPC filter, the decoder also needs another attribute of the current speech segment, known as the pitch period. The period of any wave, including a speech signal, is the time required for one wave cycle to pass a fixed position completely. For speech signals, the pitch period can be thought of as the period of the vocal cord vibration that occurs during the production of voiced speech. The pitch period is therefore only needed for decoding voiced segments; unvoiced segments are produced by turbulent air flow, not vocal cord vibration, and need no pitch.

Determining the pitch period for a given segment of speech is computationally intensive, and several types of algorithm exist. One type exploits the fact that the autocorrelation of a periodic function, Rxx(k), has a maximum when k equals the pitch period; such algorithms usually detect the maximum by checking the autocorrelation value against a threshold. One problem with autocorrelation-based algorithms is that the validity of their results is susceptible to interference from other resonances in the vocal tract; when interference occurs, the algorithm cannot guarantee accurate results. A second problem arises because voiced speech is not entirely periodic, so the maximum is lower than it would be for a truly periodic signal.
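A minimal version of such an autocorrelation pitch detector is sketched below. The 50-400 Hz search range and the 0.3 voicing threshold are illustrative assumptions, not values taken from this project; the test frame is a synthetic impulse train so that the peak is easy to see:

% Autocorrelation pitch estimate for one frame s at sampling rate fs.
fs = 8000;
s = repmat([1; zeros(79,1)], 3, 1);    % synthetic 100 Hz impulse train (period 80)
minLag = round(fs/400);                % shortest period searched (400 Hz)
maxLag = round(fs/50);                 % longest period searched (50 Hz)
r = xcorr(s, maxLag, 'coeff');         % normalized autocorrelation, r = 1 at lag 0
r = r(maxLag+1:end);                   % keep lags 0..maxLag
[pk, i] = max(r(minLag+1:end));        % strongest peak in the allowed lag range
if pk > 0.3                            % peak above threshold: voiced
    pitchPeriod = minLag + i - 1;      % pitch period in samples (80 here)
    pitchHz = fs / pitchPeriod;        % 100 Hz for this test frame
else
    pitchHz = 0;                       % below threshold: treat as unvoiced
end

Both weaknesses noted above are visible in this sketch: a formant resonance can raise a spurious peak above the threshold, and for real (imperfectly periodic) voiced speech the peak value pk falls below 1 even at the true pitch lag.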
2.4 Applications

LPC is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless transmission, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I. LPC synthesis can also be used to construct vocoders in which a musical instrument serves as the excitation signal for the time-varying filter estimated from a singer's speech, a technique somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding, a 10th-order LPC was used in the popular 1980s Speak & Spell educational toy, and the waveform ROM in some digital sample-based music synthesizers made by Yamaha Corporation may be compressed using the LPC algorithm. LPC predictors are also used in Shorten, MPEG-4 ALS, FLAC, and other lossless audio codecs.

2.4.1 Voice effects in music

For musical applications, a source of musical sounds is used as the carrier, instead of extracting the fundamental frequency. For instance, one could use the sound of a synthesizer as the input to the filter bank, a technique that became popular in the 1970s. One of the earliest people to recognize the potential of the vocoder and Voder for electronic music may have been Werner Meyer-Eppler, a German physicist, experimental acoustician and phonetician. In 1949 he published a thesis on electronic music and speech synthesis from the viewpoint of sound synthesis, and in 1951 he joined the successful proposal to establish the WDR Cologne Studio for Electronic Music. One of the first attempts to divert the vocoder to creating music may have been the "Siemens Synthesizer" at the Siemens Studio for Electronic Music, developed between 1956 and 1959.

In 1968, Robert Moog developed one of the first solid-state musical vocoders for the electronic music studio of the University at Buffalo. In 1969, Bruce Haack built a prototype vocoder, named "Farad" after Michael Faraday; it was featured on his rock album The Electric Lucifer, released the same year. In 1970, Wendy Carlos and Robert Moog built another musical vocoder, a 10-band device inspired by the vocoder designs of Homer Dudley. It was originally called a spectrum encoder-decoder, and later referred to simply as a vocoder. The carrier signal came from a Moog modular synthesizer, and the modulator from a microphone input. The output of the 10-band vocoder was fairly intelligible but relied on specially articulated speech. Later improved vocoders use a high-pass filter to let some sibilance through from the microphone; this ruins the device for its original speech-coding application, but it makes the "talking synthesizer" effect much more intelligible.

Carlos and Moog's vocoder was featured in several recordings, including the soundtrack to Stanley Kubrick's A Clockwork Orange, in which the vocoder sang the vocal part of Beethoven's "Ninth Symphony". Also featured on the soundtrack was a piece called "Timesteps", which featured the vocoder in two sections. "Timesteps" was originally intended as merely an introduction to vocoders for the "timid listener",
but Kubrick chose to include the piece on the soundtrack, much to the surprise of Wendy Carlos.

Kraftwerk's Autobahn (1974) was one of the first successful pop/rock albums to feature vocoder vocals. Another early song to feature a vocoder was "The Raven" on the 1976 album Tales of Mystery and Imagination by the progressive rock band The Alan Parsons Project; the vocoder was also used on later albums such as I Robot. Following Alan Parsons' example, vocoders began to appear in pop music in the late 1970s, for example on disco recordings. Jeff Lynne of Electric Light Orchestra used the vocoder on several albums such as Time (featuring the Roland VP-330 Plus MkI). ELO songs such as "Mr. Blue Sky" and "Sweet Talkin' Woman", both from Out of the Blue (1977), use the vocoder extensively; featured on that album are the EMS Vocoder 2000W MkI and the EMS Vocoder (-System) 2000 (W or B, MkI or II).

2.4.2 Speaker-dependent word recognition device

The speaker-dependent word recognition device is implemented using the Motorola DSP56303. First the speaker trains the device by storing ten different vowel sounds in memory. The same speaker can then repeat one of the ten words associated with the vowel sounds, and the device detects which word was repeated and flags the appropriate output.

Fig 2.1: Training the device (vowel sound → microphone input → A/D converter → calculate LPC coefficients → store coefficients in memory)
Fig 2.2: Word recognition (vowel sound → microphone input → A/D converter → calculate LPC coefficients → compare with the coefficients in memory → output)

2.5 Modern vocoder implementations

Even with the need to record several frequencies, and the additional unvoiced sounds, the compression achieved by the vocoder system is impressive. Standard speech-recording systems capture frequencies from about 500 Hz to 3400 Hz, where most of the frequencies used in speech lie, typically using a sampling rate of 8 kHz (slightly greater than the Nyquist rate). The sampling resolution is typically at least 12 bits per sample (16 is standard), for a final data rate in the range of 96-128 kbit/s. A good vocoder, however, can provide a reasonably good simulation of voice with as little as 2.4 kbit/s of data.

'Toll quality' voice coders, such as ITU G.729, are used in many telephone networks. G.729 in particular has a final data rate of 8 kbit/s with superb voice quality. G.723 achieves slightly worse quality at data rates of 5.3 kbit/s and 6.4 kbit/s. Many voice systems use even lower data rates, but below 5 kbit/s voice quality begins to drop rapidly.

Several vocoder systems are used in NSA encryption systems:
• LPC-10 (FIPS Pub 137), 2400 bit/s, which uses linear predictive coding
• Code-excited linear prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016, used in STU-III
• Continuously variable slope delta modulation (CVSD), 16 kbit/s, used in wideband encryptors such as the KY-57
• Mixed-excitation linear prediction (MELP), MIL-STD-3005, 2400 bit/s, used in the Future Narrowband Digital Terminal (FNBDT), NSA's 21st-century secure telephone
• Adaptive differential pulse code modulation (ADPCM), former ITU-T G.721, 32 kbit/s, used in the STE secure telephone (ADPCM is not a proper vocoder but rather a waveform codec; ITU has gathered G.721 along with some other ADPCM codecs into G.726)
Vocoders are also currently used in psychophysics, linguistics, computational neuroscience and cochlear implant research. Modern vocoders used in communication equipment and voice storage devices today are based on the following algorithms:
• Algebraic code-excited linear prediction (ACELP, 4.7 kbit/s – 24 kbit/s)
• Mixed-excitation linear prediction (MELPe, 2400, 1200 and 600 bit/s)
• Multi-band excitation (AMBE, 2000 bit/s – 9600 bit/s)
• Sinusoidal-pulsed representation (SPR, 300 bit/s – 4800 bit/s)
• Tri-wave excited linear prediction (TWELP, 600 bit/s – 9600 bit/s)

CHAPTER 3
BLOCK DIAGRAM
3.1 Block diagram description

The general block diagram of LPC (Fig 3.2) consists of the following blocks:
• A/D converter
• End point detection
• Pre-emphasis filter
• Frame blocking
• Hamming window
• Auto-correlation
• Levinson-Durbin algorithm

Fig 3.1: LPC analysis and synthesis of speech

Fig 3.2: General block diagram (A/D converter → end point detection → pre-emphasis filter → frame blocking → Hamming window → auto-correlation → Levinson-Durbin algorithm → SSD comparison → output)
3.2.1 A/D converter

For the Motorola DSP56303, analog signals are converted to digital samples by an assembly file called 'core302.asm'. The samples are input from the CODEC A/D input port. The assembly file initializes the necessary peripheral settings for general I/O purposes, and also contains a macro called waitdata, which waits for a sample and takes it in. The sampling rate is set to 8000 samples/second.

3.2.2 End point detection

In end point detection, each sample taken from the A/D converter is compared to a volume threshold. If the sample is lower than the threshold, it is considered background noise and disregarded. Otherwise, the DSP board outputs 4 bits high to Port B to indicate readiness to process speech samples, and the next 2000 samples are stored into a buffer before processing.

3.2.3 Pre-emphasis filter

The pre-emphasis filter is a low-order digital filter with the transfer function shown in Equation (3.1):

H(z) = 1 - 0.9375 z^-1        (3.1)

The digitized speech signal goes through this filter to average out transmission conditions, noise backgrounds and signal spectra. The filter boosts the high-frequency components of the voice and attenuates the low-frequency components. Because the human voice typically has higher power at low frequencies, the filter renders the speech samples easier to use for the LPC calculation.

3.2.4 Frame blocking

The pre-emphasized speech samples are divided into 30 ms window frames. Each 30 ms frame consists of 240 samples, as illustrated in Equations (3.2) and (3.3):

(sampling rate)(frame length) = number of samples in a frame        (3.2)
(8000 samples/second)(0.030 second) = 240 samples        (3.3)

In addition, adjacent window frames are separated by 80 samples (240 × 1/3), giving 160 overlapping samples. The amount of separation and overlap depends on the frame length, which is chosen according to the sampling rate: the higher the sampling rate, the larger the frame length needed for accurate results.
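In MATLAB these two steps reduce to a filter call and a buffer call. A sketch, assuming an 8 kHz recording (the file name is hypothetical; buffer and hamming come from the Signal Processing Toolbox):

[x, fs] = audioread('vowel.wav');         % hypothetical 8 kHz recording
x = x(:,1);
xe = filter([1 -0.9375], 1, x);           % pre-emphasis, H(z) = 1 - 0.9375 z^-1
N = round(0.030*fs);                      % 240 samples per 30 ms frame at 8 kHz
hop = N/3;                                % frames advance 80 samples
frames = buffer(xe, N, N-hop, 'nodelay'); % 160-sample overlap, one frame per column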
3.2.5 Hamming window

The windowing method involves multiplying the ideal impulse response with a window function to generate a corresponding filter, which tapers the ideal impulse response. Like the frequency sampling method, the windowing method produces a filter whose frequency response approximates a desired frequency response; the windowing method, however, tends to produce better results.

The toolbox provides two functions for window-based filter design, fwind1 and fwind2. fwind1 designs a two-dimensional filter by using a two-dimensional window that it creates from one or two one-dimensional windows that you specify; fwind2 designs a two-dimensional filter by using a specified two-dimensional window directly. fwind1 supports two methods for making the two-dimensional windows it uses:
• transforming a single one-dimensional window into a nearly circularly symmetric two-dimensional window, by a process similar to rotation
• creating a rectangular, separable window from two one-dimensional windows, by computing their outer product

The example below uses fwind1 to create an 11-by-11 filter from the desired frequency response Hd. It uses the Signal Processing Toolbox hamming function to create a one-dimensional window, which fwind1 then extends to a two-dimensional window.

Hd = zeros(11,11);                   % desired frequency response magnitude
Hd(4:8,4:8) = 1;                     % passband in the centre of the grid
[f1,f2] = freqspace(11,'meshgrid');
mesh(f1,f2,Hd), axis([-1 1 -1 1 0 1.2]), colormap(jet(64))   % plot desired response
h = fwind1(Hd,hamming(11));          % design the 2-D filter with a Hamming window
figure, freqz2(h,[32 32]), axis([-1 1 -1 1 0 1.2])           % plot actual response

Fig 3.4: Desired two-dimensional frequency response (left) and actual two-dimensional frequency response (right)
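fwind1 and fwind2 address the two-dimensional filter design case; for the speech pipeline of Fig 3.2 it is the one-dimensional Hamming window that is applied, tapering each 240-sample frame before the autocorrelation step. A sketch, reusing the frames matrix from Section 3.2.4:

w = hamming(240);         % w(n) = 0.54 - 0.46*cos(2*pi*n/239), n = 0..239
sw = frames(:,1) .* w;    % taper the first frame's edges toward zero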
Creating the desired frequency response matrix

The filter design functions fsamp2, fwind1 and fwind2 all create filters based on a desired frequency response magnitude matrix. Frequency response is a mathematical function describing the gain of a filter in response to different input frequencies.

3.2.6 Auto-correlation

The Autocorrelation LPC block determines the coefficients of an N-step forward linear predictor for the time series in each length-M input channel, u, by minimizing the prediction error in the least-squares sense. A linear predictor is an FIR filter that predicts the next value in a sequence from the present and past inputs. The technique has applications in filter design, speech coding, spectral analysis and system identification. The Autocorrelation LPC block can output the prediction error for each channel as polynomial coefficients, reflection coefficients, or both; it can also output the prediction error power for each channel.

The input u can be a scalar, unoriented vector, column vector, sample-based row vector, or a matrix. Frame-based row vectors are not valid inputs. The block treats all M-by-N matrix inputs as N channels of length M. When you select Inherit prediction order from input dimensions, the prediction order, N, is inherited from the input dimensions; otherwise you can use the Prediction order parameter to specify the value of N. Note that N must be a scalar with a value less than the length of the input channels, or the block produces an error.
When Output(s) is set to A, port A is enabled. For each channel, port A outputs an (N+1)-by-1 column vector, a = [1 a2 a3 ... aN+1]^T, containing the coefficients of an Nth-order moving average (MA) linear process that predicts the next value, ûM+1, in the input time series. When Output(s) is set to K, port K is enabled; for each channel, port K outputs a length-N column vector whose elements are the prediction error reflection coefficients. When Output(s) is set to A and K, both ports are enabled, and each outputs its respective set of prediction coefficients for each channel. When you select Output prediction error power (P), port P is enabled, and the prediction error power is output at port P as a vector whose length is the number of input channels.

3.2.7 Levinson-Durbin algorithm

The Levinson-Durbin block solves the system of linear equations

R a = b

in the cases where R is a Hermitian, positive-definite, Toeplitz matrix and b is identical to the first column of R shifted by one element and with the opposite sign. The input to the block, r = [r(1) r(2) ... r(n+1)], can be a vector or a matrix. If the input is a matrix, the block treats each column as an independent channel and solves it separately. Each channel of the input contains lags 0 through n of an autocorrelation sequence, which appear in the matrix R.
The block can output the polynomial coefficients, A, the reflection coefficients, K, and the prediction error power, P, in various combinations. The Output(s) parameter allows you to enable the A and K outputs by selecting one of the following settings:
• A: for each channel, port A outputs A = [1 a(2) a(3) ... a(n+1)], the solution to the Levinson-Durbin equation. A has the same dimension as the input. The elements of each output channel can also be viewed as the coefficients of an nth-order autoregressive (AR) process.
• K: for each channel, port K outputs K = [k(1) k(2) ... k(n)], which contains the n reflection coefficients and has the same dimension as the input, less one element. A scalar input channel causes an error when you select K. Reflection coefficients can be used to realize a lattice representation of the AR process described later on this page.
• A and K: the block outputs both representations at their respective ports. A scalar input channel causes an error when you select A and K.

Select the Output prediction error power (P) check box to output the prediction error power for each channel, P. For each channel, P represents the power of the output of an FIR filter with taps A and input autocorrelation described by r, where A represents a prediction error filter and r is the input to the block. In this case, A is a whitening filter. P has one element per input channel.

When you select the "If the value of lag 0 is zero, A=[1 zeros], K=[zeros], P=0" check box (the default), an input channel whose r(1) element is zero generates a zero-valued output. When you clear this check box, an input with r(1) = 0 generates NaNs in the output. In general, an input with r(1) = 0 is invalid because it does not construct a positive-definite matrix R; often, however, blocks receive zero-valued inputs at the start of a simulation, and the check box allows you to avoid propagating NaNs during this period.
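Outside Simulink, the same recursion is available as the MATLAB function levinson, whose outputs map directly onto the A, K and P ports described above. A sketch, with a noise frame standing in for speech:

s = randn(240,1);              % stand-in for one windowed speech frame
p = 10;
r = xcorr(s, p, 'biased');     % autocorrelation estimate, lags -p..p
r = r(p+1:end);                % keep lags 0..p, as the recursion expects
[A, P, K] = levinson(r, p);    % A = [1 a(2) ... a(p+1)], error power P,
                               % reflection coefficients K
% lpc(s, p) computes the same A and P via this autocorrelation method.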
Applications

One application of the Levinson-Durbin formulation implemented by this block is the Yule-Walker AR problem, which concerns modeling an unknown system as an autoregressive process: the process is modeled as the output of an all-pole IIR filter with white Gaussian noise input. In the Yule-Walker problem, using the signal's autocorrelation sequence to obtain an optimal estimate leads to an Ra = b equation of the type shown above, which is most efficiently solved by Levinson-Durbin recursion. In this case, the input to the block represents the autocorrelation sequence, with r(1) being the zero-lag value. The output at the block's A port then contains the coefficients of the autoregressive process that optimally models the system. The coefficients are ordered in descending powers of z, and the AR process is minimum phase. The prediction error, G, defines the gain for the unknown system, where G = √P and P is the prediction error power described above. The output at the block's K port contains the corresponding reflection coefficients, [k(1) k(2) ... k(n)], for the lattice realization of this IIR filter. The Yule-Walker AR Estimator block implements this autocorrelation-based method for AR model estimation, while the Yule-Walker Method block extends the method to spectral estimation.

Another common application of the Levinson-Durbin algorithm is linear predictive coding, which is concerned with finding the coefficients of a moving average (MA) process (or FIR filter) that predicts the next value of a signal from the current signal sample and a finite number of past samples. In this case, the input to the block represents the signal's autocorrelation sequence, with r(1) being the zero-lag value, and the output at the block's A port contains the coefficients of the predictive MA process (in descending powers of z). These coefficients are the ones that minimize the mean squared prediction error E{ (u(n) - û(n))² }, where û(n) is the prediction formed from the past samples. Again, the output at the block's K port contains the corresponding reflection coefficients, [k(1) k(2) ... k(n)], for the lattice realization of this FIR filter.
The Autocorrelation LPC block in the Linear Prediction library implements this autocorrelation-based prediction method.

3.2.8 Sum of squared differences comparison

The sum of squared differences (SSD) comparison is a quantitative method for comparing two sets of LPC coefficients. Suppose one set of LPC coefficients in the template is A'1, A'2, A'3, ..., A'10, and another set obtained from a window frame is A1, A2, A3, ..., A10. Then

SSD = (A'1 - A1)² + (A'2 - A2)² + (A'3 - A3)² + ... + (A'10 - A10)²

Each time the window frame is shifted, the SSD is calculated between the LPC coefficients from the window frame and every set of LPC coefficients in the template. A minimum SSD exists between the LPC coefficients from a window frame and one set of LPC coefficients in the template; the set with the minimum SSD value is the closest match to the input vowel.
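A sketch of this template match in MATLAB (the ten templates here are random stand-ins for the stored vowel coefficients):

T = randn(10, 10);                          % templates: one row of 10 LPC coefficients per vowel
aIn = randn(1, 10);                         % coefficients from the current window frame
ssd = sum((T - repmat(aIn, 10, 1)).^2, 2);  % SSD against every template
[ssdMin, best] = min(ssd);                  % row index of the closest vowel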
CHAPTER 4
SOFTWARE DESCRIPTION

4.1 MATLAB introduction

The name MATLAB stands for MATrix LABoratory. MATLAB was written originally to provide easy access to matrix software developed by the LINPACK (linear system package) and EISPACK (eigensystem package) projects. MATLAB is a high-performance language for technical computing that integrates computation, visualization and a programming environment. Furthermore, MATLAB is a modern programming language environment: it has sophisticated data structures, contains built-in editing and debugging tools, and supports object-oriented programming. These factors make MATLAB an excellent tool for teaching and research.

MATLAB has many advantages compared to conventional computer languages (e.g. C, FORTRAN) for solving technical problems. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. The software package has been commercially available since 1984 and is now considered a standard tool at most universities and industries worldwide. It has powerful built-in routines that enable a very wide variety of computations, and easy-to-use graphics commands that make the visualization of results immediately available. Specific applications are collected in packages referred to as toolboxes; there are toolboxes for signal processing, symbolic computation, control theory, simulation, optimization, and several other fields of applied science and engineering.

4.2 Mathematical functions

MATLAB offers many predefined mathematical functions for technical computing. Typing help elfun and help specfun calls up full lists of the elementary and special functions respectively. These functions are called built-ins; many standard mathematical functions, such as sin(x), cos(x), tan(x), e^x and ln(x), are evaluated by the functions sin, cos, tan, exp and log respectively.
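A few of these built-ins in use (the results are shown as comments):

y1 = sin(pi/2)       % returns 1
y2 = exp(1)          % returns 2.7183, the number e
y3 = log(exp(3))     % returns 3; log is the natural logarithm ln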
4.3 Basic plotting

MATLAB has an excellent set of graphic tools. Plotting a given data set or the results of a computation is possible with very few commands, and users are highly encouraged to plot mathematical functions and results of analysis as often as possible: trying to understand mathematical equations with graphics is an enjoyable and very efficient way of learning mathematics.

4.4 Matrix generation

Matrices are the basic elements of the MATLAB environment. A matrix is a two-dimensional array consisting of m rows and n columns; column vectors (n = 1) and row vectors (m = 1) are special cases. MATLAB supports two types of operations, known as matrix operations and array operations.

MATLAB provides functions that generate elementary matrices. The matrix of zeros, the matrix of ones, and the identity matrix are returned by the functions zeros, ones, and eye, respectively.

Table 3.1: Elementary matrices

4.5 Programming in MATLAB

4.5.1 M-file scripts

A script file is an external file that contains a sequence of MATLAB statements. Script files have the filename extension .m and are often called M-files. M-files can be scripts that simply execute a series of MATLAB statements, or they can be functions that accept arguments and produce one or more outputs.

4.5.2 Script side-effects

All variables created in a script file are added to the workspace. This may have undesirable effects: variables already existing in the workspace may be overwritten, and the execution of the script can be affected by the state of variables in the workspace. Because scripts have these side-effects, it is better to code any complicated application as a function M-file.
4.5.3 Input to script files

When a script file is executed, the variables used in the calculations within the file must have assigned values. The assignment of a value to a variable can be done in three ways:
1. The variable is defined in the script file.
2. The variable is defined at the command prompt.
3. The variable is entered when the script is executed.

4.5.4 Output commands

MATLAB automatically generates a display when commands are executed. In addition to this automatic display, MATLAB has several commands that can be used to generate displays or outputs. Two commands that are frequently used to generate output are disp and fprintf.

4.5.5 Saving output to a file

In addition to displaying output on the screen, the command fprintf can be used to write output to a file. The saved data can subsequently be used by MATLAB or by other software. Saving the results of a computation to a file in text format requires the following steps:
1. Open a file using fopen.
2. Write the output using fprintf.
3. Close the file using fclose.
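These three steps look as follows in practice (the file name and the value written are arbitrary examples):

fid = fopen('results.txt', 'w');            % 1. open the file for writing
fprintf(fid, 'pitch = %6.1f Hz\n', 100.0);  % 2. write formatted output
fclose(fid);                                % 3. close the file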
4.6 Debugging M-files

4.6.1 Introduction

This section introduces general techniques for finding errors in M-files. Debugging is the process by which you isolate and fix errors in your program or code. Debugging helps to correct two kinds of error:
• Syntax errors: for example, omitting a parenthesis or misspelling a function name.
• Run-time errors: these produce unexpected results and are usually difficult to track down.

4.6.2 Debugging process

M-files can be debugged using the Editor/Debugger as well as the debugging functions available from the Command Window. The debugging process consists of:
• preparing for debugging
• setting breakpoints
• running an M-file with breakpoints
• stepping through an M-file
• examining values
• correcting problems
• ending debugging

4.7 Strengths
• MATLAB is relatively easy to learn.
• MATLAB may behave as a calculator or as a programming language.
• MATLAB combines calculation and graphic plotting nicely.
• MATLAB is interpreted (not compiled), so errors are easy to fix.
• MATLAB is optimized to be relatively fast when performing matrix operations.

4.8 Weaknesses
• MATLAB is not a general-purpose programming language such as C, C++ or FORTRAN.
• MATLAB is designed for scientific computing and is not well suited to other applications.
• MATLAB is an interpreted language, slower than a compiled language such as C++.
• MATLAB commands are specific to MATLAB usage; most of them do not have a direct equivalent in other programming languages.

CHAPTER 5
RESULTS AND CONCLUSIONS
5.1 Result

The implementation of an LPC vocoder is a genuinely exciting and challenging task, and many techniques were learned from the literature and from practice during this work. Looking at the complexity of the voiced/unvoiced decision in the LPC-10e DoD vocoder, it is clear that a good algorithm must have considerable intelligence and adaptability in order to produce good results. The main problem is the estimation of the pitch; a robust voiced/unvoiced decision is the second most important element.

Fig 5.1: Screenshot of the input
Fig 5.2: Screenshot of the output

It was found that taking the memory of the LPC filter into account leads to better results. The median filter was not able to give a smooth pitch contour; techniques such as avoiding abrupt changes in the pitch value and avoiding double and half pitches should be incorporated in order to obtain better results.
REFERENCES

1. Partha S. Malik, MATLAB and SIMULINK, 3rd edition.
2. Stephen J. Chapman, MATLAB Programming for Engineers, 2nd edition.
3. www.wikipedia.org
4. www.mathworks.com