SlideShare uma empresa Scribd logo
1 de 11
Baixar para ler offline
268                                                                                                PROCEEDINGS OF THE IEEE, VOL. 61, NO. 3, MARCH         1973

 [2] C. Mfiller, The TkoryofRelatioity. London, England: Oxford Univ.                    accelerated system of reference,” Phys. Reu., vol. 134,pp.A799-
     Press, 1952, pp. 274-276, 302-309.                                                  A804, 1964.
 [3] E. J. Post, Formal Structure of Elecfromagnetics. Amsterdam, The              [9]   T. C. Mo, “Theoryof electrodynamics in media in noninertial frames
     Netherlands: North-Holland, 1962, chs. 3, 6, 7.                                     and applications,” J. Math. Phys., vol. 11, pp. 2589-2610, 1970.
 [4] A. Sommerfeld, Elektrodynamik. Wiesbaden, Germany:           Die-            [lo]   R. M. Fano, L. J. Chu, and R. B. Adler, Electromagnetk Fields, En-
     terich’sche Verlag, 1948, pp. 281-289.                                              ergy and Forces. New York: Wiley, 1960.pp. 407-416.
 [SI T.Schlomka, “Das Ohmsche Gesetz beibewegtenKorpern,” A n n .                 [ll]   L.Robin, Fonctions Sphiriques de Legendre et Fonctions Sphtroidulcs,
     Phys., vol. 6, no. 8, pp. 246-252, 1951.                                            vol. 3.Paris,  France: Gauthier-Villars,1959,pp. 94-10.
 [6] H. Epheser and T. Schlomka,        “Flachengrossen und elektrodyn-           [12]   W. R. Symthe, Static and Dynamic Elec#ricity. New York: Mc-
     amische  Grenzbedingungen  bei       bewegten Korpern,” Ann. Phys.,                 Graw-Hill, 1950, p. 208.
     vol. 6, no. 8, pp. 211-220, 1951.                                            [13]   M. Faraday, Experimental Researches in   Electricity, vol. 3. New
 [7] J. L. Anderson and J. W. .Ryon,“Electromagneticradiation          in                York:Dover, 1965, pp. 362-368.
     accelerated systems,” Phys. Rev., vol. 181, pp. 1765-1775, 1969.             [14]   E . G.Cullwick, Electromagnetism and Relativity.   London, England:
 [8] C. V. Heer, “Resonant frequencies of an electromagnetic cavity in an                Longmans, pp. 36-141, 167.




                                                   The Viterbi Algorithm
                                                            G. DAVID FORNEY, JR.

                                                                        Invited Paper



    Abstrucf-The Viterbi algorithm (VA) is a recursive optimal  solu-                           11. STATEMENT  OF THE PROBLEM
tion to the problem of estimating the state sequence of a discrete-
time finite-stateMarkovprocess      observed in memoryless noise.                                                VA
                                                                                      In its most general form, the may be viewed as a solu-
Many problems i areas such as digital communicationscan be cast
                  n                                                               tion to theproblem of maximum a posteriori probability
in this form. T i paper gives a tutorial exposition of the algorithm
               hs                                                                 (MAP) estimation of the state sequence of a finite-state dis-
and of how it is implemented and analyzed. Applications to date are     crete-time Markov process observed in memoryless noise. I n
reviewed. Increasing use of the algorithm in a widening variety of
areas i foreseen.
       s                                                                this section we set up the problem in this generality, and then
                                                                        illustrate by example the different sorts problems that can
                                                                                                                               of
                            I. INTRODUCTION                             be made to such a model. The general approach
                                                                                           fit                                               also has the
           HE VITERBI algorithm (VA) was proposed in 1967               virtue of tutorial simplicity.
                                                                               TheunderlyingMarkovprocessischaracterized                            as fol-
        PI     as a method of decoding convolutional codes. Since
           that time, it has been recognized an attractive solu- lows. Time is discrete. The statex k a t time k is one of a finite
                                               as
tion to a variety of digital estimation problems, somewhat as number M of states m , 1 s m 5 14;i.e., the state space X is
the Kalman filter has been adapteda to        variety of analog esti- simply { 1, 2, . . , A I ] , Initiallyweshallassumethatthe
mation problems. Like the Kalman filter, the           VA tracks the process runs only from time 0 t o t i m e K and that the initial
s t a t e of a stochastic process with a recursive method that is       a n d final states x0 a n d XK are known; the state sequence is
optimum in a certain sense, and that lends           itself readily to then represented by a finite vector X = (20,                     +  , x ~ ) We see
                                                                                                                                            1      .
implementation and analysis. However, the underlying               pro- later that extension to infinite sequences is trivial.
cess is assumed to be finite-state Markov rather than Gaus-                    T h e process is Markov, in the sense that the probability
sian, which leads to marked differences in structure.                   P ( X k + l I x 9 x1, . * . , xk) of being in state
                                                                                       0                                          X k + l at time k f 1,

       This paper is intended to be principally a tutorial intro-       given all states up to time k , depends only on t h e s t a t e xk at
duction to theVA, its structure, and its analysis. It also pur-time k :
ports to review more or       less exhaustively all work inspired by
                                                                                         P ( X k + l j xo, x1, . . . , Xk) = P(Xk+lI Xk).
o r related to the algorithm up to the time writing (summer
                                                 of
1972). Our belief is that the algorithmwill find application in The transition probabilities P ( x k + l ~ k ) may be time varying,
                                                                                                                           x
a n increasing diversity of areas. Our hope is t h a t we can ac- but we do not explicitly indicate this in the notation.
celerate this process for the readers this paper.
                                         of                                    I t is convenient to define the transition & at time k as t h e
                                                                        pair of states (xk+l, x k ) :
   This invited paper is one of a series planned on topics of general interest-
                                                                                                               tk     (%+I,   xk).
The Editor.
   Manuscript received September 20, 1972;revised November 27, 1972.
   The author is with Codex Corporation, Newton, Mass. 02195.                     We let E bethe(possiblytime-varying)set                       of transitions
FORNEY: THE VITERBI ALGORITFIM                                                                                                                 269


                                                                              “                   h                    a   SHIFT REGISTER


                                                                                                                 DETERMINISTIC
                                                                                                                             FUNCTION
                                                                                                                                             t.-
                     Fig. 1.   Most general model.

                                                                                                                                 MEWRYLESS
                                                                                                                                  CHANNEL
& = (%&I,   xk)   for which P(xe11 x k ) ZO,and I X I their number.
Clearly I X 1 5 M2.    There is evidently a one-to-one correspon-
dencebetweenstatesequences             X andtransitionsequences                                 Fig. 2.   Shift-register
                                                                                                                       model.
f = (50, * * , &-I). (We write x= t.)
    Theprocess is assumedtobeobservedinmemoryless
noise; t h a t is, there is a sequence z of observations z k in which
                                                                                                           I
                                                                                                                           ’Ik       ,
Zk depends probabilistically only on the transition        E at time
k:l


                                         k-0

In the parlance of information theory, z can be described as
the output of some memoryless channel whose input sequence
is(seeFig.         1). Again,though weshallnotindicateit       ex-
plicitly, the channel may be time varying in the sense that                                   Fig. 3. A convolutionalencoder.
P ( z k I b ) may be a function of k . This formulation subsumes
t h e following as special cases.
                                                                                   Finally, we state the problem to which the VA is a solu-
      1) The case in which z k depends only on the state x k :
                                                                               tion. Given a sequence         z of observations of a discrete-time
                                                                               finite-stateMarkov         process memoryless
                                                                                                                  in           noise,
                                                                                                                                    find           the
                                                                               state sequence x for which the a posteriori probability P(xl I)
                                                                               is maximum. Alternately, find the transition sequence             F, for
     2) The case in which          4 depends probabilistically on an           which P(F,j z) is maximum (since x 2 E ) . In the shift-register
output y k of the process at time k , where y k is in turn a de- model this is also the same as finding the most probable input
terministic function of the transition b or the state ?&.                      sequence u, sincet&.        X ; or also the most probable signal se-
   Example: T h e following model frequently arises in digital                            yL1
                                                                               quence y, if       X . I t is well known that this MAP rule mini-
communications. There is an input sequence = (UO,          u       ul, * . ), mizes the error probability in detecting the whole sequence
where each u k is generated independently according to some                    (the block-, message-, or word-error probability), and thus is
probability distribution P ( i ( k ) and can take on one of a finite           optimum in this sense. We shall see that in many applications
number of values, say m. There is a noise-free signal sequence it is effectively optimum in any sense.
y, not observable, in which eachy k is some deterministic func-                   Application Examples: We now give examples showingt h a t
tion of the present and thev previous inputs:                                  the problem statement above applies to a number of diverse
                                                -
                         y k = f ( U k , * ’ * 7 Uk-v)                         fields,includingconvolutionalcoding,intersymbolinterfer-
                                                                               ence,continuous-phase     frequency-shift   keying(FSK),and
The observed sequence z is the output of a memoryless chan- text recognition. The adequately motivated reader may skip
nel whose input is y. We call such a process a shift-register immediately to the next section.
process, since (as illustrated in Fig. 2) it can be modeled by a
shift register of length Y with inputs u k . (Alternately, it is a A . Convolutional Codes
vth-orderm-aryMarkovprocess.)                    T o completethecorre-
                                                                                   A rate-l/n binary convolutional encoder is a shift-register
spondence to our general model we define:
                                                                               circuit exactly like that of Fig. 2, where the inputs u k are in-
     1) the state
                                                                               formationbitsandtheoutputs               y k areblocks    of # bits, y k
                       x k & (uk-1, ‘        ‘ 3 uk-v)                                -
                                                                                =(Pa, * * , p,,~),each of which is a parity check on (modulo-
                                                                               2 sum of) some     subset        of the v + l information   bits    (Uk,
     2) the transition                                                         Uk-1, * . , Uk-v). When the encoded sequence (codeword)             y is
                       b & ( u k , * * ’ Uk--.).                               sentthroughamemorylesschannel,                 we have precisely the
                                                                               model of Fig. 2. Fig. 3 shows a particular rate-$ code with
The number of states is thus 1 X =m’, and of transitions, v = 2 . (This codeis the only one ever used for illustration in the
                                             I
 IEl =m’+l. If theinputsequence“starts”                      at time 0 and VA coding literature, but the reader must not infer that it is
‘stops” at time K - v , i.e.,                                                  the only one the VA can handle.)
         u = ( . . . 0, u0, u 1 , ’ ’ , ~ K - P 0, 0, ‘ * ’ )
                                                        ,                           More general convolutional encoders exist: the rate may
                                                                               be k / n , the inputs may be nonbinary, and the encoder may
then the shift-register process effectively starts at time and even contain feedback. In every case, however, the code may
                                                                       0
ends at time K with x0 = XK = (0, 0,                 , 0).                     be taken to be generated by a shift-register process [2].
                                                                                   We might also note that other types of transmission codes
                                                                               (e.g., dc-free codes, run-length-limited codes, and others) can
    1 The notation is appropriate when observations are discrete-valued;
for continuous-valued Zk, simply substitute a density P ( z t I&) for the dis- be modeled as outputs of a finite-state machine and hence           fall
tribution P ( z t I&).                                                         into our general setup [49].
270                                                                                                            PROCEEDINGS OF THE IEEE, MARCH             1973

B . IntersymbolInterference
    In digital transmission through analog channels, we fre-
quently encounter following
                    the        situation. input
                                        The     se-
                                                                                 J i -
                                                                                  EJ
                                                                                   - E
                                                                                                                                           Yh   +
quence u, discrete-time and discrete-valued as in the shift-
register model, is used modulate some continuous waveform
                          to                                                    h0
which is transmitted through      a channel and then sampled.
                                                                                      Fig. 4.    Model of PAM system subject to intersymbol
Ideally, samples z k would equal the correspondingU k , or some                                 interference and white Gaussian noise.
simple function thereof; in fact, however, the samples ,zk a r e
perturbed both by noise and by neighboring inputs U p . T h e
lattereffectiscalledintersymbolinterference.Sometimes
intersymbolinterferenceisintroduceddeliberatelyforpur-
poses of spectralshaping,inso-calledpartial-responsesys-
tems.
    In such cases the output samples can often modeled as
                              zk    =   yk   +   nk
                                                      be
                                                                           L9-jw-g
                                                                           "k




                                                                          Fig. 5.
                                                                                                                         SIGNAL




                                                                                      Model for binarycontinuous-phase FSK withdeviationratio
                                                                                                                                                    +
                                                                                                                                                    "lk


                                                                                        3 and coherent detection in white Gaussian noise.
where y k is a deterministic function of a finite number of in-
puts, say, y k = f ( U k , * * . , Uk-y), a n d n k is a white Gaussian
noise sequence. This is precisely Fig. 2.                                 input sequenceu be binary and let ( 0 ) a n d w(1) be chosen so
                                                                                                                w
    To be still more specific, in pulse-amplitude modulation              t h a t w ( 0 ) goesthroughanintegernumber      of cyclesin T
(PAM) the signal sequence y may be taken as the convolu-                  seconds and w(1) through an odd half-integer number;       i.e.,
tion of the input sequenceu with some discrete-time channel               w(O)T=Oandw(l)T=7rmodulo27r. Thenif e o = O , O1=Oorr,
impulse-responsesequence (ho,hl, * * ) :                                  accordingtowhether         u equalszero or one,andsimilarly
                                                                                                      o
                                                                          & = O or r , according to whether an even or odd number      of
                         yk    =             hi24k-i.
                                                                          ones has been transmitted.
                                        i
                                                                                Here we have a two-state process, with X = (0, 7r 1. T h e
If h, = O for i>v (finite impulse response), then we obtain our           transmitted signal y k is a function of both the current input
shift-register model. An illustration of such a model in which            u k and the state x k :
intersymbol interference spans three time units
pears in Fig. 4.
                                                    ( v = 2) ap-
                                                                          yk    =    cos   [W(Uk)t   +   Xk]   =   cos   cos w(Uk)f,
    It was shown in [29] that even problems where time is                                                                      KT      t   < ( R + 1)T.
actually continuous-Le., the received signal r(t) has the form
                                                                          Since transitions & = ( x k + l , % are one-to-one functions t h e
                                                                                                             $                       of
                          K
                                             - KT) + n(1)
                                                                          current state x k and input U k , we may alternately regard y k as
                r(t) =             Ukh(1
                                                                          being determined by ( k .
                         k=O
                                                                                If we takeqo(t) &os w(0)t a n d vl(t) &os w(1)t as bases of
for some impulse responseh ( t ) ,signaling interval T , and reali-       t h e signal space, we may write
zation n(t) of a white Gaussian noise process-can be reduced
without loss of optimality to the aforementioned discrete-time                       y k = yOk'?O(l)       ylkvl(t)      +
form (via a "whitened matched filter").                             where the coordinates ( y o , ylk) are given by

C. Continuous-Phase FSK
    This example is cited not for its practical importance, but
because, first, it leads to a simple model we shall later use in
an example, and, second, it shows how the            VA may lead to                                   I
fresh insight even in the most traditional situations.                                                ( ( 0 ,- I ) ,     if U k = 1 , Xk =
    In FSK, a digital input sequence u selects one of m fre-
                                                                           Finally, if the received signal ( ( t ) is q(t) plus white Gaussian
quencies (if z& is m-ary) in each signaling interval length T ;
                                                          of
                                                                           noise v ( t ) , then by correlating the received signal against both
that is, the transmitted signal q(t) is
                                                                           q o ( t ) and ql(t) in each signal interval (coherent detection), we
                                                                           may arrive withoutloss of information a t a discrete-time out-
                                                                           put signal
where O ( U k ) is the frequency selected by      U k , and e k is some
phase angle. It is desirable for reasons both of spectral shap-                                                              +
                                                                                            z k = (ZW, Z l k ) = (yak, y l k )   (nokt flu)
ing and of modulator simplicity that the phase be continuous
                                                                           where no a n d n1 are independent equal-variance white Gaus-
at the transition interval; thatis, t h a t
                                                                           sian noise sequences. This model appears in Fig. 5, where the
       w(Uk-1)kT    +    ek-1 E w ( u k ) k T       +
                                              e k modulo 27r.              signal generator generates( y ~y a ) according to the aforemen-
                                                                           tioned rules.
                                                                                                                      ,

This is called continuous-phase FSK.
    The continuity of the phase introduces memory into the                 D . TextRecognition
modulation process; i.e., it makes the signal actually trans-                                                                       VA is
                                                                                  Ve include this example to show that the not limited
mitted in the kth interval dependent on previous signals. T o to             digital  communication. optical-character-recognition
                                                                                                        In
take the simplest possible case ("deviation ratio"          =a), let t h e (OCR) readers,        individual   characters scanned,
                                                                                                                        are         salient
FORNEY: THE VITERBI ALGORITHM


        MARKOV
       PROCESS    CHARACTERS     OCR     OCR OUTPU
     REPRESENTING
       ENGLISH
                      yI   *    DEVICE       zI


                 Fig. 6. Use of VA to improve character
                    recognition by exploiting context.

                                             as
features isolated, and some decision made to what letter or
other character lies below the reader. When the characters
actually occur as part of natural-language text, it has long
been recognized that contextual information can be used to
assist the reader in resolving ambiguities.
    One way of modeling contextual constraints is to treat a
natural language like English     as though it were a discrete-
time Markov process. For instance, we can suppose that the
probability of occurrence of eachletterdependsonthe              v
                                                                        01

                                                                        IO

                                                                        11
                                                                                                              2K:::w
previousletters,andestimatetheseprobabilitiesfromthe
                                                                                                         (b)
frequencies of (vf1)-letter  combinations[(v+l)-grams].
Whilesuchmodelsdonotfullydescribethegeneration                  of Fig. 7. (a) State diagram of a four-state shift-register process. (b) Trellis
                                                                                     f o r a four-state shift-register process.
natural language(for examples of digram and trigram English,
see Shannon [45], [46]), they account for a large part of t h e
statistical
          dependencies language are
                       in
                        the      and easily                        instant of time. The trellis begins and endsthe knownstates
                                                                                                                          at
handled.                                                           x. and XK. Its most important property is that every possi-  to
    With such a model, English letters are viewed as the out- ble.state sequence x there corresponds a unique path through
puts of a n m’-state Markov process, wherem is the number       of the trellis, and vice versa.
distinguishable characters, such    as 27 (the 26 letters and a         Now we show how, given              a sequence of observations I,
space). If it is further assumed that the OCR output         zk is every path may be assigned             a “length” proportional to -In
dependent only on the corresponding input character then P ( x , z), where x is the state sequence associated with that
                                                         yk,
the OCR reader isa memoryless channel to whose output se- path. This will allow us t o solve the problem of finding t h e
 quencewemayapplytheVA               to exploitcontextualcon-      state sequence for whichP(xl z) is maximum, or equivalently
straints; see Fig. 6. Here the ‘‘OCR output” may be anything for which P ( x , z) = P ( x i z ) P ( z ) is maximum, by finding the
                                      a
from the raw sensor data, possibly grid of zeros and ones, to path whose length --in P ( x , Z) is minimum, since In P ( x ,t) is
the actual decision which would be made by the reader in the a monotonic function of P ( x , Z) and there isa one-to-one cor-
 absence of contextual clues. Generally, the more raw the data, respondencebetweenpathsandsequences.Wesimply                               ob-
the more useful the   VA will be.                                  servethatduetotheMarkovandmemorylessproperties,
                                                                   P ( x , z) factors as follows:
E . Othw
    Recognizing certain similarities between magnetic record-                     P(x, 2)   =            1
                                                                                                P(x)P(z x )
ing media and intersymbol interference channels, Kobayashi
[SO] has proposed applying the VA       to digital magnetic re-
cording systems. Timor [SI] has applied the algorithm to a
sequentialrangingsystem.Use        of thealgorithminsource               Hence if weassigneachbranch(transition)the                  ‘‘length”
coding has been proposed    [52]. Finally, Preparata and Ray
 [53] have suggested using the algorithm to search ‘‘semantic
maps”insyntacticpatternrecognition.Theseexhaustthe
applications known to the author.                                        then the total length of the path corresponding to some x is

                           111. THEALGORITHM
    We now show that the MAP sequence estimation problem                                                      k=O
previously stated is formally identical to the problem find-
                                                          of
                                                                   as claimed.
ing the shortest route through a certain graph. The VA then
                                                                        Finding the shortest route througha graph is an old prob-
arises as a natural recursive solution.
                                                                   lem in operations research. The most succinct solution was
    Ye areaccustomedtoassociatingwith            a discrete-time
                                                                   givenbyMintyin             a quarter-pagecorrespondencein            1957
finite-state Markov process a state diagramof the type shown
                                                                   [ 5 5 ] , which we quote almost in its entirety:
in Fig. 7(a), for a four-state shift-register process like that of
                                                                        . . . as follows: Build problem .model of the travel network,
Fig. 3 or Fig. 4 (in this case, a de Bruijn diagram [54]). Here The               shortest-route            . . can be solved verysimply
                                                                                                 a string
~K~des  representstates,branchesrepresenttransitions,and               where knots representcities and string lengths represnt dis-
over the course of time the process traces some path from state        tances (or costs). Seize the knot ‘Los Angeles”in your left
to state the diagram.
      throughstate                                                     hand
                                                                          and
                                                                          knot
                                                                           the
                                                                             ’Boston”                  in your
                                                                                                           right
                                                                                                             and          pull them
                                                                                                                                 apart.
272                                                                                                   PROCEEDINGS OF THE IEEE, MARCH     1973

            Unfortunately, the Minty algorithm is not          well adapted
        to modern methods of machine computation, nor are assis-
        tants as pliable as formerly. I t therefore becomes necessary to
        move on to the VA, which is also well known in operations                                               1
                                                                                                                            1
        research [ 5 6 ] . I t requires one additional observation.                                                 1
            We denote by xok a segment (xo, xl, . , xk) consisting of
        the states to time k of the state sequence x = (xo, x 1 , * , x k ) .
        I n t h etrellis xok corresponds to a path segment starting t h e
                                                                      at
        node x0 and terminatingat Xk. For any particular time-k node
        %kt there will in general be several such path segments, each
        with some length
                                                 k-1
                                       UXOk> =
                                                  6 0
                                                        m>.
        The shortest such path segment is called the              surriuor corre-
        sponding to the node x k , and is denoted ? ( X k ) . For any time
        k>O, there are M survivors in all, one for each k . The obser-
                                                                   x
        vation is this: the shortest complete path i must begin with
        one of these survivors. (If it did not, but went through state
                                                                                           7'
        Z k at time k , thenwecouldreplaceitsinitialsegmentby
        f ( x k ) to get a shorter path-contradiction.)
               T h u s at any time k we need remember only the            M sur-
        vivors f(xk) and their lengths l'(xk) p X [ f ( x k ) ] . To get to time
        k f l , we need only extend all time-k survivors by one time
                                        of
        unit, compute the lengths the extended path segments, and
        for each node xk+l select the shortest extended path segment
        terminatingin xk+1 as thecorrespondingtime-(kfl)sur-
        vivor. Recursion proceeds indefinitely without the number of
        survivors ever exceedingM .



                                                                                                                        r3
               Many readers will recognizethisalgorithm              as a simple
                                                                    [%I.
                                                                                          
        version of forward dynamic programming [57],                      By this
        or anyothername,thealgorithmiselementaryoncethe
        problem has been cast in the shortest route form.                                k=5
               We illustrate the algorithm for a simple four-state trellis
        covering 5 timeunitsinFig.           8. Fig. 8(a)showsthecom-
        plete trellis, with each branch labeled with a length. (In a real
                                                                                                                         (b)
        application, the lengths would be functions              of the received
                                                                                        Fig. 8. (a) Trellis labeled with branch lengths; M = 4, K = 5.
        data.)Fig.8(b)showsthe             5 recursivestepsbywhichthe                   (b) Recursive determination of the shortest path via the VA.
        algorithm determines the shortest path from the initial to the
        final node. At each stage only t h e 4 (or fewer) survivors are
        shown, along with their lengths.                                          for each x k + l ; store r ( x r + l ) and the corresponding survivor
               A formal statement of the algorithm follows:                       i(Xk+l).
                                                                                       S e t k to k+1 and repeat untilk = K .
         Viterbi Algorithm                                                             With finite state sequences x the algorithm terminates at
                                                                                  time K with the shortest complete path stored the survivor
                                                                                                                                           as
          Storage:
                                                                                ?(xK).
(time
)         k                                             ;
          ?(xk),   1< x k < & f        (survivor terminating in   xk):              Certain  trivialmodifications necessary practice.
                                                                                                                 are       in
          r(xk),   1<xk <M             (survivor length).                       When state sequences are very long or infinite, it is necessary
                                                                                to truncate survivors to some manageable length 6. In other
           Initialization:
                                                                                words,thealgorithmmustcometo             a definitedecision o n
                         k = 0.;                                                nodes up to time k-6 at time k. Note that in Fig. 8(b) all
                                                                                time-4 survivors go through the same nodes up t o time 2. In
                   X(xo) = x o ;           i(m) arbitrary,        m # xo;
                                                                                general, if thetruncationdepth       6 ischosenlargeenough,
                   r(Xo) 0 ;
                       =                   r(m)    =        Q),   m # xo.       there is a high probability that all time-k survivors will go
                                                                                through the same nodes up to time k - 6 , so that the initial
           Recursion: Compute
                                                                                segment of the maximum-likelihood path is known up to time
                                                                                k-6 and can be put out as the algorithm's firm decision; in
                                                                                this case, truncation costs nothing.    In the rare cases when
        for all = ( x k + l ,   xk).                                            survivors disagree, any reasonable strategy for determining
            Find                                                                thealgorithm's   time-(k-6)decision       will work [20], [21]:
                                                                                choose an arbitrary time-(k-6) node, or the node associated
                                                                                with the shortest survivor, or node chosen by majority vote,
                                                                                                                  a
FORKEY: THE VITERBI ALGORITHM                                                                                                                            273

etc. If 6 is large enough, the effect on performance is negligi-
ble.
     Also, if k becomes large, it is necessary to renormalize the
lengths r ( m ) from time to time by subtracting                  a constant
from all of them.
                                                                                              Fig. 9. Typical cell of a shift-register process trellis.
      Finally, the algorithm may be required get started with-
                                                       to
out knowledge of the initial state xg. In this case it may be
initializedwithanyreasonableassignment                      of initialnode
lengths, such as r(m)=0, all m, or else r(m)= -In r,,,if t h e


                                                                                                                                           4
                                                                                                               II
states are known to have a priori probabilities r,,,.                  Usually,
after an initial transient, there is a high probability that all
survivors will merge with the correct path. Thus the algorithm
synchronizes itself without any special procedures.
      The complexity of the algorithm is easily estimated. First,
memory: the algorithm requires M storage locations, one for
each state, where each location must be capable of storing a
                                                                                           /   ,,                i(Uii,-.
                                                                                                                   II         1, &r  ,,             i(Cii,-.
                                                                                                                                                      ll




                                                                                           9+ 9
                                                                                                SELECT          SELECT            SELECT           SELECT
                                                                                                1-OF- 2         I-OF-2            I-OF-2           1-OF-2
“length” r ( m ) and a truncatedsurvivorlisting


                                                                                              -
                                                                    % ( m ) of 6
symbols.  Second,   computation: each
                                  in unit
algorithm must make IEI additions, one for each transition,
                                                                of timethe                  !                                               {;
                                                                                                                                            *
and M comparisons among the ] E l results. Thus the amount                                        - y -
                                                                                                   - -        - l
                                                                                                 TO
                                                                                                  ANOTHER LOGIC UNIT               TOANOTHER LOGIC UNIT
of storage is proportional to the number                of states, and the
amount of computation to the numberof transitions. ‘ith a                               Fig, 10. Basic VA logic unit for binary shift-register process.
shift-register process, M = my and 1 El = mV+l, that the com-
                                                           so
plexity increases exponentially with the length v of the shift Fourier transform (FFT). In fact, it is identical, except                                     for
register.                                                                           length, and indeed the FFT is also ordinarily organized cell-
      In the previous paragraph, have ignored the complexity wise. Li‘hile the add-and-compare computations of the
                                      we                                                                                                                     VA
involvedingeneratingtheincrementallengths                         X(&). In a are unlike those involved in the FFT, some of the memory-
shift-register process, it is typically true that              P ( x k + l I xk) is organization tricks developed for the FFT may be expected
either l/m or 0, depending on whether xk+l is an allowable suc- to be equally useful here.
 cessor t o xk or not; then all allowable transitions have the                            Because of its highly parallel structure and need for only
same value of -In P(xk+lI xk) and this component of X(&)                            add, compare, and select operations, the VA is well suited to
 may be ignored. Note that in more general cases                P(xk+l I xk) is high-speed applications. A convolutional decoder for a v = 6
 known in advance: hence this component can be precomputed                          code (Ji= 64, / E l = 128) t h a t is built out of 356 transistor-
 a n d ‘wired in.” The component -In P ( z k / 4;c) is the only com- transistor logiccircuitsandthatcanoperate                                    at u pt o 2
 ponentthatdepends         on the data; again, it            is typical that        Mbits/s perhaps represents the current state of the art [22].
 many f k lead to the same output              y k , andhencethevalue               Even a software decoder for a similar code can be run at a
 -In P ( z k 1 yk) need be computed or looked up for all these .$x. rate order           the              of 1000 bits/s   on      a minicomputer.        Such
 only once, givenz k . (Then the noise is Gaussian, -In P ( z k I y k ) moderate complexity qualifies the VA for inclusion in many
 is proportional simply to ( z k - y k ) * . ) Finally, all of this can be signal-processing systems.
 done outside the central recursion (“pipelined”). Hence the                              For further details of implementation, see [22], [23], a n d
 complexity of this computation tends not to be significant.                        the references therein.
      Furthermore, note that once the X&) are computed, the
 observation Zk need not be stored further, an attractive fea-                                          IV. ASALYSIS PERFORMAXE
                                                                                                                         OF
 ture in real-time applications.                                                         Just as important as thestraightforwardness of imple-
      A closer lookat t h e trellis of a shift-register process reveals mentation of t h e VA is the straightforwardness with which its
 additional detail that can be exploited in implementation. Forperformance can be analyzed. In many cases, tight upper and
 a binary shift register, the transitions in any unit time can lower bounds for error probability can be derived. Even when
                                                                  of
 be segregated into 2y-1 disjoint groups of four, each originat-                    t h e VA is not actually implemented, calculation               of its per-
 ing in a common pair of states and terminating in another                          formanceshowshowfartheperformance                       of lesscomplex
 common pair. A typical such cell is illustrated in Fig. 9, with schemes is from ideal, and often suggests simple suboptimum
 the time-k states labeled and x‘l and the time-(k+l) statesschemes that attain nearly optimal performance.
                              x’0
 labelled Ox‘ a n d lx’, where x’ stands fora sequence of v - 1 bits                      Thekeyconceptinperformanceanalysisisthat                         of a n
 that is constant within any one cell. For example, each time                       errorevent.Let         x be the actual state sequence, and             4 the
 unit in the trellis of Fig. 8 is made up of two such cells. Ye state sequence actually chosen by the                              VA. Over a long time x
 note that only quantities within the same interact in any and i will typically diverge and remerge a number of times,
                                                        cell
 onerecursion.Fig.       10 shows a basiclogic m i t that imple-                    as illustrated in Fig. 11. Each distinct separation is called a n
 mentsthecomputationwithinanyone                        cell. A high-speed errorevent.Erroreventsmayingeneralbe                                of unbounded
 parallel implementation of the algorithm can be built around length if x is infinite, but the probability of a n infinite error
 2-l such identical logic units;           a low-speed implementation,              event will usually be zero.
 aroundonesuchunit,time-shared;                    or a softwareversion,                  The importance of erroreventsisthattheyareprob-
 around the corresponding subroutine.                                               abilistically independent of one another; in the language of
      Many readers will have noticed that the trellis Fig. 7(b) probability theory they are
                                                                  of                                                   recurrent. Furthermore, they allow
 reminds them of the computational flow diagram of the fast us to calculate error probability per unit time, which neces-                               is
274                                                                                                     PROCEEDINGS OF THE IEEE,    u 1973
                                                                                                                                    r
                                                                                                                                    n

    k.0                                                         LrK

      - u w -
        W
  Fig. 11. Typical correct path X (heavy line) and estimated path     2
         (lighter line) in the trellis, showing three error events.


sary since usually the probability any error in MAP estima-
                                          of                                  Fig. 12. Typical correct path X (heavy line) and time-k incorrect
tion of a block of length K goes t o 1 as K goes to infinity.                                       subset for trellis of Fig. 7 b .
                                                                                                                                ()
Instead we calculate the probabilityof an error event starting
at some given time, given that the starting state is correct,
i.e., that an error event is not already in progress at that time.
     Given the correct path X , the set &k of all possible error                    ...                                                    . ..
events startingat some time k is a treelike trellis which starts
at x k and each of whose branches ends on the correct path, as
illustratedinFig.      12, for the trellis of Fig.7(b).Incoding                             Fig. 13. Trellis for continuous-phase FSK.
theory this is called the incorrect subset (at time ) .    R
     The probability of any particular error event is easily cal-
culated; it is simply the probability that the observations          will
be such that over the time span during which              4 is different
from X , f is more likely thanX . If the error event has length       T,
this is simplya two-hypothesis decision problem between two
sequences of length T, and typically has a standard solution.
                                                                                   Fig. 14. Typical incorrect subset. Heavy line: correct p t .
                                                                                                                                              ah
     The probability P ( & k ) t h a t any error event in occurs can
                                                          &k
                                                                                                   Lighter lines: incorrect subset.
then be upper-bounded, usually tightly, by             a union bound,
i.e., by the sum of the probabilities of all error events in &k.
While this sum may well be infinite, it is typically dominated a typical incorrect subset in Fig.                         14. The shortest possible
by one or a few large leading terms representing particularly             error event is of length 2 and consists of a decision t h a t t h e
likely error events, whose sum then forms a good approxima- signalwas{cos                         w(l)t, -cos o(1)t) rather than {cos w(O)t,
tion to P(&k).True upper bounds can often be obtained by                  cos w ( 0 ) t 1, or in our coordinate notation that(0, (0, - 1) ]
                                                                                                                                      { l),
flow-graph techniques [4], [SI, [%I.                                      is chosen over { (1, 0),(1, 0) 1. This is a two-hypothesis de-
     On the other hand,      a lowerbound to error-event prob-            cisionproblemin            a four-dimensionalsignalspace            [59]. I n
ability,againfrequentlytight,canbeobtainedbyagenie                        Gaussian noise of variance u* per dimension, only the Euclid-
argument. Take the particular error event that has the great- ean distance d between the two signals matters; in this case-
est probability of all those in &k. Suppose thata friendly genie d = 4 3 , and therefore the probability of this particular error
tells you that the true state sequence is one of two possibili- eventis Q(d/%)=Q(l/ad/Z), where Q(x) istheGaussian
ties: the actual correct path, the incorrect path correspond- error probability function defined by
                                    or
ing to that error event. Even with this side information, you
will still make an error      if the incorrect path ismorelikely
given z, so your probability of error is still no better than the
probability of this particular error event. In the absencethe     of
genie, your error probability must be worse still, since one of By examination of Fig. 14 we see that the error events of
the strategies you have, given the           genie’s information, is to   lengths 3, 4, . . lie at        distances 4 6 , dG, . from              the
ignore it. In summary, the probability of any particular error correct path; hence we arrive at upper and lower bounds on
event isa lower bound toP ( & k ) .                                       P ( & k ) of
     An important side observation is that this lower bound
                                                     VA.
applies to any decision scheme, not just the Simple exten-
                                                                                                                   +
                                                                          Q ( 4 2 / 2 u ) 2 P(&k) 5 Q ( d 2 / 2 ~ ) Q(d6/eu)
sions give lower bounds        to related quantities like bit prob-                                                 +                  +
                                                                                                                                Q(4i6/2u)       . ..*
ability of error. If the VA givesperformanceapproaching
these bounds, then it may be claimed that it is effectively opti-         I n view of the rapid decreaseof Q ( x ) with x , this implies that
mum with respect to these related quantities as well [3O].                P ( & k ) is accurately estimated as Q(d/2/2u) for any reasonable
     In conclusion, the probability of any error event starting noise variance a*. I t is easily verified from Fig.                       13 that this
at time k may be upper- and lower-bounded as follows:                     result is independentof the correct pathx or the timeK.
                                                                                This result is interesting in that the           best one can do with
max P(error event) 5 P(&k) 5 max P(error event)                           coherent detection in a single symbol interval is Q(1/2a) for
                                                     +    other terms. orthogonal signals. Thus exploiting the memory doubles the
                                                                          effective signal energy, or improves the signal-to-noise ratio
With luck these bounds will be close.                                     by 3 dB. It may therefore be claimed that continuous-phase
                                                                          FSK is inherently 3 dB better than noncontinuous for devia-
Example                                                                   tion ratio 3, or as good as antipodal phase-shift keying. (While
     For concreteness, and because the resultis instructive, we we have proved this only a deviation ratio of ), it holds for
                                                                                                            for
carry through the calculation for continuous-phase FSK                of nearly any deviation ratio.) Even though the is quite sim-    VA
theparticularlysimpletype              definedearlier.Thetwo-state        ple for a two-state trellis, the fact that only one type error       of
trellis for this process is shown in Fig. and the first part of event has any significant probability permits still simpler sub-
                                               13,
FORNEY: THE VITERBI ALGORITIIM                                                                                                                                 215

optimum schemes to achieve effectively the same performance               formance superior to all other coding schemes save sequential
[431,[441.’                                                               decoding,anddoesthis            at highspeeds,withmodestcom-
                                                                          plexity,andwithconsiderablerobustnessagainstvarying
                            V. APPLICATIONS                               channelparameters. A number of prototypesystemshave
    We conclude with a review of the results, both theoretical been implemented and tested [21]-[26], some quite original,
and practical, obtained with the VA to date.                              and it seems likely that Viterbi decoders become common
                                                                                                                             will
                                                                          in space communication systems.
A , ConvolutionalCodes                                                        Finally, Viterbi decoders have been             used as elements in
     It was for convolutional codes that the algorithm was first          very-high-performance      concatenated    coding schemes           [lo]-
developed, and naturally it has had its greatest impact here. [12], [27] andindecoding                           of convolutionalcodesinthe
    The principal theoretical result is contained in Viterbi’s            presence of intersymbol interference [35], [6O].
original paper [ l ] ;see also [SI-[7]. I t shows that for a suita-
bly defined ensemble of random trellis codes and MAP decod- B . IntersymbolInterference
ing, the error probability can be made to decrease exponen-                   Application of the VA to intersymbol interference prob-
tially with the constraint length a t all code rates R less t h a n lems i s more recent, and the main achievements have been
                                        v
channel capacity. Furthermore, the rate             of decrease is con-   theoretical. The principal result, for PAM in white Gaussian
siderably faster than for block codes with comparable decod- noise, is t h a t P ( & k ) can be tightly bounded as                follows:
ing complexity (although the same              as that for block codes
with the same decoding delay). In our view, the most telling                          K ~ Q ( d m i n / 2I & k ) I
                                                                                                          ~ )P (         KrQ(drnirJ2~)
comparison between block and convolutional codes is that an where K L and Kv are small constants, Q(x) is the Gaussian
effectively optimum block code of a n y specified length and              error probability function defined earlier, u? is t h e noise vari-
rate can be created by suitably terminating a convolutional ance, and dmin is the minimum Euclidean distance between
(trellis)code, but with a lowerratefortheblockcode                     of any two distinct signals[29]. This result implies that on most
course [7]. In fact, if convolutional codes were any better,
                                                                          channels intersymbol interference need not lead to any                sig-
they could be terminated to yield better block codes than are nificant degradation in performance, which comes as rather a
theoretically possible-an observation which shows that the
                                                                          surprise.
bounds on convolutional code performance must be tight.
                                                                               Forexample,withthemostcommonpartialresponse
     For fixed binaryconvolutionalcodes              of nonasymptotic systems, the VA recovers the 3-dB loss sustained by conven-
length on symmetric        memoryless    channels, principal
                                                   the
                                                                          tional detectors relative to full-response systems            [29],[32].
result [6] is that P ( & k ) is approximately given by
                                                                          Simplesuboptimumprocessors                [29],[33] candonearly as
                            P(&k)      ;r,2-dD                           well.
                                                                               Several workers [34],[35],[42]            have proposed adaptive
where d is the free distance, Le., the minimum Hamming dis- versions of the                 algorithm unknown time-varying
                                                                                                         for           and
tance of any path in the incorrect subset &h from the correct channels.             Ungerboeck           [41],
                                                                                                             [42]         and
                                                                                                                            hlackechnie         [35]
path: N d is the number of such paths; and D is the Bhat-                 have shown that only matched filter rather thana whitened
                                                                                                     a
tacharyya distance                                                        matched filter is needed in PAM.
               D = logs           P ( z O)l’*P(z 1)l I 2   1                       I
                                                                               I t seems most likely that the greatest effect of the VA on
                                                                          digital modulation systems         will be t o reveal those instances in
                                Z
                                                                          which  conventional detection  techniques significantly
                                                                                                                     fall
                                         z in
where the sum is over all outputs the channel output spaceshort of optimum, to                   and suggest     effective suboptimum
2.
                                                                           methods of closing the gap. PAM channels that cannot be
     On Gaussian channels                                                 linearly equalized without excessive noise enhancement due
                                                                          t o nulls or near nulls in the transmission band are the likeliest
                                                                          candidates for nonlinear techniques of this kind.
where & / N O is the signal-to-noise ratio per information bit.
The tightness of this bound is confirmed by simulations [8], C. Text Recognition
 [9], [22]. For a v = 6 , d = 10, R=+ code, for example, error                 An experiment was run in which digram statistics were
probabilities o l e * , lC5,a n d le7 achieved at Eb/No used t o correct garbled text produced by a simulated noisy
                f                               are
 =3.0, 4.3, a n d 5.5 dB,respectively,which is within a few characterrecognizer [47]. Resultsweresimilartothose                                   of
tenths of a dceibel of this bound [22].                                    Raviv [48] (using digram statistics), although the algorithm
     Channels available space
                         for      communications fre-are                  was simpler and only had hard decisions rather than                 confi-
quently accurately modeled as white Gaussian channels. The dence levels t o work with. It appears that the algorithm may
VA attractive such
    is         for     channels   because gives
                                            it     per-                   be a useful adjunct t o sophisticated character-recognition sys-
                                                                          tems resolving
                                                                                for        ambiguities    when   confidence  levels
                                                                                                                                  for
   * De Buda [44] actually proves thatanoptimum decision onthe            different characters are available.
phase x t at time k can be made by examining the received waveform only
at times k- 1 and k ; Le., ( z o . ~ - I z ~ t - l ) (ZU, w ) . The proof is that the log
                                         ,           ,                                                                     VI. CONCLUSIOK
likelihood ratio
                      P ( Z 0 . k - 1 , Z1.k-I,   ZOk, Zlk        I   Xk-1,   Xk       0,               The VA has already hadsignificant impact on our under-
                                                                                                                                 a
                -In                                                                                 standing of certain problems, notably in the theories convo-
                                                                                                                                                           of
                      P ( Z 0 . t - 1 , Z1.t-1,   Z O t , &t-1,               Xk   =   *, Y t + d   lutional codes and of intersymbol interference. I t is beginning
is proportional to - z ~ , t - l + z ~ . ~ - 1 - z u - -for any values of the pair of
                                                         ~~
states ( ~ l t - 1 ~
                   xk+1). For this phase decision (which differs slightly from our
                                                                                                    to have a substantial practical impact as well in the engineer-
sequence decision) the error probability is exactly Q(&/2n).                                        ing of space-communication links. The amount work it has
                                                                                                                                                     of
276                                                                                                                      PROCEEDINGS OF THE IEEE, MARCH                                 1973

inspired in the intersymbol interference area suggests that                      Now we note the recursive formula
here too practical applications are not far off. The generality
of the model t o which it applies and the straightforwardness                           p(%,zOL-')         =           p ( z k ,k - 1 ,
                                                                                                                             x                   ZO"')
                                                                                                               Zk-1
with which it can be analyzed and implemented lead one to
believe that in both theory and practice it    will find increasing                                        =           p ( z k - 1 ,O " ' ) p ( z k ,- 1
                                                                                                                                 Z                4                      1   zk-1)
                                                                                                               zk-1
application in the years ahead.
                                                                                                                 M
                                                                                 which allows us to calculate the quantities P(%, ~0"') from
                                APPENDIX                                         the M quantities P ( x ~ 1zo"*) with IZI multiplications and
                                                                                                            ,
                        RELATED ALGORITHMS                                       additions using the exponentiated lengths
      In this Appendix we mention some processing structures
that are closely related t o t h eVA, chiefly sequential decoding
                                                                                                        = p ( z k %k-l)P(zk-l  I             &-I).           I
and minimum-bit-error-probability algorithms. We also men- Similarly we have the backward recursion
tion extensions of the algorithm to generate reliability infor-
mation, erasures, and lists.                                                     p(Zk"       xk) = 1          p(zkK,x k + l Z k )                I
                                                                                                        ZW1
      When the trellis becomes large, it is natural to abandon
the exhaustive search of the VA in favorof a sequential trial-                                     =          p ( z kx k + l
                                                                                                                     ,          xk)p(zk+l"   I      Zk+d             I
                                                                                                        Zkil
and-error search that selectively examines only those paths
likely t o be the shortest. In the coding literature, such            al- which has a similar complexity. Completion of these forward
gorithms are collectively known as sequential decoding [Is]- and backward recursions for all nodes allowsP ( x k , Z) and/or
 [l6]. The simplest to explain is the 'stack" algorithm             [IS], P(&,z) t o be calculated for all nodes.
 [16], which a list is maintainedof the shortest partial paths
        in                                                                    Now, to be specific, let us consider a shift-register process
found to date, the path on the top the list is extended, and and let S(uk) be the setof all states zel whose first component
                                               of
its successors reordered in the list until some path is found             is %k. Then
t h a t reachesthe &eLminalnode, or elsedecreaseswithout
                                                                                              p ( u k , 2) =                   p ( x k + l , 2)-
limit. (That some pathwill eventually do so is ensured in cod-
                                                                                                                 Zk+lES(Ut)
ing applications by the subtraction of a bias term such that
the length of the correct path tends to decrease while thatof Since P(uk, z) =P(ukI z ) P ( z ) , MAP estimation of u k reduces
all incorrect paths tends to increase.) Searches that start from o findingthemaximum
                                                                          t                                       of thisquantity.Similarly,               if we
either end of a finite trellis [I71 are alsouseful.                       wish t o find the MAP estimate of an output y k , say, then let
      In coding applications, sequential decoding has many             of S(p) be the setof all & that leadto yk and compute
the same properties as Viterbi decoding, including the same
error probability. I t allows the decoding of longer and there-
                                                                                                P(Yk, 4 =                      p ( f k , 2).
                                                                                                                    €t= k t )
fore more powerful codes, at the cost of a variable amount of
computationnecessitatingbufferstoragefortheincoming                       A similar procedure can               be used to estimate any quantity
d a t a z. It is probably less useful outside of coding, since i t        which isa deterministic functionof states or transitions.
depends on the decoder's ability to              recognize when the best      Besidesrequiringmultiplications,thisalgorithmisless
pathhasbeenfoundwithoutexaminingotherpaths,and                            attractive than the VA in requiring a backward as well as a
therefore requires eithera finite trellisor a very large distance forward recursion and consequently storage                                   of all data. The
between the correct path and            possible error events.            following amended algorithm [39], [48] eliminates the latter
      Intheintersymbolinterferenceliterature,many                 of the ugly feature at the cost of suboptimal performance and addi-
                         find                                                                                us
early attempts to optimum nonlinear algorithms used bit- tional computation. Let restrict ourselves to shift-register                             a
errorprobability as theoptimalitycriterion.TheMarkov                      process with input sequence u and agree to use only observa-
property of the process leads to algorithms that are manage- tions up to time                   k+6 in estimating U k , say, where 6 2v- 1. W e
able but less attractive than the Viterbi [36]-[42].                      then have
      The general principle of several of these algorithms is as                  P ( u k , 20")       =            ' * *          p(Ukk*,       ZOw").
follows.8 First, we calculate the joint probability P ( x k , z) for                                      uti1               Uti6
every state x k in the trellis, or alternately P(&,z) for every
transition &. This is done by observing that                              Thequantities                in thesumcan be determinedrecursively
                                                                          by
                                             I
             p ( z k , 2) = p ( x k , ZO"')p(ZkK     x k , 20"')
                                                                                                           =
                       =   p(xk,             I
                                   ZOk1)p(zkK xk)
                                                                                     p ( U k k M , ZOk*)
                                                                                                                UL-1
                                                                                                                         p(Uk-lk",               ZOkM)


since, given Xk, the outputs zkK from time k t o K are indepen-                                            =            p(uk-lkul,                   ZOku'       1
d e n t of the outputs zoL-' from time 0 t o K - 1 . Similarly                                                  ut-1

                                                                                                                 p(uk+d,           &+d       I   W k Y l , '     *           uk+6-r).

                                                                                 While the recursion is now forward only, we now must store
                                                                                 m' quantities rather than m'. If I is large, this is most un-
                                                                                   *
                                                                                 attractive; if 6 is close t o v-1, the estimate may well be de-
                                                                                 cidedly suboptimum.
                                                                                       These variations thus seem considerably less attractive
   'The author is indebted to Bahl et al. 1181 foraparticularly         lucid
                                                                                 t h a n t h e VA. Nonetheless, something like this may need to
exposition of this type of algorithm.                                            be hybridized with the VA in certain situations suchtrack-as
FORNEY: THE VITERBI ALGORITHM                                                                                                                                   277

ing a finite-state source overa finite-state channel, where only       space communication,” I E E E Trans. Commun. Technol., vol. COM-
                                                                       19, pp. 835-847, Oct. 1971.
the state sequence the source is interest.
                     of             of                            [23] G. C. Clark, Jr., and R. C. Davis, “Two recent applications error-
                                                                                                                                    of
      Finally, we can consider augmented outputs from the     VA.      correction coding to communications system design,” I E E E Trans.
A good general indication of how well the algorithm is doing [24] Commun. Technol., vol. COM-19, pp. 856-863,TDMA satellite com-
                                                                       I. M. Jacobs and R. J. Sims, “Configuring a
                                                                                                                        Oct. 1971.
is the depth which all paths are merged; this can be to
               at                                          used        munication system with coding, in Proc. 5th Hawaii Int. Conf. Sys-
establish whether or not a communications channel is on the            tems
                                                                          Science.      Honolulu, Hawaii:WesternPeriodicals,       1972, pp.
                                                                       443-446.
air, in synchronism, etc. l l ] ,[28]. A more selective indicator 1251 A. R. Cohen, J. A. Heller, andA. J. Viterbi, “A new coding technique
                           [
of howreliableparticularsegmentsare          is the difference in      forasynchronousmultiple       access communication,” I E E E Trans.
lengths between the best and the next-best paths the point
                                                      at               Commun. Technol., vol. COM-19, pp. 849-855, Oct. 1971.
                                                                  (261 D. Quagliato,“Errorcorrecting       codes applied to satellitechan-
of merging; this reliability indicator can be quantized into an        nels,= in 1972 I E E E Int. Conf. Communications (Philadelphia, Pa.),
erasure output. Lastly, the algorithm can be altered to store          pp. 15/13-18.
t h e L best paths, rather than the single best path, as the sur- [27] Linkabit NAS2-6722,coding Ames Res. Cen., Rep. Field, Calif.,
                                                                       Contract
                                                                                Corp.,“Hybrid
                                                                                               NASA
                                                                                                     systemstudy,”
                                                                                                                 Final
                                                                                                                       Moffett
                                                                                                                                          on
vivors in each recursion, thus eventually generating a list of         NASA Rep. CR114486, Sept. 1972.
t h e L most likely path sequences.                               [281 G. C. Clark, Jr., and R. C. Davis, “Reliability-of-decodingindicators
                                                                                         for maximum likelihood decoders,” in Proc. 5th Hawaii I n f . Conf.
                                                                                         SystemsScience. Honolulu, Hawaii: Western Periodicals, 1927, pp.
                               REFERENCES                                                447-450.

Conuolutwnal Codes                                                                  Intersymbol Interference
                                                                                    (291 G. D.Forney,Jr.,“Maximum-likelihoodsequenceestimation                    of
 [l] A. J. Viterbi, “Error bounds for convolutional codes and an asymp-                  digital sequences in the  presence of intersynlbol interference,” E E E
                                                                                                                                                              I
        toticallyoptimumdecoding  algorithm,”             I E E E Trans.
                                                                       Inform.           Trans. Inform. Theory, vol. IT-18, pp. 363-378, May 1972.
        Theory, vol. IT-13, pp. 260-269, Apr. 1967.                                 [30] -,     “Lower bounds on error probability in        the presence of large
  121 G. D.Forney,Jr.,“Convolutional            codes I: Algebraic structure,”           intersymbol  interference,” I E E E Trans.Commun. Technol. (Cor-
        I E E E Trans. Inform. Theory, vol. IT-16, pp. 720-738, Nov. 1970.               resp.), vol. COM-IO, pp. 76-77, Feb. 1972.
  [3] -  ,      “Review of randomtreecodes,”           NASA Ames Res. Cen.,         [31] J. K. Omura, optimum
                                                                                                       ”On                    receivers forchannels inter-
                                                                                                                                                    with
        Moffett Field, Calif., Contract NAS2-3637, NASA CR 73176, Final                  symbol interference,” abstract presented at the IEEE Int. Symp.
        Rep., Dec. 1967, appendix A.                                                     Information Theory, Noordwijk, The Netherlands, June 1970.
  [4] G. C. Clark, R. C. Davis,J. C. Herndon, and D. D. McRae, “Interim             [32] H. Koba;ashi, ”Correlative level codingand maximum-likelihood
        report on convolution coding research,” Advanced System Opera-                   decoding, I E E E Trans.Inform.Theory,          vol. IT-17, PP. 586-594,
        tion, Radiation Inc., Melbourne, Fla., Memo Rep.38, Sept. 1969.                  Sept. 1971.
  (5) A. J. Viterbi and J. P. Odenwalder, “Further results on optimal de-           (331 M. J. Ferguson, ”Optimal receptionfor binary partial response chan-
        coding of convolutional codes,” I E E E Trans. Inform. Theory (Cor-              nels,” Bell Syst. Tech. J . , vol. 51, pp. 493-505, Feb. 1972.
        resp.), vol. IT-15, pp. 732-734, Nov. 1969.                                 (341 F. R. Magee, Jr., and J. G. Proakis, “Adaptive maximum-likelihood
  [a] A. J. Viterbi, “Convolutional codes and their performance in com-                  sequence estimationfor digital signaling in thepresence of intersym-
        municationsystems,” I E E E Trans. Commun. Technol., COM-19,vol.                 bo1 interference,” I E E E Trans.     Inform.Theory       (Corresp.), vol.
        pp. 751-772, Oct. 1971.                                                          IT-19, pp. 120-124, Jan. 1973.
  [7] G. D.Forney,Jr.,“Convolutional            codes 11: Maximum likelihood        (351 L. K. Mackechnie, ”Maximum likelihood receivers for channels hav-
        decoding,’! Stanford Electronics Labs., Stanford, Calif., Tech. Rep.             ing memory,’ Ph.D. dissertation, Dep. Elec. Eng., Univ. of Notre
        7004-1, June 1972.                                                               Dame, Notre Dame, Ind., Jan. 1973.
  181 J. A. Heller, “Short constraint length convolutional codes,” in      Space                             J.
                                                                                    [36] R. W. Chang and C. Hancock, “Onreceiver structures for channels
        Program Summary 37-54, vol. 111. Jet Propulsion Lab., Calif. Inst.               having memory,’ I E E E Trans. Inform. Theory, vol. IT-12, pp. 463-
        Technol., pp. 171-177, 0ct.-Nov. 1968.                                           468, Oct. 1966.
  (91 -  ,      “Improved performance of short constraint length convolu-           [37] K.Abend,T.       J. Harley,Jr., B. D.Fritchman,and C. Gumacos,
        tional codes,” in Space Program Summary37-56,vol. 111. Jet Propul-                “On optimum receivers for channels having memory,”I E E E Trans.
        sion Lab., Calif. Inst. Technol., pp. 83-84, Feb.-Mar. 1969.                     Inform. Theory (Corresp.), vol. IT-14, pp. 819-820, Nov. 1968.
[lo] J. P. Odenwalder, “Optimal decoding of convolutional codes,” Ph.D.             [38] R. R. ?wen, “Bayesian decision procedures for interfering digital
        dissertation,Dep.Syst. Sci., Sch.Eng. Appl. Sci., Univ. of Cali-                 signals, I E E E Trans.  Inform.   Theory     (Corresp.), vol. IT-15, pp.
        fornia, Los Angeles, 1970.                                                       506-507, July 1969.
[Ill J. L. Ramsey,“Cascadedtreecodes,”                M I T Res. Lab.Electron.,     [39] K. Abend and B. D. Fritchman, “Statistical detection for communi-
        Cambridge, Mass., Tech. Rep. 478, Sept. 1970.                                    cationchannelswithintersymbolinterference,”             Proc. I E E E , vol.
[12] G. W. Zeoli, “Coupled decodingof block-convolutional concatenated                   58, pp. 779-785, May 1970.
        codes,”Ph.D.dissertation,Dep.         Elec. Eng.,Univ. of California,       (401 C. G. Hilborn, Jr., “Applications of unsupervised learning to prob-
        Los Angeles, 1971.                                                               lems of digital communication,” in Proc. 9th I E E E Symp. Adaptive
[I31 J. M. Wozencraft,“Sequential decoding for reliable communica-                        Processes, Decision, and Control (Dec. 7-9, 1970).
        tion,” in 1957 I R E Nat. Conv. Rec., vol. 5, pt. 2, pp. 11-25.                   C. G. Hilborn, Jr., andD.G.Lainiotis,“Optimalunsupervised
[I41 R. M. Fano,“A heuristic discussion of probabilistic decoding,”I E E E               learning multicategory dependent hypotheses pattern recognition,”
        Trans. Inform. Theory,vol. IT-9, pp. 64-74, Apr. 1963.                           I E E E Trans. Inform. Theory, vol. IT-14, pp. 468-470, May 1968.
1151 K. S. Zigangirov, “Some sequentialdecodingprocedures,”                Probl.   [41] G. Ungerboeck, “Nonlinear equalization of binary signals in Gaus-
        Pered. Inform., vol. 2, pp. 13-25, 1966.                                         sian noise,”I E E E Trans. Commun. Technol.,      vol. COM-19, pp. 1128-
[161 F. Jelinek, “Fast sequential decoding algorithmusing a stack,” I B M                 1137, Dec. 1971.
        J . Res. Develop., vol. 13, pp. 675-685, Nov. 1969.                         [42] --,    ”Adaptive maximum-likelihood receiver for carrier-modulated
[171 L. R. Bahl, C. D. Cullum, W. D. Frazer, and F. Jelinek, ”An effi-                   data transmission systems,” in preparation.
        cient algorithm for computing free distance,” IEEE Trans. Inform.
        Theory (Corresp.), vol. IT-18, pp. 437-439, May 1972.                       Continuous-Phase FSK
[ l a ] L. R. Bahl, J. Cocke,F. Jelinek, and J. Raviv, “Optimal decodingof          [43] M. G. Pelchat, R. C. Davis, and M . B. Luntz, “Coherent demodula-
        linear codes for minimizing symbol error rate’ (Abstract), in 1972               tion of continuous-phase binary FSK signals,” inProc. Inl. Telemetry
        In#. Symp. InformationTheory (Pacific Grove, Calif., Jan.1972),                  Conf. (Washington, D. C., 1971).
        p. 90.                                                                      [44] R. de Buda, “Coherent demodulation of frequency-shift keying with
1191 P. L. McAdam, L. R. Welch, and C. L. Weber, “M.A.P. bit decoding                    low deviation ratio,”I E E E Trans. Commun. Technol.,vol. COM-20,
        of convolutional  codes,” in     1972 I n f . Symp. Informafion Theory           pp. 429-435, June 1972.
        (Pacific Grove, Calif., Jan. 1972), p. 91.
                                                                                    Text Recognition
                                                                                    [45] C. E. Shannon and W. Weaver, The MUthemdica! Theory of      Com-
Space Applications                                                                       munication. Urbana, Ill.: Univ. of Ill.Press, 1949.
[20] Linkabit Corp., “Coding systems study for high data rate telemetry             [46] C. E. Shannon, “Prediction and entropy of printed English,” Bell
     links, ’Final Rep. on N4SA Ames Res. Cen., Moffett Field, Calif.,                   Syst. Tech. J . , vol. 30, pp. 50-64, Jan. 1951.
     Contract NASZ-6024, Rep. CR-114278, 1970.                                      (471 D. L. Neuhoff, “The Viterbi algorithm as aid in textrecognition,’
                                                                                                                                      an
[21] G. C. Clark, “Implementation of maximum likelihood decoders for                     Stanford Electronic Labs., Stanford, Calif., unpublished.
     convolutional codes,” in Proc. Int. Telcmetering Conf. (Washington,            (481 J. Raviv, “Decision making in Markov chains applied to problem
                                                                                                                                                   the
     D. C., 1971).                                                                       of pattern recognition,” I E E E Trans.Inform.Theory, vol. IT-13,
[22] J. A. Heller and I. M. Jacobs, “Viterbi decoding for satellite and                  pp. 536-551, Oct. 1967.
218                                                                                                        PROCEEDINGS OF THE IEEE, MARCH        1973

Miscellaneous                                                                [54] S. W.    Golomb,       Shift Regisln Sequcrrccs. San Franasm, Calif.:
                                                                                     Holden-Day, 1967, pp..13-17.
[49] H.Kobayashi,“Asurvey        of codingschemesfortransmissionor           [ 5 5 ] G. J. Minty, ‘ comment on the shortest-route problem,” O p a .
                                                                                                       A
     recording o digital data,” IEEETrans.Commun.Technol.,
                 f                                                    vol.           Res., vol. 5 , p. 724, Oct. 1957.
     COM-19, pp. 1087-1100, Dec. 1971.                                       [56] M. Pollack andW. Wiebenson,“Solutions of the shortest-route
[SO] -,    “Application of probabilistic decoding t o digital magnetic re-           problem-A review,” O p a . Res., vol. 8, pp. 224-230, Mar. 1960.
     cordingsystems,” I B M J . Res. Develop., vol. 15, pp. 64-74, Jan.
     1971.                                                                   [57] R. Busacker and T. Saaty, Finite Graphs and Networks: A n Z n t r o -
                                                                                     duction with Applications. New York: McGraw-Hill, 1965.
[51] U. Tirnor, ‘Sequential ranging with the Viterbi algorithm,” Jet  Pro-
     pulsion Lab., Pasadena, Calif., JPL Tech. Rep. 32-1526, vol. 11, pp.    [58] J. K. Omura,         “On the Viterbi decoding algorithm,” ZEEE Trans.
     75-79, Jan. 1971.                                                               Inform. T h m y , vol. IT-15, pp. 177-179, Jan. 1969.
[52] J.K.Omura,      “On the Viterbi algorithm forsourcecoding”(Ab-          [59] J. M. Wozencraft and I. M. Jacobs, Principres of Cmnunicaliorr
     stract), in1972 ZEEE Znt. Symp. Information Theory (Pacific Grove,              Engineering. New York: Wiley, 1965, ch. 4.
     Calif., Jan. 1972), p. 21.                                              [a] K. Omura, ‘Optimal receiver design for convolutional codes and
                                                                                     J.
[53] F. P. Preparata and S. R. Ray, “An approach to artificial nonsym-               channelswithmemory via control theoreticalconcepts,” Inform.
     bolic cognition,”Inform. Sci.,vol. 4, pp. 65-86, Jan. 1972.                     Sci., vol. 3, pp. 24S-266, July 1971.

Mais conteúdo relacionado

Mais procurados

On elements of deterministic chaos and cross links in non- linear dynamical s...
On elements of deterministic chaos and cross links in non- linear dynamical s...On elements of deterministic chaos and cross links in non- linear dynamical s...
On elements of deterministic chaos and cross links in non- linear dynamical s...iosrjce
 
inflacion
inflacioninflacion
inflacionunover
 
Further discriminatory signature of inflation
Further discriminatory signature of inflationFurther discriminatory signature of inflation
Further discriminatory signature of inflationLaila A
 
Feedback of zonal flows on Rossby-wave turbulence driven by small scale inst...
Feedback of zonal flows on  Rossby-wave turbulence driven by small scale inst...Feedback of zonal flows on  Rossby-wave turbulence driven by small scale inst...
Feedback of zonal flows on Rossby-wave turbulence driven by small scale inst...Colm Connaughton
 
Chris Clarkson - Testing the Copernican Principle
Chris Clarkson - Testing the Copernican PrincipleChris Clarkson - Testing the Copernican Principle
Chris Clarkson - Testing the Copernican PrincipleCosmoAIMS Bassett
 
Topological String Amplitudes, by W. A. Zúñiga-Galindo (CINVESTAV)
Topological String Amplitudes, by W. A. Zúñiga-Galindo (CINVESTAV)Topological String Amplitudes, by W. A. Zúñiga-Galindo (CINVESTAV)
Topological String Amplitudes, by W. A. Zúñiga-Galindo (CINVESTAV)W. A. Zúñiga-Galindo
 
Introduction to (weak) wave turbulence
Introduction to (weak) wave turbulenceIntroduction to (weak) wave turbulence
Introduction to (weak) wave turbulenceColm Connaughton
 
Weak Isotropic three-wave turbulence, Fondation des Treilles, July 16 2010
Weak Isotropic three-wave turbulence, Fondation des Treilles, July 16 2010Weak Isotropic three-wave turbulence, Fondation des Treilles, July 16 2010
Weak Isotropic three-wave turbulence, Fondation des Treilles, July 16 2010Colm Connaughton
 
NITheP WITS node seminar: Prof Jacob Sonnenschein (Tel Aviv University) TITLE...
NITheP WITS node seminar: Prof Jacob Sonnenschein (Tel Aviv University) TITLE...NITheP WITS node seminar: Prof Jacob Sonnenschein (Tel Aviv University) TITLE...
NITheP WITS node seminar: Prof Jacob Sonnenschein (Tel Aviv University) TITLE...Rene Kotze
 
Large scale coherent structures and turbulence in quasi-2D hydrodynamic models
Large scale coherent structures and turbulence in quasi-2D hydrodynamic modelsLarge scale coherent structures and turbulence in quasi-2D hydrodynamic models
Large scale coherent structures and turbulence in quasi-2D hydrodynamic modelsColm Connaughton
 
Connect-the-Dots in a Graph and Buffon's Needle on a Chessboard: Two Problems...
Connect-the-Dots in a Graph and Buffon's Needle on a Chessboard: Two Problems...Connect-the-Dots in a Graph and Buffon's Needle on a Chessboard: Two Problems...
Connect-the-Dots in a Graph and Buffon's Needle on a Chessboard: Two Problems...Vladimir Kulyukin
 
Random 3-manifolds
Random 3-manifoldsRandom 3-manifolds
Random 3-manifoldsIgor Rivin
 
Nonequilibrium statistical mechanics of cluster-cluster aggregation, School o...
Nonequilibrium statistical mechanics of cluster-cluster aggregation, School o...Nonequilibrium statistical mechanics of cluster-cluster aggregation, School o...
Nonequilibrium statistical mechanics of cluster-cluster aggregation, School o...Colm Connaughton
 
Resolving the black-hole information paradox
Resolving the black-hole information paradoxResolving the black-hole information paradox
Resolving the black-hole information paradoxFausto Intilla
 

Mais procurados (20)

Universe from nothing
Universe from nothingUniverse from nothing
Universe from nothing
 
Função de mão única
Função de mão únicaFunção de mão única
Função de mão única
 
On elements of deterministic chaos and cross links in non- linear dynamical s...
On elements of deterministic chaos and cross links in non- linear dynamical s...On elements of deterministic chaos and cross links in non- linear dynamical s...
On elements of deterministic chaos and cross links in non- linear dynamical s...
 
inflacion
inflacioninflacion
inflacion
 
Further discriminatory signature of inflation
Further discriminatory signature of inflationFurther discriminatory signature of inflation
Further discriminatory signature of inflation
 
Feedback of zonal flows on Rossby-wave turbulence driven by small scale inst...
Feedback of zonal flows on  Rossby-wave turbulence driven by small scale inst...Feedback of zonal flows on  Rossby-wave turbulence driven by small scale inst...
Feedback of zonal flows on Rossby-wave turbulence driven by small scale inst...
 
Chris Clarkson - Testing the Copernican Principle
Chris Clarkson - Testing the Copernican PrincipleChris Clarkson - Testing the Copernican Principle
Chris Clarkson - Testing the Copernican Principle
 
Topological String Amplitudes, by W. A. Zúñiga-Galindo (CINVESTAV)
Topological String Amplitudes, by W. A. Zúñiga-Galindo (CINVESTAV)Topological String Amplitudes, by W. A. Zúñiga-Galindo (CINVESTAV)
Topological String Amplitudes, by W. A. Zúñiga-Galindo (CINVESTAV)
 
Introduction to (weak) wave turbulence
Introduction to (weak) wave turbulenceIntroduction to (weak) wave turbulence
Introduction to (weak) wave turbulence
 
Weak Isotropic three-wave turbulence, Fondation des Treilles, July 16 2010
Weak Isotropic three-wave turbulence, Fondation des Treilles, July 16 2010Weak Isotropic three-wave turbulence, Fondation des Treilles, July 16 2010
Weak Isotropic three-wave turbulence, Fondation des Treilles, July 16 2010
 
NITheP WITS node seminar: Prof Jacob Sonnenschein (Tel Aviv University) TITLE...
NITheP WITS node seminar: Prof Jacob Sonnenschein (Tel Aviv University) TITLE...NITheP WITS node seminar: Prof Jacob Sonnenschein (Tel Aviv University) TITLE...
NITheP WITS node seminar: Prof Jacob Sonnenschein (Tel Aviv University) TITLE...
 
Large scale coherent structures and turbulence in quasi-2D hydrodynamic models
Large scale coherent structures and turbulence in quasi-2D hydrodynamic modelsLarge scale coherent structures and turbulence in quasi-2D hydrodynamic models
Large scale coherent structures and turbulence in quasi-2D hydrodynamic models
 
10.1.1.131.4735
10.1.1.131.473510.1.1.131.4735
10.1.1.131.4735
 
Connect-the-Dots in a Graph and Buffon's Needle on a Chessboard: Two Problems...
Connect-the-Dots in a Graph and Buffon's Needle on a Chessboard: Two Problems...Connect-the-Dots in a Graph and Buffon's Needle on a Chessboard: Two Problems...
Connect-the-Dots in a Graph and Buffon's Needle on a Chessboard: Two Problems...
 
1416336962.pdf
1416336962.pdf1416336962.pdf
1416336962.pdf
 
SoCG14
SoCG14SoCG14
SoCG14
 
Random 3-manifolds
Random 3-manifoldsRandom 3-manifolds
Random 3-manifolds
 
Nonequilibrium statistical mechanics of cluster-cluster aggregation, School o...
Nonequilibrium statistical mechanics of cluster-cluster aggregation, School o...Nonequilibrium statistical mechanics of cluster-cluster aggregation, School o...
Nonequilibrium statistical mechanics of cluster-cluster aggregation, School o...
 
Sm421 rg
Sm421 rgSm421 rg
Sm421 rg
 
Resolving the black-hole information paradox
Resolving the black-hole information paradoxResolving the black-hole information paradox
Resolving the black-hole information paradox
 

Destaque

Introduction to bcr associates
Introduction to bcr associatesIntroduction to bcr associates
Introduction to bcr associatesSarah Revell
 
Presentation1
Presentation1Presentation1
Presentation1jess2903
 
Roby14 dovezi pentru hagen.pptx
Roby14 dovezi pentru hagen.pptxRoby14 dovezi pentru hagen.pptx
Roby14 dovezi pentru hagen.pptxSzekeli Marian
 
Calendario avvento
Calendario avventoCalendario avvento
Calendario avventolquario
 
Analyst briefing revised 11 4 11 (new demo link)
Analyst briefing revised 11 4 11 (new demo link)Analyst briefing revised 11 4 11 (new demo link)
Analyst briefing revised 11 4 11 (new demo link)allang1985
 
Discover A New Arthritis Treatment
Discover A New Arthritis TreatmentDiscover A New Arthritis Treatment
Discover A New Arthritis TreatmentDamian Kulczynski
 
Steve jobs finally 1234
Steve jobs finally  1234Steve jobs finally  1234
Steve jobs finally 1234Roygf
 
Rashomon essay
Rashomon essayRashomon essay
Rashomon essaysarahmcano
 
Biometrics presentation
Biometrics presentationBiometrics presentation
Biometrics presentationSn Moddho
 
Дачная стрелка 3
Дачная стрелка 3Дачная стрелка 3
Дачная стрелка 3Al Maks
 
Presentasi sd on campus hokudai after
Presentasi sd on campus hokudai   afterPresentasi sd on campus hokudai   after
Presentasi sd on campus hokudai afterNinda Rahmawati
 
Creating a Global Competitive Intelligence Community with non-CI personel
Creating a Global Competitive Intelligence Community with non-CI personelCreating a Global Competitive Intelligence Community with non-CI personel
Creating a Global Competitive Intelligence Community with non-CI personelAlexandra Nelles
 

Destaque (20)

Anastasia
AnastasiaAnastasia
Anastasia
 
Instrukcija ridgid-seektech-sr-20
Instrukcija ridgid-seektech-sr-20Instrukcija ridgid-seektech-sr-20
Instrukcija ridgid-seektech-sr-20
 
New york
New yorkNew york
New york
 
Introduction to bcr associates
Introduction to bcr associatesIntroduction to bcr associates
Introduction to bcr associates
 
Presentation1
Presentation1Presentation1
Presentation1
 
Roby14 dovezi pentru hagen.pptx
Roby14 dovezi pentru hagen.pptxRoby14 dovezi pentru hagen.pptx
Roby14 dovezi pentru hagen.pptx
 
Calendario avvento
Calendario avventoCalendario avvento
Calendario avvento
 
Analyst briefing revised 11 4 11 (new demo link)
Analyst briefing revised 11 4 11 (new demo link)Analyst briefing revised 11 4 11 (new demo link)
Analyst briefing revised 11 4 11 (new demo link)
 
E! Network
E! Network E! Network
E! Network
 
Discover A New Arthritis Treatment
Discover A New Arthritis TreatmentDiscover A New Arthritis Treatment
Discover A New Arthritis Treatment
 
Steve jobs finally 1234
Steve jobs finally  1234Steve jobs finally  1234
Steve jobs finally 1234
 
Presentation media
Presentation media Presentation media
Presentation media
 
Rashomon essay
Rashomon essayRashomon essay
Rashomon essay
 
Biometrics presentation
Biometrics presentationBiometrics presentation
Biometrics presentation
 
Jadwal UAS 2014.1
Jadwal UAS 2014.1Jadwal UAS 2014.1
Jadwal UAS 2014.1
 
Дачная стрелка 3
Дачная стрелка 3Дачная стрелка 3
Дачная стрелка 3
 
Ptk
PtkPtk
Ptk
 
Presentasi sd on campus hokudai after
Presentasi sd on campus hokudai   afterPresentasi sd on campus hokudai   after
Presentasi sd on campus hokudai after
 
Creating a Global Competitive Intelligence Community with non-CI personel
Creating a Global Competitive Intelligence Community with non-CI personelCreating a Global Competitive Intelligence Community with non-CI personel
Creating a Global Competitive Intelligence Community with non-CI personel
 
19. i cvtpf
19. i cvtpf19. i cvtpf
19. i cvtpf
 

Semelhante a Viterbi2

Reflection On Quantum Leadership
Reflection On Quantum LeadershipReflection On Quantum Leadership
Reflection On Quantum LeadershipRebecca Buono
 
The korteweg dvries equation a survey of results
The korteweg dvries equation a survey of resultsThe korteweg dvries equation a survey of results
The korteweg dvries equation a survey of resultsHaziqah Hussin
 
Superstring Theories
Superstring TheoriesSuperstring Theories
Superstring TheoriesLeslie Lee
 
The Theory Of The Quantum Computation Model
The Theory Of The Quantum Computation ModelThe Theory Of The Quantum Computation Model
The Theory Of The Quantum Computation ModelAshley Davis
 
Introduction to cosmology and numerical cosmology (with the Cactus code) (1/2)
Introduction to cosmology and numerical cosmology (with the Cactus code) (1/2)Introduction to cosmology and numerical cosmology (with the Cactus code) (1/2)
Introduction to cosmology and numerical cosmology (with the Cactus code) (1/2)SEENET-MTP
 
Commutative algebra
Commutative algebraCommutative algebra
Commutative algebraSpringer
 
Parellelism in spectral methods
Parellelism in spectral methodsParellelism in spectral methods
Parellelism in spectral methodsRamona Corman
 
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...IRJET Journal
 
Be2419772016
Be2419772016Be2419772016
Be2419772016IJMER
 
The Founding Principles Of The Fourier Series
The Founding Principles Of The Fourier SeriesThe Founding Principles Of The Fourier Series
The Founding Principles Of The Fourier SeriesNicolle Dammann
 
MMath Paper, Canlin Zhang
MMath Paper, Canlin ZhangMMath Paper, Canlin Zhang
MMath Paper, Canlin Zhangcanlin zhang
 
A taxonomy of suffix array construction algorithms
A taxonomy of suffix array construction algorithmsA taxonomy of suffix array construction algorithms
A taxonomy of suffix array construction algorithmsunyil96
 
G. Djordjevic/ Lj. Nesic: Notes on Real and p-Adic Inflation
G. Djordjevic/ Lj. Nesic: Notes on Real and p-Adic InflationG. Djordjevic/ Lj. Nesic: Notes on Real and p-Adic Inflation
G. Djordjevic/ Lj. Nesic: Notes on Real and p-Adic InflationSEENET-MTP
 
D. Vulcanov: Symbolic Computation Methods in Cosmology and General Relativity...
D. Vulcanov: Symbolic Computation Methods in Cosmology and General Relativity...D. Vulcanov: Symbolic Computation Methods in Cosmology and General Relativity...
D. Vulcanov: Symbolic Computation Methods in Cosmology and General Relativity...SEENET-MTP
 
The Lotka Volterra Predator Prey Model
The Lotka Volterra Predator Prey ModelThe Lotka Volterra Predator Prey Model
The Lotka Volterra Predator Prey ModelSugar Murillo
 

Semelhante a Viterbi2 (20)

Reflection On Quantum Leadership
Reflection On Quantum LeadershipReflection On Quantum Leadership
Reflection On Quantum Leadership
 
The korteweg dvries equation a survey of results
The korteweg dvries equation a survey of resultsThe korteweg dvries equation a survey of results
The korteweg dvries equation a survey of results
 
PART X.1 - Superstring Theory
PART X.1 - Superstring TheoryPART X.1 - Superstring Theory
PART X.1 - Superstring Theory
 
Superstring Theories
Superstring TheoriesSuperstring Theories
Superstring Theories
 
3. AJMS _461_23.pdf
3. AJMS _461_23.pdf3. AJMS _461_23.pdf
3. AJMS _461_23.pdf
 
The Theory Of The Quantum Computation Model
The Theory Of The Quantum Computation ModelThe Theory Of The Quantum Computation Model
The Theory Of The Quantum Computation Model
 
Introduction to cosmology and numerical cosmology (with the Cactus code) (1/2)
Introduction to cosmology and numerical cosmology (with the Cactus code) (1/2)Introduction to cosmology and numerical cosmology (with the Cactus code) (1/2)
Introduction to cosmology and numerical cosmology (with the Cactus code) (1/2)
 
Commutative algebra
Commutative algebraCommutative algebra
Commutative algebra
 
Q.M.pptx
Q.M.pptxQ.M.pptx
Q.M.pptx
 
Wavelet Signal Processing
Wavelet Signal ProcessingWavelet Signal Processing
Wavelet Signal Processing
 
Parellelism in spectral methods
Parellelism in spectral methodsParellelism in spectral methods
Parellelism in spectral methods
 
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
 
Be2419772016
Be2419772016Be2419772016
Be2419772016
 
The Founding Principles Of The Fourier Series
The Founding Principles Of The Fourier SeriesThe Founding Principles Of The Fourier Series
The Founding Principles Of The Fourier Series
 
MMath Paper, Canlin Zhang
MMath Paper, Canlin ZhangMMath Paper, Canlin Zhang
MMath Paper, Canlin Zhang
 
A taxonomy of suffix array construction algorithms
A taxonomy of suffix array construction algorithmsA taxonomy of suffix array construction algorithms
A taxonomy of suffix array construction algorithms
 
G. Djordjevic/ Lj. Nesic: Notes on Real and p-Adic Inflation
G. Djordjevic/ Lj. Nesic: Notes on Real and p-Adic InflationG. Djordjevic/ Lj. Nesic: Notes on Real and p-Adic Inflation
G. Djordjevic/ Lj. Nesic: Notes on Real and p-Adic Inflation
 
Ijetcas14 318
Ijetcas14 318Ijetcas14 318
Ijetcas14 318
 
D. Vulcanov: Symbolic Computation Methods in Cosmology and General Relativity...
D. Vulcanov: Symbolic Computation Methods in Cosmology and General Relativity...D. Vulcanov: Symbolic Computation Methods in Cosmology and General Relativity...
D. Vulcanov: Symbolic Computation Methods in Cosmology and General Relativity...
 
The Lotka Volterra Predator Prey Model
The Lotka Volterra Predator Prey ModelThe Lotka Volterra Predator Prey Model
The Lotka Volterra Predator Prey Model
 

Último

The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdfJamie (Taka) Wang
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?SANGHEE SHIN
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 

Último (20)

The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 

Viterbi2

  • 1. 268 PROCEEDINGS OF THE IEEE, VOL. 61, NO. 3, MARCH 1973 [2] C. Mfiller, The TkoryofRelatioity. London, England: Oxford Univ. accelerated system of reference,” Phys. Reu., vol. 134,pp.A799- Press, 1952, pp. 274-276, 302-309. A804, 1964. [3] E. J. Post, Formal Structure of Elecfromagnetics. Amsterdam, The [9] T. C. Mo, “Theoryof electrodynamics in media in noninertial frames Netherlands: North-Holland, 1962, chs. 3, 6, 7. and applications,” J. Math. Phys., vol. 11, pp. 2589-2610, 1970. [4] A. Sommerfeld, Elektrodynamik. Wiesbaden, Germany: Die- [lo] R. M. Fano, L. J. Chu, and R. B. Adler, Electromagnetk Fields, En- terich’sche Verlag, 1948, pp. 281-289. ergy and Forces. New York: Wiley, 1960.pp. 407-416. [SI T.Schlomka, “Das Ohmsche Gesetz beibewegtenKorpern,” A n n . [ll] L.Robin, Fonctions Sphiriques de Legendre et Fonctions Sphtroidulcs, Phys., vol. 6, no. 8, pp. 246-252, 1951. vol. 3.Paris, France: Gauthier-Villars,1959,pp. 94-10. [6] H. Epheser and T. Schlomka, “Flachengrossen und elektrodyn- [12] W. R. Symthe, Static and Dynamic Elec#ricity. New York: Mc- amische Grenzbedingungen bei bewegten Korpern,” Ann. Phys., Graw-Hill, 1950, p. 208. vol. 6, no. 8, pp. 211-220, 1951. [13] M. Faraday, Experimental Researches in Electricity, vol. 3. New [7] J. L. Anderson and J. W. .Ryon,“Electromagneticradiation in York:Dover, 1965, pp. 362-368. accelerated systems,” Phys. Rev., vol. 181, pp. 1765-1775, 1969. [14] E . G.Cullwick, Electromagnetism and Relativity. London, England: [8] C. V. Heer, “Resonant frequencies of an electromagnetic cavity in an Longmans, pp. 36-141, 167. The Viterbi Algorithm G. DAVID FORNEY, JR. Invited Paper Abstrucf-The Viterbi algorithm (VA) is a recursive optimal solu- 11. STATEMENT OF THE PROBLEM tion to the problem of estimating the state sequence of a discrete- time finite-stateMarkovprocess observed in memoryless noise. VA In its most general form, the may be viewed as a solu- Many problems i areas such as digital communicationscan be cast n tion to theproblem of maximum a posteriori probability in this form. T i paper gives a tutorial exposition of the algorithm hs (MAP) estimation of the state sequence of a finite-state dis- and of how it is implemented and analyzed. Applications to date are crete-time Markov process observed in memoryless noise. I n reviewed. Increasing use of the algorithm in a widening variety of areas i foreseen. s this section we set up the problem in this generality, and then illustrate by example the different sorts problems that can of I. INTRODUCTION be made to such a model. The general approach fit also has the HE VITERBI algorithm (VA) was proposed in 1967 virtue of tutorial simplicity. TheunderlyingMarkovprocessischaracterized as fol- PI as a method of decoding convolutional codes. Since that time, it has been recognized an attractive solu- lows. Time is discrete. The statex k a t time k is one of a finite as tion to a variety of digital estimation problems, somewhat as number M of states m , 1 s m 5 14;i.e., the state space X is the Kalman filter has been adapteda to variety of analog esti- simply { 1, 2, . . , A I ] , Initiallyweshallassumethatthe mation problems. Like the Kalman filter, the VA tracks the process runs only from time 0 t o t i m e K and that the initial s t a t e of a stochastic process with a recursive method that is a n d final states x0 a n d XK are known; the state sequence is optimum in a certain sense, and that lends itself readily to then represented by a finite vector X = (20, + , x ~ ) We see 1 . implementation and analysis. However, the underlying pro- later that extension to infinite sequences is trivial. cess is assumed to be finite-state Markov rather than Gaus- T h e process is Markov, in the sense that the probability sian, which leads to marked differences in structure. P ( X k + l I x 9 x1, . * . , xk) of being in state 0 X k + l at time k f 1, This paper is intended to be principally a tutorial intro- given all states up to time k , depends only on t h e s t a t e xk at duction to theVA, its structure, and its analysis. It also pur-time k : ports to review more or less exhaustively all work inspired by P ( X k + l j xo, x1, . . . , Xk) = P(Xk+lI Xk). o r related to the algorithm up to the time writing (summer of 1972). Our belief is that the algorithmwill find application in The transition probabilities P ( x k + l ~ k ) may be time varying, x a n increasing diversity of areas. Our hope is t h a t we can ac- but we do not explicitly indicate this in the notation. celerate this process for the readers this paper. of I t is convenient to define the transition & at time k as t h e pair of states (xk+l, x k ) : This invited paper is one of a series planned on topics of general interest- tk (%+I, xk). The Editor. Manuscript received September 20, 1972;revised November 27, 1972. The author is with Codex Corporation, Newton, Mass. 02195. We let E bethe(possiblytime-varying)set of transitions
  • 2. FORNEY: THE VITERBI ALGORITFIM 269 “ h a SHIFT REGISTER DETERMINISTIC FUNCTION t.- Fig. 1. Most general model. MEWRYLESS CHANNEL & = (%&I, xk) for which P(xe11 x k ) ZO,and I X I their number. Clearly I X 1 5 M2. There is evidently a one-to-one correspon- dencebetweenstatesequences X andtransitionsequences Fig. 2. Shift-register model. f = (50, * * , &-I). (We write x= t.) Theprocess is assumedtobeobservedinmemoryless noise; t h a t is, there is a sequence z of observations z k in which I ’Ik , Zk depends probabilistically only on the transition E at time k:l k-0 In the parlance of information theory, z can be described as the output of some memoryless channel whose input sequence is(seeFig. 1). Again,though weshallnotindicateit ex- plicitly, the channel may be time varying in the sense that Fig. 3. A convolutionalencoder. P ( z k I b ) may be a function of k . This formulation subsumes t h e following as special cases. Finally, we state the problem to which the VA is a solu- 1) The case in which z k depends only on the state x k : tion. Given a sequence z of observations of a discrete-time finite-stateMarkov process memoryless in noise, find the state sequence x for which the a posteriori probability P(xl I) is maximum. Alternately, find the transition sequence F, for 2) The case in which 4 depends probabilistically on an which P(F,j z) is maximum (since x 2 E ) . In the shift-register output y k of the process at time k , where y k is in turn a de- model this is also the same as finding the most probable input terministic function of the transition b or the state ?&. sequence u, sincet&. X ; or also the most probable signal se- Example: T h e following model frequently arises in digital yL1 quence y, if X . I t is well known that this MAP rule mini- communications. There is an input sequence = (UO, u ul, * . ), mizes the error probability in detecting the whole sequence where each u k is generated independently according to some (the block-, message-, or word-error probability), and thus is probability distribution P ( i ( k ) and can take on one of a finite optimum in this sense. We shall see that in many applications number of values, say m. There is a noise-free signal sequence it is effectively optimum in any sense. y, not observable, in which eachy k is some deterministic func- Application Examples: We now give examples showingt h a t tion of the present and thev previous inputs: the problem statement above applies to a number of diverse - y k = f ( U k , * ’ * 7 Uk-v) fields,includingconvolutionalcoding,intersymbolinterfer- ence,continuous-phase frequency-shift keying(FSK),and The observed sequence z is the output of a memoryless chan- text recognition. The adequately motivated reader may skip nel whose input is y. We call such a process a shift-register immediately to the next section. process, since (as illustrated in Fig. 2) it can be modeled by a shift register of length Y with inputs u k . (Alternately, it is a A . Convolutional Codes vth-orderm-aryMarkovprocess.) T o completethecorre- A rate-l/n binary convolutional encoder is a shift-register spondence to our general model we define: circuit exactly like that of Fig. 2, where the inputs u k are in- 1) the state formationbitsandtheoutputs y k areblocks of # bits, y k x k & (uk-1, ‘ ‘ 3 uk-v) - =(Pa, * * , p,,~),each of which is a parity check on (modulo- 2 sum of) some subset of the v + l information bits (Uk, 2) the transition Uk-1, * . , Uk-v). When the encoded sequence (codeword) y is b & ( u k , * * ’ Uk--.). sentthroughamemorylesschannel, we have precisely the model of Fig. 2. Fig. 3 shows a particular rate-$ code with The number of states is thus 1 X =m’, and of transitions, v = 2 . (This codeis the only one ever used for illustration in the I IEl =m’+l. If theinputsequence“starts” at time 0 and VA coding literature, but the reader must not infer that it is ‘stops” at time K - v , i.e., the only one the VA can handle.) u = ( . . . 0, u0, u 1 , ’ ’ , ~ K - P 0, 0, ‘ * ’ ) , More general convolutional encoders exist: the rate may be k / n , the inputs may be nonbinary, and the encoder may then the shift-register process effectively starts at time and even contain feedback. In every case, however, the code may 0 ends at time K with x0 = XK = (0, 0, , 0). be taken to be generated by a shift-register process [2]. We might also note that other types of transmission codes (e.g., dc-free codes, run-length-limited codes, and others) can 1 The notation is appropriate when observations are discrete-valued; for continuous-valued Zk, simply substitute a density P ( z t I&) for the dis- be modeled as outputs of a finite-state machine and hence fall tribution P ( z t I&). into our general setup [49].
  • 3. 270 PROCEEDINGS OF THE IEEE, MARCH 1973 B . IntersymbolInterference In digital transmission through analog channels, we fre- quently encounter following the situation. input The se- J i - EJ - E Yh + quence u, discrete-time and discrete-valued as in the shift- register model, is used modulate some continuous waveform to h0 which is transmitted through a channel and then sampled. Fig. 4. Model of PAM system subject to intersymbol Ideally, samples z k would equal the correspondingU k , or some interference and white Gaussian noise. simple function thereof; in fact, however, the samples ,zk a r e perturbed both by noise and by neighboring inputs U p . T h e lattereffectiscalledintersymbolinterference.Sometimes intersymbolinterferenceisintroduceddeliberatelyforpur- poses of spectralshaping,inso-calledpartial-responsesys- tems. In such cases the output samples can often modeled as zk = yk + nk be L9-jw-g "k Fig. 5. SIGNAL Model for binarycontinuous-phase FSK withdeviationratio + "lk 3 and coherent detection in white Gaussian noise. where y k is a deterministic function of a finite number of in- puts, say, y k = f ( U k , * * . , Uk-y), a n d n k is a white Gaussian noise sequence. This is precisely Fig. 2. input sequenceu be binary and let ( 0 ) a n d w(1) be chosen so w To be still more specific, in pulse-amplitude modulation t h a t w ( 0 ) goesthroughanintegernumber of cyclesin T (PAM) the signal sequence y may be taken as the convolu- seconds and w(1) through an odd half-integer number; i.e., tion of the input sequenceu with some discrete-time channel w(O)T=Oandw(l)T=7rmodulo27r. Thenif e o = O , O1=Oorr, impulse-responsesequence (ho,hl, * * ) : accordingtowhether u equalszero or one,andsimilarly o & = O or r , according to whether an even or odd number of yk = hi24k-i. ones has been transmitted. i Here we have a two-state process, with X = (0, 7r 1. T h e If h, = O for i>v (finite impulse response), then we obtain our transmitted signal y k is a function of both the current input shift-register model. An illustration of such a model in which u k and the state x k : intersymbol interference spans three time units pears in Fig. 4. ( v = 2) ap- yk = cos [W(Uk)t + Xk] = cos cos w(Uk)f, It was shown in [29] that even problems where time is KT t < ( R + 1)T. actually continuous-Le., the received signal r(t) has the form Since transitions & = ( x k + l , % are one-to-one functions t h e $ of K - KT) + n(1) current state x k and input U k , we may alternately regard y k as r(t) = Ukh(1 being determined by ( k . k=O If we takeqo(t) &os w(0)t a n d vl(t) &os w(1)t as bases of for some impulse responseh ( t ) ,signaling interval T , and reali- t h e signal space, we may write zation n(t) of a white Gaussian noise process-can be reduced without loss of optimality to the aforementioned discrete-time y k = yOk'?O(l) ylkvl(t) + form (via a "whitened matched filter"). where the coordinates ( y o , ylk) are given by C. Continuous-Phase FSK This example is cited not for its practical importance, but because, first, it leads to a simple model we shall later use in an example, and, second, it shows how the VA may lead to I fresh insight even in the most traditional situations. ( ( 0 ,- I ) , if U k = 1 , Xk = In FSK, a digital input sequence u selects one of m fre- Finally, if the received signal ( ( t ) is q(t) plus white Gaussian quencies (if z& is m-ary) in each signaling interval length T ; of noise v ( t ) , then by correlating the received signal against both that is, the transmitted signal q(t) is q o ( t ) and ql(t) in each signal interval (coherent detection), we may arrive withoutloss of information a t a discrete-time out- put signal where O ( U k ) is the frequency selected by U k , and e k is some phase angle. It is desirable for reasons both of spectral shap- + z k = (ZW, Z l k ) = (yak, y l k ) (nokt flu) ing and of modulator simplicity that the phase be continuous where no a n d n1 are independent equal-variance white Gaus- at the transition interval; thatis, t h a t sian noise sequences. This model appears in Fig. 5, where the w(Uk-1)kT + ek-1 E w ( u k ) k T + e k modulo 27r. signal generator generates( y ~y a ) according to the aforemen- tioned rules. , This is called continuous-phase FSK. The continuity of the phase introduces memory into the D . TextRecognition modulation process; i.e., it makes the signal actually trans- VA is Ve include this example to show that the not limited mitted in the kth interval dependent on previous signals. T o to digital communication. optical-character-recognition In take the simplest possible case ("deviation ratio" =a), let t h e (OCR) readers, individual characters scanned, are salient
  • 4. FORNEY: THE VITERBI ALGORITHM MARKOV PROCESS CHARACTERS OCR OCR OUTPU REPRESENTING ENGLISH yI * DEVICE zI Fig. 6. Use of VA to improve character recognition by exploiting context. as features isolated, and some decision made to what letter or other character lies below the reader. When the characters actually occur as part of natural-language text, it has long been recognized that contextual information can be used to assist the reader in resolving ambiguities. One way of modeling contextual constraints is to treat a natural language like English as though it were a discrete- time Markov process. For instance, we can suppose that the probability of occurrence of eachletterdependsonthe v 01 IO 11 2K:::w previousletters,andestimatetheseprobabilitiesfromthe (b) frequencies of (vf1)-letter combinations[(v+l)-grams]. Whilesuchmodelsdonotfullydescribethegeneration of Fig. 7. (a) State diagram of a four-state shift-register process. (b) Trellis f o r a four-state shift-register process. natural language(for examples of digram and trigram English, see Shannon [45], [46]), they account for a large part of t h e statistical dependencies language are in the and easily instant of time. The trellis begins and endsthe knownstates at handled. x. and XK. Its most important property is that every possi- to With such a model, English letters are viewed as the out- ble.state sequence x there corresponds a unique path through puts of a n m’-state Markov process, wherem is the number of the trellis, and vice versa. distinguishable characters, such as 27 (the 26 letters and a Now we show how, given a sequence of observations I, space). If it is further assumed that the OCR output zk is every path may be assigned a “length” proportional to -In dependent only on the corresponding input character then P ( x , z), where x is the state sequence associated with that yk, the OCR reader isa memoryless channel to whose output se- path. This will allow us t o solve the problem of finding t h e quencewemayapplytheVA to exploitcontextualcon- state sequence for whichP(xl z) is maximum, or equivalently straints; see Fig. 6. Here the ‘‘OCR output” may be anything for which P ( x , z) = P ( x i z ) P ( z ) is maximum, by finding the a from the raw sensor data, possibly grid of zeros and ones, to path whose length --in P ( x , Z) is minimum, since In P ( x ,t) is the actual decision which would be made by the reader in the a monotonic function of P ( x , Z) and there isa one-to-one cor- absence of contextual clues. Generally, the more raw the data, respondencebetweenpathsandsequences.Wesimply ob- the more useful the VA will be. servethatduetotheMarkovandmemorylessproperties, P ( x , z) factors as follows: E . Othw Recognizing certain similarities between magnetic record- P(x, 2) = 1 P(x)P(z x ) ing media and intersymbol interference channels, Kobayashi [SO] has proposed applying the VA to digital magnetic re- cording systems. Timor [SI] has applied the algorithm to a sequentialrangingsystem.Use of thealgorithminsource Hence if weassigneachbranch(transition)the ‘‘length” coding has been proposed [52]. Finally, Preparata and Ray [53] have suggested using the algorithm to search ‘‘semantic maps”insyntacticpatternrecognition.Theseexhaustthe applications known to the author. then the total length of the path corresponding to some x is 111. THEALGORITHM We now show that the MAP sequence estimation problem k=O previously stated is formally identical to the problem find- of as claimed. ing the shortest route through a certain graph. The VA then Finding the shortest route througha graph is an old prob- arises as a natural recursive solution. lem in operations research. The most succinct solution was Ye areaccustomedtoassociatingwith a discrete-time givenbyMintyin a quarter-pagecorrespondencein 1957 finite-state Markov process a state diagramof the type shown [ 5 5 ] , which we quote almost in its entirety: in Fig. 7(a), for a four-state shift-register process like that of . . . as follows: Build problem .model of the travel network, Fig. 3 or Fig. 4 (in this case, a de Bruijn diagram [54]). Here The shortest-route . . can be solved verysimply a string ~K~des representstates,branchesrepresenttransitions,and where knots representcities and string lengths represnt dis- over the course of time the process traces some path from state tances (or costs). Seize the knot ‘Los Angeles”in your left to state the diagram. throughstate hand and knot the ’Boston” in your right and pull them apart.
  • 5. 272 PROCEEDINGS OF THE IEEE, MARCH 1973 Unfortunately, the Minty algorithm is not well adapted to modern methods of machine computation, nor are assis- tants as pliable as formerly. I t therefore becomes necessary to move on to the VA, which is also well known in operations 1 1 research [ 5 6 ] . I t requires one additional observation. 1 We denote by xok a segment (xo, xl, . , xk) consisting of the states to time k of the state sequence x = (xo, x 1 , * , x k ) . I n t h etrellis xok corresponds to a path segment starting t h e at node x0 and terminatingat Xk. For any particular time-k node %kt there will in general be several such path segments, each with some length k-1 UXOk> = 6 0 m>. The shortest such path segment is called the surriuor corre- sponding to the node x k , and is denoted ? ( X k ) . For any time k>O, there are M survivors in all, one for each k . The obser- x vation is this: the shortest complete path i must begin with one of these survivors. (If it did not, but went through state 7' Z k at time k , thenwecouldreplaceitsinitialsegmentby f ( x k ) to get a shorter path-contradiction.) T h u s at any time k we need remember only the M sur- vivors f(xk) and their lengths l'(xk) p X [ f ( x k ) ] . To get to time k f l , we need only extend all time-k survivors by one time of unit, compute the lengths the extended path segments, and for each node xk+l select the shortest extended path segment terminatingin xk+1 as thecorrespondingtime-(kfl)sur- vivor. Recursion proceeds indefinitely without the number of survivors ever exceedingM . r3 Many readers will recognizethisalgorithm as a simple [%I. version of forward dynamic programming [57], By this or anyothername,thealgorithmiselementaryoncethe problem has been cast in the shortest route form. k=5 We illustrate the algorithm for a simple four-state trellis covering 5 timeunitsinFig. 8. Fig. 8(a)showsthecom- plete trellis, with each branch labeled with a length. (In a real (b) application, the lengths would be functions of the received Fig. 8. (a) Trellis labeled with branch lengths; M = 4, K = 5. data.)Fig.8(b)showsthe 5 recursivestepsbywhichthe (b) Recursive determination of the shortest path via the VA. algorithm determines the shortest path from the initial to the final node. At each stage only t h e 4 (or fewer) survivors are shown, along with their lengths. for each x k + l ; store r ( x r + l ) and the corresponding survivor A formal statement of the algorithm follows: i(Xk+l). S e t k to k+1 and repeat untilk = K . Viterbi Algorithm With finite state sequences x the algorithm terminates at time K with the shortest complete path stored the survivor as Storage: ?(xK). (time ) k ; ?(xk), 1< x k < & f (survivor terminating in xk): Certain trivialmodifications necessary practice. are in r(xk), 1<xk <M (survivor length). When state sequences are very long or infinite, it is necessary to truncate survivors to some manageable length 6. In other Initialization: words,thealgorithmmustcometo a definitedecision o n k = 0.; nodes up to time k-6 at time k. Note that in Fig. 8(b) all time-4 survivors go through the same nodes up t o time 2. In X(xo) = x o ; i(m) arbitrary, m # xo; general, if thetruncationdepth 6 ischosenlargeenough, r(Xo) 0 ; = r(m) = Q), m # xo. there is a high probability that all time-k survivors will go through the same nodes up to time k - 6 , so that the initial Recursion: Compute segment of the maximum-likelihood path is known up to time k-6 and can be put out as the algorithm's firm decision; in this case, truncation costs nothing. In the rare cases when for all = ( x k + l , xk). survivors disagree, any reasonable strategy for determining Find thealgorithm's time-(k-6)decision will work [20], [21]: choose an arbitrary time-(k-6) node, or the node associated with the shortest survivor, or node chosen by majority vote, a
  • 6. FORKEY: THE VITERBI ALGORITHM 273 etc. If 6 is large enough, the effect on performance is negligi- ble. Also, if k becomes large, it is necessary to renormalize the lengths r ( m ) from time to time by subtracting a constant from all of them. Fig. 9. Typical cell of a shift-register process trellis. Finally, the algorithm may be required get started with- to out knowledge of the initial state xg. In this case it may be initializedwithanyreasonableassignment of initialnode lengths, such as r(m)=0, all m, or else r(m)= -In r,,,if t h e 4 II states are known to have a priori probabilities r,,,. Usually, after an initial transient, there is a high probability that all survivors will merge with the correct path. Thus the algorithm synchronizes itself without any special procedures. The complexity of the algorithm is easily estimated. First, memory: the algorithm requires M storage locations, one for each state, where each location must be capable of storing a / ,, i(Uii,-. II 1, &r ,, i(Cii,-. ll 9+ 9 SELECT SELECT SELECT SELECT 1-OF- 2 I-OF-2 I-OF-2 1-OF-2 “length” r ( m ) and a truncatedsurvivorlisting - % ( m ) of 6 symbols. Second, computation: each in unit algorithm must make IEI additions, one for each transition, of timethe ! {; * and M comparisons among the ] E l results. Thus the amount - y - - - - l TO ANOTHER LOGIC UNIT TOANOTHER LOGIC UNIT of storage is proportional to the number of states, and the amount of computation to the numberof transitions. ‘ith a Fig, 10. Basic VA logic unit for binary shift-register process. shift-register process, M = my and 1 El = mV+l, that the com- so plexity increases exponentially with the length v of the shift Fourier transform (FFT). In fact, it is identical, except for register. length, and indeed the FFT is also ordinarily organized cell- In the previous paragraph, have ignored the complexity wise. Li‘hile the add-and-compare computations of the we VA involvedingeneratingtheincrementallengths X(&). In a are unlike those involved in the FFT, some of the memory- shift-register process, it is typically true that P ( x k + l I xk) is organization tricks developed for the FFT may be expected either l/m or 0, depending on whether xk+l is an allowable suc- to be equally useful here. cessor t o xk or not; then all allowable transitions have the Because of its highly parallel structure and need for only same value of -In P(xk+lI xk) and this component of X(&) add, compare, and select operations, the VA is well suited to may be ignored. Note that in more general cases P(xk+l I xk) is high-speed applications. A convolutional decoder for a v = 6 known in advance: hence this component can be precomputed code (Ji= 64, / E l = 128) t h a t is built out of 356 transistor- a n d ‘wired in.” The component -In P ( z k / 4;c) is the only com- transistor logiccircuitsandthatcanoperate at u pt o 2 ponentthatdepends on the data; again, it is typical that Mbits/s perhaps represents the current state of the art [22]. many f k lead to the same output y k , andhencethevalue Even a software decoder for a similar code can be run at a -In P ( z k 1 yk) need be computed or looked up for all these .$x. rate order the of 1000 bits/s on a minicomputer. Such only once, givenz k . (Then the noise is Gaussian, -In P ( z k I y k ) moderate complexity qualifies the VA for inclusion in many is proportional simply to ( z k - y k ) * . ) Finally, all of this can be signal-processing systems. done outside the central recursion (“pipelined”). Hence the For further details of implementation, see [22], [23], a n d complexity of this computation tends not to be significant. the references therein. Furthermore, note that once the X&) are computed, the observation Zk need not be stored further, an attractive fea- IV. ASALYSIS PERFORMAXE OF ture in real-time applications. Just as important as thestraightforwardness of imple- A closer lookat t h e trellis of a shift-register process reveals mentation of t h e VA is the straightforwardness with which its additional detail that can be exploited in implementation. Forperformance can be analyzed. In many cases, tight upper and a binary shift register, the transitions in any unit time can lower bounds for error probability can be derived. Even when of be segregated into 2y-1 disjoint groups of four, each originat- t h e VA is not actually implemented, calculation of its per- ing in a common pair of states and terminating in another formanceshowshowfartheperformance of lesscomplex common pair. A typical such cell is illustrated in Fig. 9, with schemes is from ideal, and often suggests simple suboptimum the time-k states labeled and x‘l and the time-(k+l) statesschemes that attain nearly optimal performance. x’0 labelled Ox‘ a n d lx’, where x’ stands fora sequence of v - 1 bits Thekeyconceptinperformanceanalysisisthat of a n that is constant within any one cell. For example, each time errorevent.Let x be the actual state sequence, and 4 the unit in the trellis of Fig. 8 is made up of two such cells. Ye state sequence actually chosen by the VA. Over a long time x note that only quantities within the same interact in any and i will typically diverge and remerge a number of times, cell onerecursion.Fig. 10 shows a basiclogic m i t that imple- as illustrated in Fig. 11. Each distinct separation is called a n mentsthecomputationwithinanyone cell. A high-speed errorevent.Erroreventsmayingeneralbe of unbounded parallel implementation of the algorithm can be built around length if x is infinite, but the probability of a n infinite error 2-l such identical logic units; a low-speed implementation, event will usually be zero. aroundonesuchunit,time-shared; or a softwareversion, The importance of erroreventsisthattheyareprob- around the corresponding subroutine. abilistically independent of one another; in the language of Many readers will have noticed that the trellis Fig. 7(b) probability theory they are of recurrent. Furthermore, they allow reminds them of the computational flow diagram of the fast us to calculate error probability per unit time, which neces- is
  • 7. 274 PROCEEDINGS OF THE IEEE, u 1973 r n k.0 LrK - u w - W Fig. 11. Typical correct path X (heavy line) and estimated path 2 (lighter line) in the trellis, showing three error events. sary since usually the probability any error in MAP estima- of Fig. 12. Typical correct path X (heavy line) and time-k incorrect tion of a block of length K goes t o 1 as K goes to infinity. subset for trellis of Fig. 7 b . () Instead we calculate the probabilityof an error event starting at some given time, given that the starting state is correct, i.e., that an error event is not already in progress at that time. Given the correct path X , the set &k of all possible error ... . .. events startingat some time k is a treelike trellis which starts at x k and each of whose branches ends on the correct path, as illustratedinFig. 12, for the trellis of Fig.7(b).Incoding Fig. 13. Trellis for continuous-phase FSK. theory this is called the incorrect subset (at time ) . R The probability of any particular error event is easily cal- culated; it is simply the probability that the observations will be such that over the time span during which 4 is different from X , f is more likely thanX . If the error event has length T, this is simplya two-hypothesis decision problem between two sequences of length T, and typically has a standard solution. Fig. 14. Typical incorrect subset. Heavy line: correct p t . ah The probability P ( & k ) t h a t any error event in occurs can &k Lighter lines: incorrect subset. then be upper-bounded, usually tightly, by a union bound, i.e., by the sum of the probabilities of all error events in &k. While this sum may well be infinite, it is typically dominated a typical incorrect subset in Fig. 14. The shortest possible by one or a few large leading terms representing particularly error event is of length 2 and consists of a decision t h a t t h e likely error events, whose sum then forms a good approxima- signalwas{cos w(l)t, -cos o(1)t) rather than {cos w(O)t, tion to P(&k).True upper bounds can often be obtained by cos w ( 0 ) t 1, or in our coordinate notation that(0, (0, - 1) ] { l), flow-graph techniques [4], [SI, [%I. is chosen over { (1, 0),(1, 0) 1. This is a two-hypothesis de- On the other hand, a lowerbound to error-event prob- cisionproblemin a four-dimensionalsignalspace [59]. I n ability,againfrequentlytight,canbeobtainedbyagenie Gaussian noise of variance u* per dimension, only the Euclid- argument. Take the particular error event that has the great- ean distance d between the two signals matters; in this case- est probability of all those in &k. Suppose thata friendly genie d = 4 3 , and therefore the probability of this particular error tells you that the true state sequence is one of two possibili- eventis Q(d/%)=Q(l/ad/Z), where Q(x) istheGaussian ties: the actual correct path, the incorrect path correspond- error probability function defined by or ing to that error event. Even with this side information, you will still make an error if the incorrect path ismorelikely given z, so your probability of error is still no better than the probability of this particular error event. In the absencethe of genie, your error probability must be worse still, since one of By examination of Fig. 14 we see that the error events of the strategies you have, given the genie’s information, is to lengths 3, 4, . . lie at distances 4 6 , dG, . from the ignore it. In summary, the probability of any particular error correct path; hence we arrive at upper and lower bounds on event isa lower bound toP ( & k ) . P ( & k ) of An important side observation is that this lower bound VA. applies to any decision scheme, not just the Simple exten- + Q ( 4 2 / 2 u ) 2 P(&k) 5 Q ( d 2 / 2 ~ ) Q(d6/eu) sions give lower bounds to related quantities like bit prob- + + Q(4i6/2u) . ..* ability of error. If the VA givesperformanceapproaching these bounds, then it may be claimed that it is effectively opti- I n view of the rapid decreaseof Q ( x ) with x , this implies that mum with respect to these related quantities as well [3O]. P ( & k ) is accurately estimated as Q(d/2/2u) for any reasonable In conclusion, the probability of any error event starting noise variance a*. I t is easily verified from Fig. 13 that this at time k may be upper- and lower-bounded as follows: result is independentof the correct pathx or the timeK. This result is interesting in that the best one can do with max P(error event) 5 P(&k) 5 max P(error event) coherent detection in a single symbol interval is Q(1/2a) for + other terms. orthogonal signals. Thus exploiting the memory doubles the effective signal energy, or improves the signal-to-noise ratio With luck these bounds will be close. by 3 dB. It may therefore be claimed that continuous-phase FSK is inherently 3 dB better than noncontinuous for devia- Example tion ratio 3, or as good as antipodal phase-shift keying. (While For concreteness, and because the resultis instructive, we we have proved this only a deviation ratio of ), it holds for for carry through the calculation for continuous-phase FSK of nearly any deviation ratio.) Even though the is quite sim- VA theparticularlysimpletype definedearlier.Thetwo-state ple for a two-state trellis, the fact that only one type error of trellis for this process is shown in Fig. and the first part of event has any significant probability permits still simpler sub- 13,
  • 8. FORNEY: THE VITERBI ALGORITIIM 215 optimum schemes to achieve effectively the same performance formance superior to all other coding schemes save sequential [431,[441.’ decoding,anddoesthis at highspeeds,withmodestcom- plexity,andwithconsiderablerobustnessagainstvarying V. APPLICATIONS channelparameters. A number of prototypesystemshave We conclude with a review of the results, both theoretical been implemented and tested [21]-[26], some quite original, and practical, obtained with the VA to date. and it seems likely that Viterbi decoders become common will in space communication systems. A , ConvolutionalCodes Finally, Viterbi decoders have been used as elements in It was for convolutional codes that the algorithm was first very-high-performance concatenated coding schemes [lo]- developed, and naturally it has had its greatest impact here. [12], [27] andindecoding of convolutionalcodesinthe The principal theoretical result is contained in Viterbi’s presence of intersymbol interference [35], [6O]. original paper [ l ] ;see also [SI-[7]. I t shows that for a suita- bly defined ensemble of random trellis codes and MAP decod- B . IntersymbolInterference ing, the error probability can be made to decrease exponen- Application of the VA to intersymbol interference prob- tially with the constraint length a t all code rates R less t h a n lems i s more recent, and the main achievements have been v channel capacity. Furthermore, the rate of decrease is con- theoretical. The principal result, for PAM in white Gaussian siderably faster than for block codes with comparable decod- noise, is t h a t P ( & k ) can be tightly bounded as follows: ing complexity (although the same as that for block codes with the same decoding delay). In our view, the most telling K ~ Q ( d m i n / 2I & k ) I ~ )P ( KrQ(drnirJ2~) comparison between block and convolutional codes is that an where K L and Kv are small constants, Q(x) is the Gaussian effectively optimum block code of a n y specified length and error probability function defined earlier, u? is t h e noise vari- rate can be created by suitably terminating a convolutional ance, and dmin is the minimum Euclidean distance between (trellis)code, but with a lowerratefortheblockcode of any two distinct signals[29]. This result implies that on most course [7]. In fact, if convolutional codes were any better, channels intersymbol interference need not lead to any sig- they could be terminated to yield better block codes than are nificant degradation in performance, which comes as rather a theoretically possible-an observation which shows that the surprise. bounds on convolutional code performance must be tight. Forexample,withthemostcommonpartialresponse For fixed binaryconvolutionalcodes of nonasymptotic systems, the VA recovers the 3-dB loss sustained by conven- length on symmetric memoryless channels, principal the tional detectors relative to full-response systems [29],[32]. result [6] is that P ( & k ) is approximately given by Simplesuboptimumprocessors [29],[33] candonearly as P(&k) ;r,2-dD well. Several workers [34],[35],[42] have proposed adaptive where d is the free distance, Le., the minimum Hamming dis- versions of the algorithm unknown time-varying for and tance of any path in the incorrect subset &h from the correct channels. Ungerboeck [41], [42] and hlackechnie [35] path: N d is the number of such paths; and D is the Bhat- have shown that only matched filter rather thana whitened a tacharyya distance matched filter is needed in PAM. D = logs P ( z O)l’*P(z 1)l I 2 1 I I t seems most likely that the greatest effect of the VA on digital modulation systems will be t o reveal those instances in Z which conventional detection techniques significantly fall z in where the sum is over all outputs the channel output spaceshort of optimum, to and suggest effective suboptimum 2. methods of closing the gap. PAM channels that cannot be On Gaussian channels linearly equalized without excessive noise enhancement due t o nulls or near nulls in the transmission band are the likeliest candidates for nonlinear techniques of this kind. where & / N O is the signal-to-noise ratio per information bit. The tightness of this bound is confirmed by simulations [8], C. Text Recognition [9], [22]. For a v = 6 , d = 10, R=+ code, for example, error An experiment was run in which digram statistics were probabilities o l e * , lC5,a n d le7 achieved at Eb/No used t o correct garbled text produced by a simulated noisy f are =3.0, 4.3, a n d 5.5 dB,respectively,which is within a few characterrecognizer [47]. Resultsweresimilartothose of tenths of a dceibel of this bound [22]. Raviv [48] (using digram statistics), although the algorithm Channels available space for communications fre-are was simpler and only had hard decisions rather than confi- quently accurately modeled as white Gaussian channels. The dence levels t o work with. It appears that the algorithm may VA attractive such is for channels because gives it per- be a useful adjunct t o sophisticated character-recognition sys- tems resolving for ambiguities when confidence levels for * De Buda [44] actually proves thatanoptimum decision onthe different characters are available. phase x t at time k can be made by examining the received waveform only at times k- 1 and k ; Le., ( z o . ~ - I z ~ t - l ) (ZU, w ) . The proof is that the log , , VI. CONCLUSIOK likelihood ratio P ( Z 0 . k - 1 , Z1.k-I, ZOk, Zlk I Xk-1, Xk 0, The VA has already hadsignificant impact on our under- a -In standing of certain problems, notably in the theories convo- of P ( Z 0 . t - 1 , Z1.t-1, Z O t , &t-1, Xk = *, Y t + d lutional codes and of intersymbol interference. I t is beginning is proportional to - z ~ , t - l + z ~ . ~ - 1 - z u - -for any values of the pair of ~~ states ( ~ l t - 1 ~ xk+1). For this phase decision (which differs slightly from our to have a substantial practical impact as well in the engineer- sequence decision) the error probability is exactly Q(&/2n). ing of space-communication links. The amount work it has of
  • 9. 276 PROCEEDINGS OF THE IEEE, MARCH 1973 inspired in the intersymbol interference area suggests that Now we note the recursive formula here too practical applications are not far off. The generality of the model t o which it applies and the straightforwardness p(%,zOL-') = p ( z k ,k - 1 , x ZO"') Zk-1 with which it can be analyzed and implemented lead one to believe that in both theory and practice it will find increasing = p ( z k - 1 ,O " ' ) p ( z k ,- 1 Z 4 1 zk-1) zk-1 application in the years ahead. M which allows us to calculate the quantities P(%, ~0"') from APPENDIX the M quantities P ( x ~ 1zo"*) with IZI multiplications and , RELATED ALGORITHMS additions using the exponentiated lengths In this Appendix we mention some processing structures that are closely related t o t h eVA, chiefly sequential decoding = p ( z k %k-l)P(zk-l I &-I). I and minimum-bit-error-probability algorithms. We also men- Similarly we have the backward recursion tion extensions of the algorithm to generate reliability infor- mation, erasures, and lists. p(Zk" xk) = 1 p(zkK,x k + l Z k ) I ZW1 When the trellis becomes large, it is natural to abandon the exhaustive search of the VA in favorof a sequential trial- = p ( z kx k + l , xk)p(zk+l" I Zk+d I Zkil and-error search that selectively examines only those paths likely t o be the shortest. In the coding literature, such al- which has a similar complexity. Completion of these forward gorithms are collectively known as sequential decoding [Is]- and backward recursions for all nodes allowsP ( x k , Z) and/or [l6]. The simplest to explain is the 'stack" algorithm [IS], P(&,z) t o be calculated for all nodes. [16], which a list is maintainedof the shortest partial paths in Now, to be specific, let us consider a shift-register process found to date, the path on the top the list is extended, and and let S(uk) be the setof all states zel whose first component of its successors reordered in the list until some path is found is %k. Then t h a t reachesthe &eLminalnode, or elsedecreaseswithout p ( u k , 2) = p ( x k + l , 2)- limit. (That some pathwill eventually do so is ensured in cod- Zk+lES(Ut) ing applications by the subtraction of a bias term such that the length of the correct path tends to decrease while thatof Since P(uk, z) =P(ukI z ) P ( z ) , MAP estimation of u k reduces all incorrect paths tends to increase.) Searches that start from o findingthemaximum t of thisquantity.Similarly, if we either end of a finite trellis [I71 are alsouseful. wish t o find the MAP estimate of an output y k , say, then let In coding applications, sequential decoding has many of S(p) be the setof all & that leadto yk and compute the same properties as Viterbi decoding, including the same error probability. I t allows the decoding of longer and there- P(Yk, 4 = p ( f k , 2). €t= k t ) fore more powerful codes, at the cost of a variable amount of computationnecessitatingbufferstoragefortheincoming A similar procedure can be used to estimate any quantity d a t a z. It is probably less useful outside of coding, since i t which isa deterministic functionof states or transitions. depends on the decoder's ability to recognize when the best Besidesrequiringmultiplications,thisalgorithmisless pathhasbeenfoundwithoutexaminingotherpaths,and attractive than the VA in requiring a backward as well as a therefore requires eithera finite trellisor a very large distance forward recursion and consequently storage of all data. The between the correct path and possible error events. following amended algorithm [39], [48] eliminates the latter Intheintersymbolinterferenceliterature,many of the ugly feature at the cost of suboptimal performance and addi- find us early attempts to optimum nonlinear algorithms used bit- tional computation. Let restrict ourselves to shift-register a errorprobability as theoptimalitycriterion.TheMarkov process with input sequence u and agree to use only observa- property of the process leads to algorithms that are manage- tions up to time k+6 in estimating U k , say, where 6 2v- 1. W e able but less attractive than the Viterbi [36]-[42]. then have The general principle of several of these algorithms is as P ( u k , 20") = ' * * p(Ukk*, ZOw"). follows.8 First, we calculate the joint probability P ( x k , z) for uti1 Uti6 every state x k in the trellis, or alternately P(&,z) for every transition &. This is done by observing that Thequantities in thesumcan be determinedrecursively by I p ( z k , 2) = p ( x k , ZO"')p(ZkK x k , 20"') = = p(xk, I ZOk1)p(zkK xk) p ( U k k M , ZOk*) UL-1 p(Uk-lk", ZOkM) since, given Xk, the outputs zkK from time k t o K are indepen- = p(uk-lkul, ZOku' 1 d e n t of the outputs zoL-' from time 0 t o K - 1 . Similarly ut-1 p(uk+d, &+d I W k Y l , ' * uk+6-r). While the recursion is now forward only, we now must store m' quantities rather than m'. If I is large, this is most un- * attractive; if 6 is close t o v-1, the estimate may well be de- cidedly suboptimum. These variations thus seem considerably less attractive 'The author is indebted to Bahl et al. 1181 foraparticularly lucid t h a n t h e VA. Nonetheless, something like this may need to exposition of this type of algorithm. be hybridized with the VA in certain situations suchtrack-as
  • 10. FORNEY: THE VITERBI ALGORITHM 277 ing a finite-state source overa finite-state channel, where only space communication,” I E E E Trans. Commun. Technol., vol. COM- 19, pp. 835-847, Oct. 1971. the state sequence the source is interest. of of [23] G. C. Clark, Jr., and R. C. Davis, “Two recent applications error- of Finally, we can consider augmented outputs from the VA. correction coding to communications system design,” I E E E Trans. A good general indication of how well the algorithm is doing [24] Commun. Technol., vol. COM-19, pp. 856-863,TDMA satellite com- I. M. Jacobs and R. J. Sims, “Configuring a Oct. 1971. is the depth which all paths are merged; this can be to at used munication system with coding, in Proc. 5th Hawaii Int. Conf. Sys- establish whether or not a communications channel is on the tems Science. Honolulu, Hawaii:WesternPeriodicals, 1972, pp. 443-446. air, in synchronism, etc. l l ] ,[28]. A more selective indicator 1251 A. R. Cohen, J. A. Heller, andA. J. Viterbi, “A new coding technique [ of howreliableparticularsegmentsare is the difference in forasynchronousmultiple access communication,” I E E E Trans. lengths between the best and the next-best paths the point at Commun. Technol., vol. COM-19, pp. 849-855, Oct. 1971. (261 D. Quagliato,“Errorcorrecting codes applied to satellitechan- of merging; this reliability indicator can be quantized into an nels,= in 1972 I E E E Int. Conf. Communications (Philadelphia, Pa.), erasure output. Lastly, the algorithm can be altered to store pp. 15/13-18. t h e L best paths, rather than the single best path, as the sur- [27] Linkabit NAS2-6722,coding Ames Res. Cen., Rep. Field, Calif., Contract Corp.,“Hybrid NASA systemstudy,” Final Moffett on vivors in each recursion, thus eventually generating a list of NASA Rep. CR114486, Sept. 1972. t h e L most likely path sequences. [281 G. C. Clark, Jr., and R. C. Davis, “Reliability-of-decodingindicators for maximum likelihood decoders,” in Proc. 5th Hawaii I n f . Conf. SystemsScience. Honolulu, Hawaii: Western Periodicals, 1927, pp. REFERENCES 447-450. Conuolutwnal Codes Intersymbol Interference (291 G. D.Forney,Jr.,“Maximum-likelihoodsequenceestimation of [l] A. J. Viterbi, “Error bounds for convolutional codes and an asymp- digital sequences in the presence of intersynlbol interference,” E E E I toticallyoptimumdecoding algorithm,” I E E E Trans. Inform. Trans. Inform. Theory, vol. IT-18, pp. 363-378, May 1972. Theory, vol. IT-13, pp. 260-269, Apr. 1967. [30] -, “Lower bounds on error probability in the presence of large 121 G. D.Forney,Jr.,“Convolutional codes I: Algebraic structure,” intersymbol interference,” I E E E Trans.Commun. Technol. (Cor- I E E E Trans. Inform. Theory, vol. IT-16, pp. 720-738, Nov. 1970. resp.), vol. COM-IO, pp. 76-77, Feb. 1972. [3] - , “Review of randomtreecodes,” NASA Ames Res. Cen., [31] J. K. Omura, optimum ”On receivers forchannels inter- with Moffett Field, Calif., Contract NAS2-3637, NASA CR 73176, Final symbol interference,” abstract presented at the IEEE Int. Symp. Rep., Dec. 1967, appendix A. Information Theory, Noordwijk, The Netherlands, June 1970. [4] G. C. Clark, R. C. Davis,J. C. Herndon, and D. D. McRae, “Interim [32] H. Koba;ashi, ”Correlative level codingand maximum-likelihood report on convolution coding research,” Advanced System Opera- decoding, I E E E Trans.Inform.Theory, vol. IT-17, PP. 586-594, tion, Radiation Inc., Melbourne, Fla., Memo Rep.38, Sept. 1969. Sept. 1971. (5) A. J. Viterbi and J. P. Odenwalder, “Further results on optimal de- (331 M. J. Ferguson, ”Optimal receptionfor binary partial response chan- coding of convolutional codes,” I E E E Trans. Inform. Theory (Cor- nels,” Bell Syst. Tech. J . , vol. 51, pp. 493-505, Feb. 1972. resp.), vol. IT-15, pp. 732-734, Nov. 1969. (341 F. R. Magee, Jr., and J. G. Proakis, “Adaptive maximum-likelihood [a] A. J. Viterbi, “Convolutional codes and their performance in com- sequence estimationfor digital signaling in thepresence of intersym- municationsystems,” I E E E Trans. Commun. Technol., COM-19,vol. bo1 interference,” I E E E Trans. Inform.Theory (Corresp.), vol. pp. 751-772, Oct. 1971. IT-19, pp. 120-124, Jan. 1973. [7] G. D.Forney,Jr.,“Convolutional codes 11: Maximum likelihood (351 L. K. Mackechnie, ”Maximum likelihood receivers for channels hav- decoding,’! Stanford Electronics Labs., Stanford, Calif., Tech. Rep. ing memory,’ Ph.D. dissertation, Dep. Elec. Eng., Univ. of Notre 7004-1, June 1972. Dame, Notre Dame, Ind., Jan. 1973. 181 J. A. Heller, “Short constraint length convolutional codes,” in Space J. [36] R. W. Chang and C. Hancock, “Onreceiver structures for channels Program Summary 37-54, vol. 111. Jet Propulsion Lab., Calif. Inst. having memory,’ I E E E Trans. Inform. Theory, vol. IT-12, pp. 463- Technol., pp. 171-177, 0ct.-Nov. 1968. 468, Oct. 1966. (91 - , “Improved performance of short constraint length convolu- [37] K.Abend,T. J. Harley,Jr., B. D.Fritchman,and C. Gumacos, tional codes,” in Space Program Summary37-56,vol. 111. Jet Propul- “On optimum receivers for channels having memory,”I E E E Trans. sion Lab., Calif. Inst. Technol., pp. 83-84, Feb.-Mar. 1969. Inform. Theory (Corresp.), vol. IT-14, pp. 819-820, Nov. 1968. [lo] J. P. Odenwalder, “Optimal decoding of convolutional codes,” Ph.D. [38] R. R. ?wen, “Bayesian decision procedures for interfering digital dissertation,Dep.Syst. Sci., Sch.Eng. Appl. Sci., Univ. of Cali- signals, I E E E Trans. Inform. Theory (Corresp.), vol. IT-15, pp. fornia, Los Angeles, 1970. 506-507, July 1969. [Ill J. L. Ramsey,“Cascadedtreecodes,” M I T Res. Lab.Electron., [39] K. Abend and B. D. Fritchman, “Statistical detection for communi- Cambridge, Mass., Tech. Rep. 478, Sept. 1970. cationchannelswithintersymbolinterference,” Proc. I E E E , vol. [12] G. W. Zeoli, “Coupled decodingof block-convolutional concatenated 58, pp. 779-785, May 1970. codes,”Ph.D.dissertation,Dep. Elec. Eng.,Univ. of California, (401 C. G. Hilborn, Jr., “Applications of unsupervised learning to prob- Los Angeles, 1971. lems of digital communication,” in Proc. 9th I E E E Symp. Adaptive [I31 J. M. Wozencraft,“Sequential decoding for reliable communica- Processes, Decision, and Control (Dec. 7-9, 1970). tion,” in 1957 I R E Nat. Conv. Rec., vol. 5, pt. 2, pp. 11-25. C. G. Hilborn, Jr., andD.G.Lainiotis,“Optimalunsupervised [I41 R. M. Fano,“A heuristic discussion of probabilistic decoding,”I E E E learning multicategory dependent hypotheses pattern recognition,” Trans. Inform. Theory,vol. IT-9, pp. 64-74, Apr. 1963. I E E E Trans. Inform. Theory, vol. IT-14, pp. 468-470, May 1968. 1151 K. S. Zigangirov, “Some sequentialdecodingprocedures,” Probl. [41] G. Ungerboeck, “Nonlinear equalization of binary signals in Gaus- Pered. Inform., vol. 2, pp. 13-25, 1966. sian noise,”I E E E Trans. Commun. Technol., vol. COM-19, pp. 1128- [161 F. Jelinek, “Fast sequential decoding algorithmusing a stack,” I B M 1137, Dec. 1971. J . Res. Develop., vol. 13, pp. 675-685, Nov. 1969. [42] --, ”Adaptive maximum-likelihood receiver for carrier-modulated [171 L. R. Bahl, C. D. Cullum, W. D. Frazer, and F. Jelinek, ”An effi- data transmission systems,” in preparation. cient algorithm for computing free distance,” IEEE Trans. Inform. Theory (Corresp.), vol. IT-18, pp. 437-439, May 1972. Continuous-Phase FSK [ l a ] L. R. Bahl, J. Cocke,F. Jelinek, and J. Raviv, “Optimal decodingof [43] M. G. Pelchat, R. C. Davis, and M . B. Luntz, “Coherent demodula- linear codes for minimizing symbol error rate’ (Abstract), in 1972 tion of continuous-phase binary FSK signals,” inProc. Inl. Telemetry In#. Symp. InformationTheory (Pacific Grove, Calif., Jan.1972), Conf. (Washington, D. C., 1971). p. 90. [44] R. de Buda, “Coherent demodulation of frequency-shift keying with 1191 P. L. McAdam, L. R. Welch, and C. L. Weber, “M.A.P. bit decoding low deviation ratio,”I E E E Trans. Commun. Technol.,vol. COM-20, of convolutional codes,” in 1972 I n f . Symp. Informafion Theory pp. 429-435, June 1972. (Pacific Grove, Calif., Jan. 1972), p. 91. Text Recognition [45] C. E. Shannon and W. Weaver, The MUthemdica! Theory of Com- Space Applications munication. Urbana, Ill.: Univ. of Ill.Press, 1949. [20] Linkabit Corp., “Coding systems study for high data rate telemetry [46] C. E. Shannon, “Prediction and entropy of printed English,” Bell links, ’Final Rep. on N4SA Ames Res. Cen., Moffett Field, Calif., Syst. Tech. J . , vol. 30, pp. 50-64, Jan. 1951. Contract NASZ-6024, Rep. CR-114278, 1970. (471 D. L. Neuhoff, “The Viterbi algorithm as aid in textrecognition,’ an [21] G. C. Clark, “Implementation of maximum likelihood decoders for Stanford Electronic Labs., Stanford, Calif., unpublished. convolutional codes,” in Proc. Int. Telcmetering Conf. (Washington, (481 J. Raviv, “Decision making in Markov chains applied to problem the D. C., 1971). of pattern recognition,” I E E E Trans.Inform.Theory, vol. IT-13, [22] J. A. Heller and I. M. Jacobs, “Viterbi decoding for satellite and pp. 536-551, Oct. 1967.
  • 11. 218 PROCEEDINGS OF THE IEEE, MARCH 1973 Miscellaneous [54] S. W. Golomb, Shift Regisln Sequcrrccs. San Franasm, Calif.: Holden-Day, 1967, pp..13-17. [49] H.Kobayashi,“Asurvey of codingschemesfortransmissionor [ 5 5 ] G. J. Minty, ‘ comment on the shortest-route problem,” O p a . A recording o digital data,” IEEETrans.Commun.Technol., f vol. Res., vol. 5 , p. 724, Oct. 1957. COM-19, pp. 1087-1100, Dec. 1971. [56] M. Pollack andW. Wiebenson,“Solutions of the shortest-route [SO] -, “Application of probabilistic decoding t o digital magnetic re- problem-A review,” O p a . Res., vol. 8, pp. 224-230, Mar. 1960. cordingsystems,” I B M J . Res. Develop., vol. 15, pp. 64-74, Jan. 1971. [57] R. Busacker and T. Saaty, Finite Graphs and Networks: A n Z n t r o - duction with Applications. New York: McGraw-Hill, 1965. [51] U. Tirnor, ‘Sequential ranging with the Viterbi algorithm,” Jet Pro- pulsion Lab., Pasadena, Calif., JPL Tech. Rep. 32-1526, vol. 11, pp. [58] J. K. Omura, “On the Viterbi decoding algorithm,” ZEEE Trans. 75-79, Jan. 1971. Inform. T h m y , vol. IT-15, pp. 177-179, Jan. 1969. [52] J.K.Omura, “On the Viterbi algorithm forsourcecoding”(Ab- [59] J. M. Wozencraft and I. M. Jacobs, Principres of Cmnunicaliorr stract), in1972 ZEEE Znt. Symp. Information Theory (Pacific Grove, Engineering. New York: Wiley, 1965, ch. 4. Calif., Jan. 1972), p. 21. [a] K. Omura, ‘Optimal receiver design for convolutional codes and J. [53] F. P. Preparata and S. R. Ray, “An approach to artificial nonsym- channelswithmemory via control theoreticalconcepts,” Inform. bolic cognition,”Inform. Sci.,vol. 4, pp. 65-86, Jan. 1972. Sci., vol. 3, pp. 24S-266, July 1971.