The Sum-Product Algorithm
        PRML 8.4.4

         Xuebin Ma
•   Factor graph

    •   undirected tree, directed tree, polytree (F8.43)

•   Goal:

    •   Obtain an efficient, exact inference algorithm for
        finding marginals

    •   Compute efficiently where several marginals are
        required
•   Calculate marginals for particular variable

    •   sum the joint distribution over all variables except x (F 8.61)

    •   Joint distribution in the form of a product of factors (F 8.62)

Later we shall see how to modify the algorithm to incorporate evidence corresponding to observed variables. By definition, the marginal is obtained by summing the joint distribution over all variables except x so that

\[ p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x}) \tag{8.61} \]

where $\mathbf{x} \setminus x$ denotes the set of variables in $\mathbf{x}$ with variable $x$ omitted. The idea is to substitute for $p(\mathbf{x})$ using the factor graph expression (8.59) and then interchange summations and products in order to obtain an efficient algorithm. Consider the fragment of graph shown in Figure 8.46, in which we see that the tree structure of the graph allows us to partition the factors in the joint distribution into groups, with one group associated with each of the factor nodes that is a neighbour of the variable node $x$. We see that the joint distribution can be written as a product of the form

\[ p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s) \tag{8.62} \]

where $\mathrm{ne}(x)$ denotes the set of factor nodes that are neighbours of $x$, $X_s$ denotes the set of all variables in the subtree connected to the variable node $x$ via the factor node $f_s$, and $F_s(x, X_s)$ represents the product of all the factors in the group associated with factor $f_s$.

Figure 8.46   A fragment of a factor graph illustrating the evaluation of the marginal p(x).
•   sum and product (F 8.61, F 8.62 -> F 8.63)

    •   F 8.64: message from factor node to variable node x

    •   F 8.63: p(x) is the product of all incoming messages arriving at node x

Substituting (8.62) into (8.61) and interchanging the sums and products, we obtain

\[ p(x) = \prod_{s \in \mathrm{ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x). \tag{8.63} \]

Here we have introduced a set of functions $\mu_{f_s \to x}(x)$, defined by

\[ \mu_{f_s \to x}(x) \equiv \sum_{X_s} F_s(x, X_s) \tag{8.64} \]

which can be viewed as messages from the factor nodes $f_s$ to the variable node $x$. We see that the required marginal $p(x)$ is given by the product of all the incoming messages arriving at node $x$.
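The interchange of sums and products in (8.63) can be checked numerically on a toy case. The sketch below assumes a single variable x with two neighbouring factors whose subtrees each contain one further binary variable; the tables `f1` and `f2` and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical factor tables (not from the slides): f1[x][a] and f2[x][b].
# x has two neighbouring factors; each subtree holds one extra binary variable.
f1 = [[1, 2], [3, 4]]
f2 = [[5, 6], [7, 8]]

def brute_force_marginal(x):
    # (8.61): sum the joint f1(x, a) * f2(x, b) over every variable except x.
    return sum(f1[x][a] * f2[x][b] for a, b in product(range(2), repeat=2))

def message(table, x):
    # (8.64): sum one subtree's factor product over that subtree's variables.
    return sum(table[x])

def message_marginal(x):
    # (8.63): the marginal is the product of the incoming messages at x.
    return message(f1, x) * message(f2, x)

for x in range(2):
    print(x, brute_force_marginal(x), message_marginal(x))
```

Both columns agree, which is the distributive law at work: the sum over (a, b) of a product splits into a product of two smaller sums.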
•   Evaluate these messages (F 8.65, F 8.66, F 8.67)

In order to evaluate these messages, we again turn to Figure 8.46 and note that each factor $F_s(x, X_s)$ is described by a factor (sub-)graph and so can itself be factorized. In particular, we can write

\[ F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM}) \tag{8.65} \]

where, for convenience, we have denoted the variables associated with factor $f_s$, in addition to $x$, by $x_1, \ldots, x_M$. This factorization is illustrated in Figure 8.47. Note that the set of variables $\{x, x_1, \ldots, x_M\}$ is the set of variables on which the factor $f_s$ depends, and so it can also be denoted $\mathbf{x}_s$, using the notation of (8.59).

Substituting (8.65) into (8.64) we obtain

\begin{align}
\mu_{f_s \to x}(x) &= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \left[ \sum_{X_{sm}} G_m(x_m, X_{sm}) \right] \notag \\
&= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m) \tag{8.66}
\end{align}

where $\mathrm{ne}(f_s)$ denotes the set of variable nodes that are neighbours of the factor node $f_s$, and $\mathrm{ne}(f_s) \setminus x$ denotes the same set but with node $x$ removed. Here we have defined the following messages from variable nodes to factor nodes

\[ \mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm}). \tag{8.67} \]

We have therefore introduced two distinct kinds of message, those that go from factor nodes to variable nodes denoted $\mu_{f \to x}(x)$, and those that go from variable nodes to factor nodes denoted $\mu_{x \to f}(x)$. In each case, we see that messages passed along a link are always a function of the variable associated with the variable node that link connects to.

Figure 8.47   Illustration of the factorization of the subgraph associated with factor node f_s.
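The factor-to-variable update (8.66) can be sketched for discrete variables as follows. This is a minimal illustration under assumed data structures, not an implementation taken from the slides: the function name `factor_to_variable`, the dict-based table, and the `card` mapping are all hypothetical.

```python
from itertools import product

def factor_to_variable(scope, table, target, incoming, card):
    """Sketch of (8.66): message from a factor to the variable `target`.

    scope    -- tuple of variable names the factor depends on
    table    -- dict mapping each assignment tuple (ordered as `scope`) to a number
    incoming -- dict: variable name -> message (list indexed by state) sent to
                this factor by every variable in `scope` except `target`
    card     -- dict: variable name -> number of states
    """
    out = [0.0] * card[target]
    target_pos = scope.index(target)
    for assignment in product(*(range(card[v]) for v in scope)):
        term = table[assignment]  # the factor value f_s(x, x_1, ..., x_M)
        for var, value in zip(scope, assignment):
            if var != target:
                term *= incoming[var][value]  # incoming variable-to-factor message
        out[assignment[target_pos]] += term  # marginalize the other variables
    return out

# Hypothetical two-variable factor f(x, x1) with one incoming message from x1.
card = {"x": 2, "x1": 2}
table = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
msg = factor_to_variable(("x", "x1"), table, "x", {"x1": [1.0, 0.5]}, card)
print(msg)  # [2.0, 5.0]
```

When the factor depends only on the target variable, the loop has nothing to multiply in or sum out, so the function simply returns the factor's own values.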
The result (8.66) says that to evaluate the message sent by a factor node to a variable node along the link connecting them, take the product of the incoming messages along all other links coming into the factor node, multiply by the factor associated with that node, and then marginalize over all of the variables associated with the incoming messages. This is illustrated in Figure 8.47. It is important to note that a factor node can send a message to a variable node once it has received incoming messages from all other neighbouring variable nodes.

•   Evaluate messages from variable to factor by using sub-graph factorization (F 8.67 + F 8.68 -> F 8.69)

Finally, we derive an expression for evaluating the messages from variable nodes to factor nodes, again by making use of the (sub-)graph factorization. From Figure 8.48, we see that the term $G_m(x_m, X_{sm})$ associated with node $x_m$ is given by a product of terms $F_l(x_m, X_{ml})$, each associated with one of the factor nodes $f_l$ that is linked to node $x_m$ (excluding node $f_s$), so that

\[ G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml}) \tag{8.68} \]

where the product is taken over all neighbours of node $x_m$ except for node $f_s$. Note that each of the factors $F_l(x_m, X_{ml})$ represents a subtree of the original graph of precisely the same kind as introduced in (8.62). Substituting (8.68) into (8.67), we then obtain

\begin{align}
\mu_{x_m \to f_s}(x_m) &= \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \left[ \sum_{X_{ml}} F_l(x_m, X_{ml}) \right] \notag \\
&= \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m) \tag{8.69}
\end{align}

where we have used the definition (8.64) of the messages passed from factor nodes to variable nodes.

Figure 8.48   Illustration of the evaluation of the message sent by a variable node to an adjacent factor node.
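The variable-to-factor update (8.69) is just an elementwise product, which a few lines can illustrate. The function name and the list-based message representation below are assumptions made for this sketch.

```python
def variable_to_factor(incoming_from_other_factors, num_states):
    """Sketch of (8.69): message from a variable to one neighbouring factor.

    incoming_from_other_factors -- list of messages (lists indexed by state)
        received from all neighbouring factors except the recipient
    num_states -- number of states of the sending variable
    """
    out = [1.0] * num_states
    for msg in incoming_from_other_factors:
        # Multiply the messages together state by state.
        out = [o * m for o, m in zip(out, msg)]
    return out

# A variable with two other neighbouring factors (hypothetical messages).
print(variable_to_factor([[2.0, 3.0], [0.5, 4.0]], 2))  # [1.0, 12.0]

# A variable with no other neighbours: the empty product is all ones.
print(variable_to_factor([], 3))  # [1.0, 1.0, 1.0]
```

The second call shows that a variable node with no other neighbouring factors sends the constant message 1, since the product in (8.69) is then empty.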
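•   Messages sent by leaf nodes (variable node and factor node)

If a leaf node is a variable node, then the message that it sends along its one and only link is

\[ \mu_{x \to f}(x) = 1. \tag{8.70} \]

If the leaf node is a factor node, we see from (8.66) that the message sent should take the form

\[ \mu_{f \to x}(x) = f(x). \tag{8.71} \]

Figure 8.49   The sum-product algorithm begins with messages sent by the leaf nodes, which depend on whether the leaf node is (a) a variable node, for which $\mu_{x \to f}(x) = 1$, or (b) a factor node, for which $\mu_{f \to x}(x) = f(x)$.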

•   Find marginals for every variable node introduced by John-san

•   Sum-product algorithm

Figure 8.50   The sum-product algorithm can be viewed purely in terms of messages sent out by factor nodes to other factor nodes. In this example, the outgoing message shown by the blue arrow is obtained by taking the product of all the incoming messages shown by green arrows, multiplying by the factor f_s, and marginalizing over the variables x_1 and x_2.

Indeed, the notion of one node having a special status (the root) was introduced only as a convenient way to explain the message passing protocol.
•   Normalization (undirected graph)

    •   to get the normalization coefficient 1/Z, where p(x) = p~(x)/Z

    •   use sum-product to find the unnormalized marginals for x_i

    •   the coefficient 1/Z can be obtained by normalizing the marginal

    •   efficient, as Z is calculated over only one single variable
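The normalization step can be illustrated on a small chain. The sketch below assumes a hypothetical unnormalized model p~(x1, x2, x3) = f1(x1, x2) f2(x2, x3) with binary variables; the tables are made up for illustration. It compares Z obtained from the single-variable marginal of x2 with Z obtained by summing over every joint configuration.

```python
from itertools import product

# Hypothetical unnormalized chain: p~(x1, x2, x3) = f1[x1][x2] * f2[x2][x3].
f1 = [[1.0, 2.0], [3.0, 4.0]]
f2 = [[1.0, 1.0], [2.0, 3.0]]
states = range(2)

# Unnormalized marginal of x2 as the product of its two incoming messages.
msg_f1_to_x2 = [sum(f1[x1][x2] for x1 in states) for x2 in states]
msg_f2_to_x2 = [sum(f2[x2][x3] for x3 in states) for x2 in states]
p_tilde_x2 = [a * b for a, b in zip(msg_f1_to_x2, msg_f2_to_x2)]

# Z from a sum over the single variable x2 ...
Z_local = sum(p_tilde_x2)
# ... versus Z from a sum over all joint configurations.
Z_brute = sum(f1[x1][x2] * f2[x2][x3]
              for x1, x2, x3 in product(states, repeat=3))

p_x2 = [v / Z_local for v in p_tilde_x2]
print(Z_local, Z_brute, p_x2)
```

The local sum has as many terms as x2 has states, whereas the brute-force sum grows exponentially with the number of variables.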
Figure 8.51   A simple factor graph used to illustrate the sum-product algorithm (variable nodes x1, x2, x3, x4; factor nodes fa, fb, fc).
•   Unnormalized joint distribution (F 8.73)

The graph of Figure 8.51 has an unnormalized joint distribution given by

\[ \tilde{p}(\mathbf{x}) = f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4). \tag{8.73} \]

In order to apply the sum-product algorithm to this graph, let us designate node $x_3$ as the root, in which case there are two leaf nodes $x_1$ and $x_4$. Starting with the leaf nodes, we then have the following sequence of six messages

\begin{align}
\mu_{x_1 \to f_a}(x_1) &= 1 \tag{8.74} \\
\mu_{f_a \to x_2}(x_2) &= \sum_{x_1} f_a(x_1, x_2) \tag{8.75} \\
\mu_{x_4 \to f_c}(x_4) &= 1 \tag{8.76} \\
\mu_{f_c \to x_2}(x_2) &= \sum_{x_4} f_c(x_2, x_4) \tag{8.77} \\
\mu_{x_2 \to f_b}(x_2) &= \mu_{f_a \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2) \tag{8.78} \\
\mu_{f_b \to x_3}(x_3) &= \sum_{x_2} f_b(x_2, x_3)\, \mu_{x_2 \to f_b}(x_2). \tag{8.79}
\end{align}

The direction of flow of these messages is illustrated in Figure 8.52. Once this message propagation is complete, we can then propagate messages from the root node out to the leaf nodes, and these are given by

\begin{align}
\mu_{x_3 \to f_b}(x_3) &= 1 \tag{8.80} \\
\mu_{f_b \to x_2}(x_2) &= \sum_{x_3} f_b(x_2, x_3) \tag{8.81} \\
\mu_{x_2 \to f_a}(x_2) &= \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2) \tag{8.82} \\
\mu_{f_a \to x_1}(x_1) &= \sum_{x_2} f_a(x_1, x_2)\, \mu_{x_2 \to f_a}(x_2) \tag{8.83} \\
\mu_{x_2 \to f_c}(x_2) &= \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2) \tag{8.84} \\
\mu_{f_c \to x_4}(x_4) &= \sum_{x_2} f_c(x_2, x_4)\, \mu_{x_2 \to f_c}(x_2). \tag{8.85}
\end{align}

Figure 8.52   Flow of messages for the sum-product algorithm applied to the example graph in Figure 8.51. (a) From the leaf nodes x1 and x4 towards the root node x3. (b) From the root node towards the leaf nodes.
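The twelve messages (8.74) to (8.85) can be traced with concrete numbers. The sketch below assumes binary variables and made-up factor tables `fa`, `fb`, `fc` (none of these values come from the slides); the variable names mirror the message subscripts.

```python
# Hypothetical binary tables for the graph of Figure 8.51.
fa = [[1.0, 2.0], [3.0, 4.0]]  # fa[x1][x2]
fb = [[1.0, 1.0], [2.0, 3.0]]  # fb[x2][x3]
fc = [[2.0, 1.0], [1.0, 2.0]]  # fc[x2][x4]
S = range(2)

# Leaves towards the root x3: (8.74)-(8.79).
mu_x1_fa = [1.0, 1.0]
mu_fa_x2 = [sum(fa[x1][x2] * mu_x1_fa[x1] for x1 in S) for x2 in S]
mu_x4_fc = [1.0, 1.0]
mu_fc_x2 = [sum(fc[x2][x4] * mu_x4_fc[x4] for x4 in S) for x2 in S]
mu_x2_fb = [mu_fa_x2[x2] * mu_fc_x2[x2] for x2 in S]
mu_fb_x3 = [sum(fb[x2][x3] * mu_x2_fb[x2] for x2 in S) for x3 in S]

# Root back out to the leaves: (8.80)-(8.85).
mu_x3_fb = [1.0, 1.0]
mu_fb_x2 = [sum(fb[x2][x3] * mu_x3_fb[x3] for x3 in S) for x2 in S]
mu_x2_fa = [mu_fb_x2[x2] * mu_fc_x2[x2] for x2 in S]
mu_fa_x1 = [sum(fa[x1][x2] * mu_x2_fa[x2] for x2 in S) for x1 in S]
mu_x2_fc = [mu_fa_x2[x2] * mu_fb_x2[x2] for x2 in S]
mu_fc_x4 = [sum(fc[x2][x4] * mu_x2_fc[x2] for x2 in S) for x4 in S]

# Unnormalized single-variable marginals: product of incoming messages.
print("x1:", mu_fa_x1)  # [36.0, 78.0]
print("x3:", mu_fb_x3)  # [48.0, 66.0]
print("x4:", mu_fc_x4)  # [46.0, 68.0]
```

Each of the three printed marginals sums to the same value, 114, which is the normalization constant Z for these tables.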


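•   Marginal p(x2) can be calculated

One message has now passed in each direction across each link, and we can now evaluate the marginals. As a simple check, let us verify that the marginal $p(x_2)$ is given by the correct expression. Using (8.63) and substituting for the messages using the above results, we have

\begin{align}
\tilde{p}(x_2) &= \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2) \notag \\
&= \left[ \sum_{x_1} f_a(x_1, x_2) \right] \left[ \sum_{x_3} f_b(x_2, x_3) \right] \left[ \sum_{x_4} f_c(x_2, x_4) \right] \notag \\
&= \sum_{x_1} \sum_{x_3} \sum_{x_4} f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4) \notag \\
&= \sum_{x_1} \sum_{x_3} \sum_{x_4} \tilde{p}(\mathbf{x}) \tag{8.86}
\end{align}

as required.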
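The check in (8.86) can also be run numerically. The sketch below assumes hypothetical binary tables for fa, fb and fc and compares the product of the three messages arriving at x2 with a brute-force sum of the unnormalized joint (8.73) over x1, x3 and x4.

```python
from itertools import product

# Hypothetical binary tables for the graph of Figure 8.51.
fa = [[1.0, 2.0], [3.0, 4.0]]  # fa[x1][x2]
fb = [[1.0, 1.0], [2.0, 3.0]]  # fb[x2][x3]
fc = [[2.0, 1.0], [1.0, 2.0]]  # fc[x2][x4]
S = range(2)

# The three messages arriving at x2: (8.75), (8.81), (8.77).
mu_fa_x2 = [sum(fa[x1][x2] for x1 in S) for x2 in S]
mu_fb_x2 = [sum(fb[x2][x3] for x3 in S) for x2 in S]
mu_fc_x2 = [sum(fc[x2][x4] for x4 in S) for x2 in S]

# First line of (8.86): product of incoming messages.
by_messages = [mu_fa_x2[x2] * mu_fb_x2[x2] * mu_fc_x2[x2] for x2 in S]

# Last line of (8.86): sum the unnormalized joint over x1, x3, x4.
by_brute_force = [
    sum(fa[x1][x2] * fb[x2][x3] * fc[x2][x4]
        for x1, x3, x4 in product(S, repeat=3))
    for x2 in S
]

print(by_messages, by_brute_force)  # [24.0, 90.0] [24.0, 90.0]
```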
•   references: @n_shuyo, @sleepy_yoshi, @nokuno

So far, we have assumed that all of the variables in the graph are hidden. In most practical applications, a subset of the variables will be observed, and we wish to calculate posterior distributions conditioned on these observations. Observed nodes are easily handled within the sum-product algorithm as follows. Suppose we partition $\mathbf{x}$ into hidden variables $\mathbf{h}$ and observed variables $\mathbf{v}$, and that the observed value of $\mathbf{v}$ is denoted $\hat{\mathbf{v}}$. Then we simply multiply the joint distribution $p(\mathbf{x})$ by $\prod_i I(v_i, \hat{v}_i)$, where $I(v, \hat{v}) = 1$ if $v = \hat{v}$ and $I(v, \hat{v}) = 0$ otherwise. This product corresponds to $p(\mathbf{h}, \mathbf{v} = \hat{\mathbf{v}})$ and hence is an unnormalized version of $p(\mathbf{h} \mid \mathbf{v} = \hat{\mathbf{v}})$. By running the sum-product algorithm, we can efficiently calculate the posterior marginals $p(h_i \mid \mathbf{v} = \hat{\mathbf{v}})$ up to a normalization coefficient whose value can be found efficiently.
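Clamping an observed variable with the indicator I(v, v̂) can be sketched on the smallest possible case: one hidden variable h and one observed variable v. The joint table `p_joint` and the observed value `v_hat` below are hypothetical values chosen for illustration.

```python
def indicator(v, v_hat):
    # I(v, v_hat) = 1 if v equals the observed value, 0 otherwise.
    return 1.0 if v == v_hat else 0.0

# Hypothetical joint distribution p(h, v) over two binary variables.
p_joint = [[0.1, 0.3],
           [0.2, 0.4]]  # p_joint[h][v]
v_hat = 1               # assumed observed value of v

# Multiply the joint by the indicator and sum out v: this gives p(h, v = v_hat).
unnormalized = [
    sum(p_joint[h][v] * indicator(v, v_hat) for v in range(2))
    for h in range(2)
]

# Normalize over the single variable h to obtain p(h | v = v_hat).
Z = sum(unnormalized)
posterior = [u / Z for u in unnormalized]
print(unnormalized, posterior)
```

With these numbers the clamped values are 0.3 and 0.4, so the posterior over h is 3/7 and 4/7. In a larger graph the same indicator would be multiplied into the model as an extra single-variable factor on each observed node before running the message passing.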

Mais conteúdo relacionado

Mais procurados

Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlab
krishna_093
 
Intro probability 2
Intro probability 2Intro probability 2
Intro probability 2
Phong Vo
 
Intro probability 3
Intro probability 3Intro probability 3
Intro probability 3
Phong Vo
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized trees
Gilles Louppe
 

Mais procurados (20)

Nl eqn lab
Nl eqn labNl eqn lab
Nl eqn lab
 
Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlab
 
Applied numerical methods lec9
Applied numerical methods lec9Applied numerical methods lec9
Applied numerical methods lec9
 
Intro probability 2
Intro probability 2Intro probability 2
Intro probability 2
 
Intro probability 3
Intro probability 3Intro probability 3
Intro probability 3
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to Practice
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Matlab ploting
Matlab plotingMatlab ploting
Matlab ploting
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
 
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
"Automatic Variational Inference in Stan" NIPS2015_yomi2016-01-20
 
Session 6
Session 6Session 6
Session 6
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized trees
 
Lesson 1: Functions and their representations (slides)
Lesson 1: Functions and their representations (slides)Lesson 1: Functions and their representations (slides)
Lesson 1: Functions and their representations (slides)
 
Newton divided difference interpolation
Newton divided difference interpolationNewton divided difference interpolation
Newton divided difference interpolation
 
Pc12 sol c03_cp
Pc12 sol c03_cpPc12 sol c03_cp
Pc12 sol c03_cp
 
Langrange Interpolation Polynomials
Langrange Interpolation PolynomialsLangrange Interpolation Polynomials
Langrange Interpolation Polynomials
 
從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論
 
Bayesian Core: Chapter 6
Bayesian Core: Chapter 6Bayesian Core: Chapter 6
Bayesian Core: Chapter 6
 
ABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space models
 
Fourier series
Fourier seriesFourier series
Fourier series
 

Semelhante a Prml8.4.4

this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...
BhojRajAdhikari5
 
Module 2 lesson 4 notes
Module 2 lesson 4 notesModule 2 lesson 4 notes
Module 2 lesson 4 notes
toni dimella
 
Matlab cheatsheet
Matlab cheatsheetMatlab cheatsheet
Matlab cheatsheet
lokeshkumer
 

Prml8.4.4

  • 1. The Sum-Product Algorithm PRML 8.4.4 Xuebin Ma
  • 2. Factor graph • undirected tree, directed tree, polytree (F8.43) • Goal: • Obtain an efficient, exact inference algorithm for finding marginals • Compute efficiently where several marginals are required
  • 3. Calculate marginals for a particular variable
      • F 8.61: sum the joint distribution over all variables except x
      • F 8.62: joint distribution in the form of a product of factors
    ... Later we shall see how to modify the algorithm to incorporate evidence corresponding to observed variables. By definition, the marginal is obtained by summing the joint distribution over all variables except x, so that
      $p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$ (8.61)
    where $\mathbf{x} \setminus x$ denotes the set of variables in $\mathbf{x}$ with variable x omitted. The idea is to substitute for $p(\mathbf{x})$ using the factor graph expression (8.59) and then interchange summations and products in order to obtain an efficient algorithm. Consider the fragment of graph shown in Figure 8.46, in which we see that the tree structure of the graph allows us to partition the factors in the joint distribution into groups, with one group associated with each of the factor nodes that is a neighbour of the variable node x. We see that the joint distribution can be written as a product of the form
      $p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s)$ (8.62)
    where $\mathrm{ne}(x)$ denotes the set of factor nodes that are neighbours of x, $X_s$ denotes the set of all variables in the subtree connected to the variable node x via the factor node $f_s$, and $F_s(x, X_s)$ represents the product of all the factors in the group associated with factor $f_s$.
    Figure 8.46: A fragment of a factor graph illustrating the evaluation of the marginal p(x).
  • 4. Sum and product
      • F 8.61, F 8.62 -> F 8.63
      • F 8.64: message from factor node $f_s$ to variable node x
      • F 8.63: product of all incoming messages arriving at node x
    Substituting (8.62) into (8.61) and interchanging the sums and products, we obtain
      $p(x) = \prod_{s \in \mathrm{ne}(x)} \Bigl[ \sum_{X_s} F_s(x, X_s) \Bigr] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$ (8.63)
    Here we have introduced a set of functions $\mu_{f_s \to x}(x)$, defined by
      $\mu_{f_s \to x}(x) \equiv \sum_{X_s} F_s(x, X_s)$ (8.64)
    which can be viewed as messages from the factor nodes $f_s$ to the variable node x. We see that the required marginal p(x) is given by the product of all the incoming messages arriving at node x.
    In order to evaluate these messages, we again turn to Figure 8.46 and note that each factor $F_s(x, X_s)$ is described by a factor (sub-)graph and so can itself be factorized.
  • 5. Evaluate these messages $\mu_{f_s \to x}(x)$
      • F 8.67: messages from variable nodes to factor nodes
    Each factor $F_s(x, X_s)$ can be written as
      $F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$ (8.65)
    where, for convenience, we have denoted the variables associated with factor $f_s$, in addition to x, by $x_1, \ldots, x_M$. This factorization is illustrated in Figure 8.47. Note that the set of variables $\{x, x_1, \ldots, x_M\}$ is the set of variables on which the factor $f_s$ depends, and so it can also be denoted $\mathbf{x}_s$, using the notation of (8.59). Substituting (8.65) into (8.64) we obtain
      $\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \Bigl[ \sum_{X_{sm}} G_m(x_m, X_{sm}) \Bigr]$
      $\phantom{\mu_{f_s \to x}(x)} = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$ (8.66)
    where $\mathrm{ne}(f_s)$ denotes the set of variable nodes that are neighbours of the factor node $f_s$, and $\mathrm{ne}(f_s) \setminus x$ denotes the same set but with node x removed. Here we have defined the following messages from variable nodes to factor nodes
      $\mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm})$ (8.67)
    We have therefore introduced two distinct kinds of message: those that go from factor nodes to variable nodes, denoted $\mu_{f \to x}(x)$, and those that go from variable nodes to factor nodes, denoted $\mu_{x \to f}(x)$. In each case, we see that messages passed along a link are always a function of the variable associated with the variable node that link connects to.
    Figure 8.47: Illustration of the factorization of the subgraph associated with factor node $f_s$.
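The factor-to-variable message of (8.66) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: discrete variables, a factor given as a callable, and messages stored as lists. The function name `factor_to_variable` and its arguments are hypothetical and do not come from the slides.

```python
from itertools import product

def factor_to_variable(f, scope, target, incoming, card):
    """Message mu_{f_s -> x}(x) following Eq. (8.66).

    f        : callable taking one state index per variable in `scope`
    scope    : list of variable names the factor depends on
    target   : name of the variable x that receives the message
    incoming : dict name -> list, the messages mu_{x_m -> f_s}(x_m)
    card     : dict name -> number of discrete states
    """
    others = [v for v in scope if v != target]
    message = []
    for x in range(card[target]):
        total = 0.0
        # Sum over every joint state of the other variables x_1..x_M.
        for states in product(*(range(card[v]) for v in others)):
            assign = dict(zip(others, states))
            assign[target] = x
            term = f(*(assign[v] for v in scope))
            # Multiply by the incoming message from each other variable.
            for v in others:
                term *= incoming[v][assign[v]]
            total += term
        message.append(total)
    return message

# Toy usage: a 2x2 table f(x1, x2) and a unit message arriving from x1.
table = [[1.0, 2.0], [3.0, 4.0]]
msg = factor_to_variable(lambda a, b: table[a][b], ["x1", "x2"], "x2",
                         {"x1": [1.0, 1.0]}, {"x1": 2, "x2": 2})
```

When `scope` contains only the target variable, the inner product is empty and the message reduces to f(x), which matches the leaf-factor case.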
  • 6. Evaluate messages from variable to factor using sub-graph factorization
      • F 8.68
      • F 8.67 + F 8.68 -> F 8.69
    The result (8.66) says that to evaluate the message sent by a factor node to a variable node along the link connecting them, take the product of the incoming messages along all other links coming into the factor node, multiply by the factor associated with that node, and then marginalize over all of the variables associated with the incoming messages. This is illustrated in Figure 8.47. It is important to note that a factor node can send a message to a variable node once it has received incoming messages from all other neighbouring variable nodes.
    Finally, we derive an expression for evaluating the messages from variable nodes to factor nodes, again by making use of the (sub-)graph factorization. From Figure 8.48, we see that the term $G_m(x_m, X_{sm})$ associated with node $x_m$ is given by a product of terms $F_l(x_m, X_{ml})$, each associated with one of the factor nodes $f_l$ that is linked to node $x_m$ (excluding node $f_s$), so that
      $G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml})$ (8.68)
    where the product is taken over all neighbours of node $x_m$ except for node $f_s$. Note that each of the factors $F_l(x_m, X_{ml})$ represents a subtree of the original graph of precisely the same kind as introduced in (8.62). Substituting (8.68) into (8.67), we then obtain
      $\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \Bigl[ \sum_{X_{ml}} F_l(x_m, X_{ml}) \Bigr] = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$ (8.69)
    where we have used the definition (8.64) of the messages passed from factor nodes to variable nodes.
    Figure 8.48: Illustration of the evaluation of the message sent by a variable node to an adjacent factor node.
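The variable-to-factor message of (8.69) is just an elementwise product of the messages arriving from all the other neighbouring factors. A minimal sketch, assuming messages are stored as lists keyed by hypothetical factor names (`variable_to_factor` is an illustrative name, not something defined in the slides):

```python
def variable_to_factor(incoming, exclude, n_states):
    """Message mu_{x_m -> f_s}(x_m) following Eq. (8.69).

    incoming : dict factor name -> list, messages mu_{f_l -> x_m}(x_m)
    exclude  : name of the destination factor f_s, left out of the product
    n_states : number of discrete states of x_m
    """
    # An empty product gives the unit message, as for a leaf variable node.
    message = [1.0] * n_states
    for name, mu in incoming.items():
        if name == exclude:
            continue
        message = [a * b for a, b in zip(message, mu)]
    return message

# Toy usage: x2 has neighbours fa, fb, fc and sends a message to fb.
msg_x2_fb = variable_to_factor(
    {"fa": [4.0, 6.0], "fc": [2.0, 0.5], "fb": [9.0, 9.0]}, "fb", 2)
```

No summation appears here: a variable node only multiplies, while the marginalization happens at the factor nodes.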
  • 7. Message sent by a leaf (variable node or factor node)
      $\mu_{x \to f}(x) = 1$ and $\mu_{f \to x}(x) = f(x)$ (8.71)
    Figure 8.49: The sum-product algorithm begins with messages sent by the leaf nodes, which depend on whether the leaf node is (a) a variable node, or (b) a factor node.
      • Find marginals for every variable node (introduced by John-san)
      • Sum-product algorithm
    Figure 8.50: The sum-product algorithm can be viewed purely in terms of messages sent out by factor nodes to other factor nodes. In this example, the outgoing message shown by the blue arrow is obtained by taking the product of all the incoming messages shown by green arrows, multiplying by the factor $f_s$, and marginalizing over the variables $x_1$ and $x_2$.
    ... and indeed the notion of one node having a special status was introduced only as a ...
  • 8. Normalization (undirected graph)
      • To get the normalization coefficient 1/Z, where $p(\mathbf{x}) = \tilde{p}(\mathbf{x}) / Z$:
      • use sum-product to find the unnormalized marginals for $x_i$
      • the coefficient 1/Z can be obtained by normalizing the marginal
      • efficient, as it is calculated over only a single variable
    Figure 8.51: A simple factor graph used to illustrate the sum-product algorithm (variable nodes $x_1, x_2, x_3, x_4$; factor nodes $f_a, f_b, f_c$).
    Unnormalized joint distribution:
      $\tilde{p}(\mathbf{x}) = f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4)$ (8.73)
    In order to apply the sum-product algorithm to this graph, let us designate node $x_3$ as the root, in which case there are two leaf nodes $x_1$ and $x_4$. Starting with the leaf ...
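The normalization step can be sketched as follows: once sum-product has produced an unnormalized marginal for any single variable, summing it over that one variable gives Z. The helper name `normalize` and the input values are hypothetical, chosen only for illustration.

```python
def normalize(unnormalized):
    """Return (p, Z), where Z is the sum of one unnormalized marginal."""
    z = sum(unnormalized)
    if z <= 0:
        raise ValueError("unnormalized marginal must have positive total mass")
    return [value / z for value in unnormalized], z

# Toy usage with an assumed unnormalized marginal over a binary variable.
p, Z = normalize([72.0, 96.0])
```

The cost is linear in the number of states of that one variable, rather than in the size of the joint state space.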
  • 9. Example on the factor graph of Figure 8.51, with
      $\tilde{p}(\mathbf{x}) = f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4)$ (8.73)
    In order to apply the sum-product algorithm to this graph, let us designate node $x_3$ as the root, in which case there are two leaf nodes $x_1$ and $x_4$. Starting with the leaf nodes, we then have the following sequence of six messages
      $\mu_{x_1 \to f_a}(x_1) = 1$ (8.74)
      $\mu_{f_a \to x_2}(x_2) = \sum_{x_1} f_a(x_1, x_2)$ (8.75)
      $\mu_{x_4 \to f_c}(x_4) = 1$ (8.76)
      $\mu_{f_c \to x_2}(x_2) = \sum_{x_4} f_c(x_2, x_4)$ (8.77)
      $\mu_{x_2 \to f_b}(x_2) = \mu_{f_a \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2)$ (8.78)
      $\mu_{f_b \to x_3}(x_3) = \sum_{x_2} f_b(x_2, x_3)\, \mu_{x_2 \to f_b}(x_2)$ (8.79)
    The direction of flow of these messages is illustrated in Figure 8.52. Once this message propagation is complete, we can then propagate messages from the root node out to the leaf nodes, and these are given by
      $\mu_{x_3 \to f_b}(x_3) = 1$ (8.80)
      $\mu_{f_b \to x_2}(x_2) = \sum_{x_3} f_b(x_2, x_3)$ (8.81)
      $\mu_{x_2 \to f_a}(x_2) = \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2)$ (8.82)
      $\mu_{f_a \to x_1}(x_1) = \sum_{x_2} f_a(x_1, x_2)\, \mu_{x_2 \to f_a}(x_2)$ (8.83)
      $\mu_{x_2 \to f_c}(x_2) = \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2)$ (8.84)
      $\mu_{f_c \to x_4}(x_4) = \sum_{x_2} f_c(x_2, x_4)\, \mu_{x_2 \to f_c}(x_2)$ (8.85)
    Figure 8.52: Flow of messages for the sum-product algorithm applied to the example graph in Figure 8.51. (a) From the leaf nodes $x_1$ and $x_4$ towards the root node $x_3$. (b) From the root node towards the leaf nodes.
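The message schedule on the graph of Figure 8.51 can be traced numerically. The sketch below assumes binary variables and made-up factor tables (the slides give no numbers), computes the messages needed for the marginal of $x_2$, and compares the result with a brute-force sum over the unnormalized joint.

```python
from itertools import product

K = 2  # every variable is binary in this illustration

# Hypothetical factor tables (not from the slides), indexed as noted.
fa = [[1.0, 2.0], [3.0, 4.0]]  # fa[x1][x2]
fb = [[2.0, 1.0], [1.0, 3.0]]  # fb[x2][x3]
fc = [[1.0, 5.0], [2.0, 2.0]]  # fc[x2][x4]

ones = [1.0] * K  # message sent by a leaf variable node

# Leaves -> root x3, Eqs. (8.74)-(8.79).
mu_fa_x2 = [sum(fa[x1][x2] * ones[x1] for x1 in range(K)) for x2 in range(K)]
mu_fc_x2 = [sum(fc[x2][x4] * ones[x4] for x4 in range(K)) for x2 in range(K)]
mu_x2_fb = [mu_fa_x2[x2] * mu_fc_x2[x2] for x2 in range(K)]
mu_fb_x3 = [sum(fb[x2][x3] * mu_x2_fb[x2] for x2 in range(K)) for x3 in range(K)]

# Root -> leaves; only Eq. (8.81) is needed for the marginal of x2.
mu_fb_x2 = [sum(fb[x2][x3] * ones[x3] for x3 in range(K)) for x2 in range(K)]

# Unnormalized marginal of x2: product of its incoming messages, Eq. (8.63).
p_x2 = [mu_fa_x2[x2] * mu_fb_x2[x2] * mu_fc_x2[x2] for x2 in range(K)]

# Brute-force check: sum the unnormalized joint (8.73) over x1, x3, x4.
brute = [0.0] * K
for x1, x2, x3, x4 in product(range(K), repeat=4):
    brute[x2] += fa[x1][x2] * fb[x2][x3] * fc[x2][x4]

Z = sum(mu_fb_x3)  # summing the root's unnormalized marginal gives Z
print(p_x2, brute, Z)
```

With these tables the message-passing result and the brute-force sum agree, and the same Z is obtained whether it is computed at the root $x_3$ or at $x_2$.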
  • 10. Marginal $p(x_2)$ can be calculated
    One message has now passed in each direction across each link, and we can now evaluate the marginals. As a simple check, let us verify that the marginal $p(x_2)$ is given by the correct expression. Using (8.63) and substituting for the messages using the above results, we have
      $\tilde{p}(x_2) = \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2)$
      $\phantom{\tilde{p}(x_2)} = \Bigl[ \sum_{x_1} f_a(x_1, x_2) \Bigr] \Bigl[ \sum_{x_3} f_b(x_2, x_3) \Bigr] \Bigl[ \sum_{x_4} f_c(x_2, x_4) \Bigr]$
      $\phantom{\tilde{p}(x_2)} = \sum_{x_1} \sum_{x_3} \sum_{x_4} f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4)$
      $\phantom{\tilde{p}(x_2)} = \sum_{x_1} \sum_{x_3} \sum_{x_4} \tilde{p}(\mathbf{x})$ (8.86)
    as required.
    So far, we have assumed that all of the variables in the graph are hidden. In most practical applications, a subset of the variables will be observed, and we wish to calculate posterior distributions conditioned on these observations. Observed nodes are easily handled within the sum-product algorithm as follows. Suppose we partition $\mathbf{x}$ into hidden variables $\mathbf{h}$ and observed variables $\mathbf{v}$, and that the observed value of $\mathbf{v}$ is denoted $\hat{\mathbf{v}}$. Then we simply multiply the joint distribution $p(\mathbf{x})$ by $\prod_i I(v_i, \hat{v}_i)$, where $I(v, \hat{v}) = 1$ if $v = \hat{v}$ and $I(v, \hat{v}) = 0$ otherwise. This product corresponds to $p(\mathbf{h}, \mathbf{v} = \hat{\mathbf{v}})$ and hence is an unnormalized version of $p(\mathbf{h} \mid \mathbf{v} = \hat{\mathbf{v}})$. By running the sum-product algorithm, we can efficiently calculate the posterior marginals $p(h_i \mid \mathbf{v} = \hat{\mathbf{v}})$ up to a normalization coefficient whose value can be found efficiently ...
    References: @n_shuyo @sleepy_yoshi @nokuno