Part 2: Introduction to Graphical Models

Sebastian Nowozin and Christoph H. Lampert

Colorado Springs, 25th June 2011

Introduction

Model: relating observations x to quantities of interest y.
Example 1: given an RGB image x, infer the depth y of each pixel.
Example 2: given an RGB image x, infer the presence and positions y of all objects shown.

[Figure: a mapping f : X → Y from the input space to the output space; X: image, Y: object annotations]

Introduction

General case: mapping x ∈ X to y ∈ Y.
Graphical models are a concise language to define this mapping.
The mapping can be ambiguous: measurement noise, lack of well-posedness (e.g. occlusions).
Probabilistic graphical models: define the form of p(y|x) or p(x, y) for all y ∈ Y.

[Figure: instead of a single value f(x), the model defines a distribution p(Y | X = x) over Y]

Graphical Models

A graphical model defines
- a family of probability distributions over a set of random variables,
- by means of a graph,
- so that the random variables satisfy conditional independence assumptions encoded in the graph.

Popular classes of graphical models:
- undirected graphical models (Markov random fields),
- directed graphical models (Bayesian networks),
- factor graphs,
- others: chain graphs, influence diagrams, etc.

Bayesian Networks

Graph: G = (V, E), E ⊂ V × V, directed and acyclic.
Variable domains Y_i.
Factorization over distributions, conditioning on parent nodes:

    p(Y = y) = \prod_{i \in V} p(y_i | y_{pa_G(i)})

[Figure: a simple Bayes net with edges Y_i → Y_k, Y_j → Y_k, and Y_k → Y_l]

Example:

    p(Y = y) = p(Y_l = y_l | Y_k = y_k) \, p(Y_k = y_k | Y_i = y_i, Y_j = y_j) \, p(Y_i = y_i) \, p(Y_j = y_j)

The graph defines a family of distributions: one member for each choice of the local conditional distributions.

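As a concrete illustration, here is a minimal Python sketch (not from the slides) that evaluates this joint for the example network by multiplying the local conditionals; all CPT values are hypothetical and the variables are assumed binary.

```python
import itertools

# Hypothetical CPTs for the Bayes net Y_i -> Y_k <- Y_j, Y_k -> Y_l.
p_i = {0: 0.6, 1: 0.4}
p_j = {0: 0.7, 1: 0.3}
p_k_given_ij = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}
p_l_given_k = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

def joint(yi, yj, yk, yl):
    """p(Y = y) as the product of local conditionals, one per node."""
    return (p_i[yi] * p_j[yj]
            * p_k_given_ij[(yi, yj)][yk]
            * p_l_given_k[yk][yl])

# The local conditionals already normalize the joint: it sums to 1.
total = sum(joint(*y) for y in itertools.product([0, 1], repeat=4))
print(total)  # 1.0 (up to floating point)
```

Note that no explicit normalization constant is needed here; this is a key difference from the undirected models that follow.
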
Undirected Graphical Models

= Markov random field (MRF) = Markov network.
Graph: G = (V, E), E ⊂ V × V, undirected, no self-edges.
Variable domains Y_i.
Factorization over potentials ψ at cliques:

    p(y) = \frac{1}{Z} \prod_{C \in C(G)} \psi_C(y_C)

with normalization constant Z = \sum_{y \in Y} \prod_{C \in C(G)} \psi_C(y_C).

[Figure: a simple MRF, the chain Y_i − Y_j − Y_k]

Example (for the chain):

    p(y) = \frac{1}{Z} \psi_i(y_i) \psi_j(y_j) \psi_k(y_k) \psi_{i,j}(y_i, y_j) \psi_{j,k}(y_j, y_k)

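A minimal sketch of this factorization for the chain, with hypothetical potential tables; the partition function Z is computed by brute-force enumeration, which is feasible only for tiny models.

```python
import itertools

# Hypothetical potentials for the chain MRF Y_i - Y_j - Y_k, binary variables.
psi_unary = {v: {0: 1.0, 1: 2.0} for v in "ijk"}                 # psi_i, psi_j, psi_k
psi_pair = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}  # psi_ij = psi_jk

def unnormalized(yi, yj, yk):
    """Product of potentials over all cliques of the chain."""
    return (psi_unary["i"][yi] * psi_unary["j"][yj] * psi_unary["k"][yk]
            * psi_pair[(yi, yj)] * psi_pair[(yj, yk)])

# Partition function Z: sum of the unnormalized measure over all states.
Z = sum(unnormalized(*y) for y in itertools.product([0, 1], repeat=3))
p = lambda yi, yj, yk: unnormalized(yi, yj, yk) / Z
print(Z, p(0, 0, 0))
```
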
Example 1

[Figure: the chain Y_i − Y_j − Y_k]

Cliques C(G): the vertex sets V' ⊆ V in which every pair of distinct vertices is joined by an edge of E.
Here C(G) = {{i}, {i,j}, {j}, {j,k}, {k}}, so

    p(y) = \frac{1}{Z} \psi_i(y_i) \psi_j(y_j) \psi_k(y_k) \psi_{i,j}(y_i, y_j) \psi_{j,k}(y_j, y_k)

Example 2

[Figure: the fully connected graph on Y_i, Y_j, Y_k, Y_l]

Here C(G) = 2^V: all subsets of V are cliques, so

    p(y) = \frac{1}{Z} \prod_{A \in 2^{\{i,j,k,l\}}} \psi_A(y_A)

Factor Graphs

Graph: G = (V, F, E), E ⊆ V × F, with
- variable nodes V,
- factor nodes F,
- edges E between variable and factor nodes,
- the scope of a factor, N(F) = {i ∈ V : (i, F) ∈ E}.

Variable domains Y_i.
Factorization over potentials ψ at factors:

    p(y) = \frac{1}{Z} \prod_{F \in F} \psi_F(y_{N(F)})

with constant Z = \sum_{y \in Y} \prod_{F \in F} \psi_F(y_{N(F)}).

[Figure: a factor graph on Y_i, Y_j, Y_k, Y_l, with square factor nodes connecting the variables]

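One natural way to store a factor graph in code is as a list of (scope, potential table) pairs. The sketch below, with a hypothetical graph and hypothetical table values, evaluates p(y) as the product over factors, normalized by a brute-force Z.

```python
import itertools

# A factor is (scope, table): scope is a tuple of variable indices N(F),
# table maps the restriction y_{N(F)} to a nonnegative potential value.
num_vars, domain = 4, (0, 1)
factors = [
    ((0, 1), {(a, b): 2.0 if a == b else 1.0 for a in domain for b in domain}),
    ((1, 3), {(a, b): 2.0 if a == b else 1.0 for a in domain for b in domain}),
    ((0, 2), {(a, b): 2.0 if a == b else 1.0 for a in domain for b in domain}),
    ((2,),   {(0,): 1.5, (1,): 0.5}),
]

def unnormalized(y):
    prod = 1.0
    for scope, table in factors:
        prod *= table[tuple(y[i] for i in scope)]  # psi_F(y_{N(F)})
    return prod

Z = sum(unnormalized(y) for y in itertools.product(domain, repeat=num_vars))
p = lambda y: unnormalized(y) / Z
print(p((0, 1, 0, 1)))
```
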
Why factor graphs?

[Figure: three models on the variables Y_i, Y_j, Y_k, Y_l with the same connectivity but different factor structures]

Factor graphs are explicit about the factorization.
Hence, they are easier to work with.
They are universal (just like MRFs and Bayesian networks).

Capacity

[Figure: two factor graphs on Y_i, Y_j, Y_k, Y_l with different factor structures]

A factor graph defines a family of distributions.
Some families are larger than others.

Four remaining pieces

1. Conditional distributions (CRFs)
2. Parameterization
3. Test-time inference
4. Learning the model from training data

Conditional Distributions

We have discussed p(y); how do we define p(y|x)?
Potentials become a function of x_{N(F)}.
The partition function depends on x.
This yields conditional random fields (CRFs).
x is not part of the probability model, i.e. it is not treated as a random variable.

    p(y) = \frac{1}{Z} \prod_{F \in F} \psi_F(y_{N(F)})

    p(y|x) = \frac{1}{Z(x)} \prod_{F \in F} \psi_F(y_{N(F)}; x_{N(F)})

[Figure: a conditional distribution, with observed nodes X_i, X_j attached to the output variables Y_i, Y_j]

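The practical consequence is that normalization can no longer be precomputed: Z(x) must be recomputed for every observation x. A minimal sketch with hypothetical potential forms:

```python
import itertools, math

# Hypothetical CRF with two binary outputs y = (y_i, y_j) and real-valued
# observations x = (x_i, x_j): the unary potentials depend on the local x.
def psi_unary(y, x):
    return math.exp(y * x)           # favors y = 1 when x is large

def psi_pair(yi, yj):
    return 2.0 if yi == yj else 1.0  # smoothness between outputs

def unnormalized(y, x):
    yi, yj = y
    return psi_unary(yi, x[0]) * psi_unary(yj, x[1]) * psi_pair(yi, yj)

def p_cond(y, x):
    """p(y|x): the partition function Z(x) is recomputed per observation x."""
    Zx = sum(unnormalized(yp, x) for yp in itertools.product((0, 1), repeat=2))
    return unnormalized(y, x) / Zx

print(p_cond((1, 1), (0.5, -0.2)), p_cond((1, 1), (2.0, 1.5)))
```
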
Potentials and Energy Functions

For each factor F ∈ F, let Y_F = ×_{i ∈ N(F)} Y_i and define an energy function E_F : Y_{N(F)} → R.
Potentials and energies are related by (assuming ψ_F(y_F) > 0)

    \psi_F(y_F) = \exp(-E_F(y_F)),  and  E_F(y_F) = -\log(\psi_F(y_F)).

Then p(y) can be written as

    p(Y = y) = \frac{1}{Z} \prod_{F \in F} \psi_F(y_F)
             = \frac{1}{Z} \exp\Big(-\sum_{F \in F} E_F(y_F)\Big).

Hence, p(y) is completely determined by the energy function E(y) = \sum_{F \in F} E_F(y_F).

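A short sketch of the energy view: hypothetical factor energies are summed into E(y), and p(y) is recovered as exp(-E(y))/Z.

```python
import itertools, math

# Hypothetical factor energies for a two-variable model; psi_F = exp(-E_F).
E_unary = {0: 0.0, 1: 1.0}
E_pair = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 0.0}

def energy(y):
    """E(y) = sum of the factor energies."""
    yi, yj = y
    return E_unary[yi] + E_unary[yj] + E_pair[(yi, yj)]

states = list(itertools.product((0, 1), repeat=2))
Z = sum(math.exp(-energy(y)) for y in states)
p = {y: math.exp(-energy(y)) / Z for y in states}  # p(y) = exp(-E(y)) / Z
print(p)
```
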
Energy Minimization

    argmax_{y \in Y} p(Y = y)
      = argmax_{y \in Y} \frac{1}{Z} \exp\Big(-\sum_{F \in F} E_F(y_F)\Big)
      = argmax_{y \in Y} \exp\Big(-\sum_{F \in F} E_F(y_F)\Big)
      = argmax_{y \in Y} -\sum_{F \in F} E_F(y_F)
      = argmin_{y \in Y} \sum_{F \in F} E_F(y_F)
      = argmin_{y \in Y} E(y).

Energy minimization can be interpreted as solving for the most likely state of some factor graph model.

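For small state spaces the equivalence can be checked directly: the sketch below, with a hypothetical toy energy, finds the MAP state by exhaustive energy minimization.

```python
import itertools

def energy(y):
    """Hypothetical pairwise energy, as in the previous sketch."""
    yi, yj = y
    unary = (0.0 if yi == 0 else 1.0) + (0.0 if yj == 0 else 1.0)
    pairwise = 0.0 if yi == yj else 2.0
    return unary + pairwise

# argmin_y E(y) over all states is exactly the MAP state argmax_y p(y).
y_map = min(itertools.product((0, 1), repeat=2), key=energy)
print(y_map, energy(y_map))
```
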
Parameterization

Factor graphs define a family of distributions.
Parameterization: identifying individual members of the family by parameters w.

[Figure: a parameter space indexing distributions p_{w_1}, p_{w_2}, ... within the family of distributions]

Example: Parameterization

Image segmentation model.
Pairwise "Potts" energy function E_F(y_i, y_j; w_1), with

    E_F : \{0,1\} \times \{0,1\} \times R \to R,
    E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0,
    E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1.

[Figure: image segmentation model]

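The Potts pairwise term is a one-liner in code; the parameter value below is hypothetical.

```python
def potts_energy(yi, yj, w1):
    """Pairwise Potts energy: 0 for agreeing labels, w1 for disagreeing."""
    return 0.0 if yi == yj else w1

# w1 > 0 makes label changes between neighbors costly (smoothing);
# w1 = 0 decouples the pixels.
print(potts_energy(0, 0, 1.5), potts_energy(0, 1, 1.5))  # 0.0 1.5
```
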
Example: Parameterization (cont)

Image segmentation model.
Unary energy function E_F(y_i; x, w), with

    E_F : \{0,1\} \times X \times R^{\{0,1\} \times D} \to R,
    E_F(0; x, w) = \langle w(0), \psi_F(x) \rangle,
    E_F(1; x, w) = \langle w(1), \psi_F(x) \rangle.

Features ψ_F : X → R^D, e.g. image filters.

[Figure: the grid model, with unary energies ⟨w(0), ψ_F(x)⟩ and ⟨w(1), ψ_F(x)⟩ at each site and the pairwise Potts table (0, w_1; w_1, 0) between neighbors]

Total number of parameters: D + D + 1.
Parameters are shared across factors, but the energies differ because of the different ψ_F(x).
General form, linear in w:

    E_F(y_F; x_F, w) = \langle w(y_F), \psi_F(x_F) \rangle

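A minimal sketch of this linear parameterization, with hypothetical weights and a stand-in feature map ψ_F:

```python
# Linear parameterization E_F(y_F; x_F, w) = <w(y_F), psi_F(x_F)>.
# Hypothetical: two labels, D = 3 features; w[y] is the weight vector for label y.
w = {0: [0.2, -0.1, 0.5], 1: [-0.3, 0.4, 0.1]}

def features(x_patch):
    """Stand-in for psi_F : X -> R^D, e.g. filter responses of an image patch."""
    return x_patch  # assume the patch is already a 3-vector of features

def unary_energy(y, x_patch, w):
    phi = features(x_patch)
    return sum(wd * fd for wd, fd in zip(w[y], phi))  # inner product

x_patch = [1.0, 0.5, -0.2]
print(unary_energy(0, x_patch, w), unary_energy(1, x_patch, w))
```

The same weight vectors w(0), w(1) are reused at every site, which is exactly why the parameter count stays at D + D + 1 regardless of image size.
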
Making Predictions

Making predictions: given x ∈ X, predict y ∈ Y.
How do we measure the quality of a prediction, or of a prediction function f : X → Y?

Loss function

Define a loss function

    \Delta : Y \times Y \to R_+,

so that Δ(y, y*) measures the loss incurred by predicting y when y* is true.
The loss function is application dependent.

Test-time Inference

Loss function Δ(y, f(x)), Δ : Y × Y → R: correct label y, prediction f(x).
True joint distribution d(X, Y) and true conditional d(y|x).
Model distribution p(y|x).
The expected loss measures the quality of a prediction:

    R^\Delta_f(x) = E_{y \sim d(y|x)}[\Delta(y, f(x))]
                  = \sum_{y \in Y} d(y|x) \, \Delta(y, f(x))
                  \approx E_{y \sim p(y|x; w)}[\Delta(y, f(x))],

assuming that p(y|x; w) ≈ d(y|x).

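For a small label space the expected loss can be evaluated exactly by summation. The sketch below uses a hypothetical model distribution and a Hamming-style loss; both are stand-ins.

```python
import itertools

def expected_loss(p_cond, loss, prediction, states):
    """Approximate R(x) = E_{y ~ p(y|x)} loss(y, f(x)) by exhaustive summation."""
    return sum(p_cond[y] * loss(y, prediction) for y in states)

states = list(itertools.product((0, 1), repeat=2))
p_cond = {y: 0.25 for y in states}  # stand-in model distribution p(y|x)
hamming = lambda y, yp: sum(a != b for a, b in zip(y, yp)) / len(y)
print(expected_loss(p_cond, hamming, (0, 0), states))
```
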
Example 1: 0/1 loss

Loss 0 iff perfectly predicted, 1 otherwise:

    \Delta_{0/1}(y, y^*) = I(y \neq y^*) = \begin{cases} 0 & \text{if } y = y^* \\ 1 & \text{otherwise} \end{cases}

Plugging it in,

    y^* := argmin_{y \in Y} E_{y' \sim p(y'|x)}[\Delta_{0/1}(y', y)]
         = argmax_{y \in Y} p(y|x)
         = argmin_{y \in Y} E(y, x).

Minimizing the expected 0/1 loss → MAP prediction (energy minimization).

Example 2: Hamming loss

Count the fraction of mislabeled variables:

    \Delta_H(y, y^*) = \frac{1}{|V|} \sum_{i \in V} I(y_i \neq y_i^*)

Plugging it in, the minimizer decomposes over the variables:

    y^* := argmin_{y \in Y} E_{y' \sim p(y'|x)}[\Delta_H(y', y)],
    i.e.  y_i^* = argmax_{y_i \in Y_i} p(y_i|x)  for each i ∈ V.

Minimizing the expected Hamming loss → maximum posterior marginal (MPM, Max-Marg) prediction.

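A sketch of MPM prediction for a toy model: compute each variable marginal from a hypothetical joint table, then take the per-variable argmax.

```python
import itertools

# Hypothetical joint p(y|x) over three binary variables, built from a table
# and normalized; in practice the marginals come from an inference algorithm.
states = list(itertools.product((0, 1), repeat=3))
table = {y: (2.0 if y[0] == y[1] else 1.0) for y in states}
Z = sum(table.values())
p_cond = {y: v / Z for y, v in table.items()}

def mpm(p_cond, num_vars, domain=(0, 1)):
    """Per-variable argmax of the marginals: minimizes expected Hamming loss."""
    prediction = []
    for i in range(num_vars):
        marg = {yi: sum(p for y, p in p_cond.items() if y[i] == yi)
                for yi in domain}
        prediction.append(max(marg, key=marg.get))  # argmax_{y_i} p(y_i|x)
    return tuple(prediction)

print(mpm(p_cond, 3))
```
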
Example 3: Squared error

Assume a vector space on Y_i (pixel intensities, optical flow vectors, etc.).
Sum of squared errors:

    \Delta_Q(y, y^*) = \frac{1}{|V|} \sum_{i \in V} \|y_i - y_i^*\|^2.

Plugging it in, the minimizer is the posterior mean at each variable:

    y^* := argmin_{y \in Y} E_{y' \sim p(y'|x)}[\Delta_Q(y', y)],
    i.e.  y_i^* = \sum_{y_i \in Y_i} p(y_i|x) \, y_i  for each i ∈ V.

Minimizing the expected squared error → minimum mean squared error (MMSE) prediction.

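A sketch of MMSE prediction from hypothetical per-variable marginals: each component of the prediction is the posterior mean, which need not be one of the label values.

```python
# Hypothetical marginal tables p(y_i|x) over a small set of intensity values.
marginals = [
    {0.0: 0.1, 0.5: 0.3, 1.0: 0.6},  # p(y_1|x)
    {0.0: 0.7, 0.5: 0.2, 1.0: 0.1},  # p(y_2|x)
]

# MMSE prediction: posterior mean sum_{y_i} p(y_i|x) * y_i per variable.
y_mmse = [sum(p * yi for yi, p in marg.items()) for marg in marginals]
print(y_mmse)  # e.g. [0.75, 0.2]
```
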
Inference Task: Maximum A Posteriori (MAP) Inference

Definition (Maximum A Posteriori (MAP) Inference)
Given a factor graph, a parameterization, and a weight vector w, and given the observation x, find

    y^* = argmax_{y \in Y} p(Y = y|x, w) = argmin_{y \in Y} E(y; x, w).

Inference Task: Probabilistic Inference

Definition (Probabilistic Inference)
Given a factor graph, a parameterization, and a weight vector w, and given the observation x, find

    \log Z(x, w) = \log \sum_{y \in Y} \exp(-E(y; x, w)),
    \mu_F(y_F) = p(Y_F = y_F | x, w),  \forall F \in F, \forall y_F \in Y_F.

This typically includes the variable marginals

    \mu_i(y_i) = p(y_i | x, w).

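For a tiny model both quantities can be computed by exhaustive enumeration; the sketch below uses a hypothetical pairwise energy over two binary variables.

```python
import itertools, math

def energy(y):
    """Hypothetical pairwise energy E(y; x, w) for fixed x, w."""
    yi, yj = y
    return 0.5 * yi + 0.5 * yj + (0.0 if yi == yj else 1.0)

states = list(itertools.product((0, 1), repeat=2))
logZ = math.log(sum(math.exp(-energy(y)) for y in states))
print("log Z =", logZ)

# Factor marginal mu_F(y_F): here the single pairwise factor covers all
# variables, so mu_F equals the full distribution p(y).
mu_F = {y: math.exp(-energy(y) - logZ) for y in states}
# Variable marginal mu_i(y_i): sum out the remaining variables.
mu_i = {yi: sum(p for y, p in mu_F.items() if y[0] == yi) for yi in (0, 1)}
print(mu_F, mu_i)
```
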
Example: Man-made structure detection

[Figure: left, input image x; middle, ground truth labeling on 16-by-16 pixel blocks; right, the factor graph model with observation factors ψ_i^1, ψ_i^2 at each block variable Y_i and pairwise factors ψ_{i,k}^3 between neighboring blocks]

Features: gradient and color histograms.
Model parameters are estimated from ≈ 60 training images.

Example: Man-made structure detection

[Figure: left, input image x; middle (probabilistic inference), visualization of the variable marginals p(y_i = "man-made" | x, w); right (MAP inference), the joint MAP labeling y^* = argmax_{y \in Y} p(y|x, w)]

Training the Model

What can be learned?
- Model structure: the factors.
- Model variables: observed variables are fixed, but we can add unobserved variables.
- Factor energies: the parameters.

Training: Overview

Assume a fully observed, independent and identically distributed (iid) sample set

    \{(x^n, y^n)\}_{n=1,\dots,N},  (x^n, y^n) \sim d(X, Y).

Goal: predict well.
Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss.

Probabilistic Learning

Problem (Probabilistic Parameter Learning)
Let d(y|x) be the (unknown) conditional distribution of labels for a problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the task of finding a point estimate of the parameter w* that makes p(y|x, w*) closest to d(y|x).

We will discuss probabilistic parameter learning in detail.

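One standard way to make "closest to d(y|x)" concrete is to minimize the negative conditional log-likelihood of the iid training sample. The sketch below evaluates this criterion for a deliberately trivial, hypothetical one-parameter model; the CRF case uses the same criterion with p(y|x, w) given by the factor graph.

```python
import math

def neg_log_likelihood(data, p_cond, w):
    """Negative conditional log-likelihood of iid pairs (x, y) under the model."""
    return -sum(math.log(p_cond(y, x, w)) for x, y in data)

# Hypothetical one-parameter logistic model p(y|x, w), y in {0, 1}.
def p_cond(y, x, w):
    p1 = 1.0 / (1.0 + math.exp(-w * x))
    return p1 if y == 1 else 1.0 - p1

data = [(2.0, 1), (-1.0, 0), (0.5, 1)]
print(neg_log_likelihood(data, p_cond, w=1.0))
```
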
Loss-Minimizing Parameter Learning

Problem (Loss-Minimizing Parameter Learning)
Let d(x, y) be the unknown distribution of data and labels, and let Δ : Y × Y → R be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk

    E_{(x,y) \sim d(x,y)}[\Delta(y, f_p(x))]

is as small as possible, where f_p(x) = argmax_{y \in Y} p(y|x, w^*).

Requires a loss function at training time.
Directly learns a prediction function f_p(x).


Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

  • 6-7. Graphical Models: A graphical model defines a family of probability distributions over a set of random variables, by means of a graph, so that the random variables satisfy the conditional independence assumptions encoded in the graph. Popular classes of graphical models: undirected graphical models (Markov random fields), directed graphical models (Bayesian networks), factor graphs, and others (chain graphs, influence diagrams, etc.).
  • 8-9. Graphical Models / Bayesian Networks: Graph G = (V, E), E ⊂ V × V, directed and acyclic; variable domains Y_i. The distribution factorizes into local conditional distributions, one per variable, conditioned on its parent nodes:

    p(Y = y) = ∏_{i∈V} p(y_i | y_{pa_G(i)})

Example (a simple Bayes net with edges Y_i → Y_k, Y_j → Y_k, Y_k → Y_l):

    p(Y = y) = p(Y_l = y_l | Y_k = y_k) p(Y_k = y_k | Y_i = y_i, Y_j = y_j) p(Y_i = y_i) p(Y_j = y_j)

This specifies a family of distributions.
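To make the factorization concrete, here is a minimal Python sketch of the example above; the conditional probability tables are invented for illustration and are not from the slides:

    from itertools import product

    # CPTs for the example net; all numbers are invented for illustration.
    p_i = {0: 0.6, 1: 0.4}                       # p(Y_i)
    p_j = {0: 0.7, 1: 0.3}                       # p(Y_j)
    p_k = {(0, 0): {0: 0.9, 1: 0.1},             # p(Y_k | Y_i, Y_j)
           (0, 1): {0: 0.5, 1: 0.5},
           (1, 0): {0: 0.4, 1: 0.6},
           (1, 1): {0: 0.2, 1: 0.8}}
    p_l = {0: {0: 0.8, 1: 0.2},                  # p(Y_l | Y_k)
           1: {0: 0.3, 1: 0.7}}

    def joint(yi, yj, yk, yl):
        """p(Y = y) = p(y_l|y_k) p(y_k|y_i,y_j) p(y_i) p(y_j)."""
        return p_l[yk][yl] * p_k[(yi, yj)][yk] * p_i[yi] * p_j[yj]

    # The factorization is normalized by construction: the joint sums to 1.
    total = sum(joint(*y) for y in product([0, 1], repeat=4))
    print(joint(0, 1, 1, 0), total)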
  • 10-11. Graphical Models / Undirected Graphical Models: Also known as Markov random fields (MRFs) or Markov networks. Graph G = (V, E), E ⊂ V × V, undirected and without self-edges; variable domains Y_i. The distribution factorizes into potentials ψ over the cliques of the graph:

    p(y) = (1/Z) ∏_{C∈C(G)} ψ_C(y_C),   with constant Z = Σ_{y∈Y} ∏_{C∈C(G)} ψ_C(y_C)

Example (a simple MRF, the chain Y_i - Y_j - Y_k):

    p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
  • 12. Graphical Models / Example 1: For the chain Y_i - Y_j - Y_k: the cliques C(G) are the vertex sets V' ⊆ V in which every pair of distinct vertices is joined by an edge. Here C(G) = {{i}, {i,j}, {j}, {j,k}, {k}}, so

    p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
  • 13. Graphical Models / Example 2: For a fully connected graph on Y_i, Y_j, Y_k, Y_l, every subset of V is a clique, C(G) = 2^V, so

    p(y) = (1/Z) ∏_{A∈2^{{i,j,k,l}}} ψ_A(y_A)
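As a sanity check on the clique factorization, a brute-force Python sketch for the chain of Example 1; the potential tables are toy values chosen only for illustration, and Z is computed by exhaustive enumeration (feasible only for tiny models):

    from itertools import product

    # Chain MRF Y_i - Y_j - Y_k over binary variables, as in Example 1.
    psi_i  = {0: 1.0, 1: 2.0}
    psi_j  = {0: 1.0, 1: 1.0}
    psi_k  = {0: 3.0, 1: 1.0}
    psi_ij = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
    psi_jk = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}

    def unnormalized(yi, yj, yk):
        return (psi_i[yi] * psi_j[yj] * psi_k[yk]
                * psi_ij[(yi, yj)] * psi_jk[(yj, yk)])

    # Partition function by enumerating all joint states.
    Z = sum(unnormalized(*y) for y in product([0, 1], repeat=3))

    def p(yi, yj, yk):
        return unnormalized(yi, yj, yk) / Z

    print(Z, p(0, 0, 0))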
  • 14-15. Factor Graphs: Graph G = (V, F, E), E ⊆ V × F: variable nodes V, factor nodes F, and edges E between variable and factor nodes. The scope of a factor is N(F) = {i ∈ V : (i, F) ∈ E}; variable domains Y_i. The distribution factorizes into potentials ψ attached to the factors:

    p(y) = (1/Z) ∏_{F∈F} ψ_F(y_{N(F)}),   with constant Z = Σ_{y∈Y} ∏_{F∈F} ψ_F(y_{N(F)})
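One possible in-code representation (a sketch, not the authors' implementation): store each factor as its scope N(F) together with a potential table; the same structure covers unary, pairwise, and higher-order factors uniformly:

    from itertools import product

    # A factor is (scope, table): scope is the tuple of variable indices N(F),
    # table maps joint states of the scope to a positive potential value.
    # All tables below are toy values chosen for illustration.
    factors = [
        ((0,),      {(0,): 1.0, (1,): 2.0}),
        ((0, 1),    {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}),
        ((1, 2, 3), {s: (1.5 if len(set(s)) == 1 else 1.0)
                     for s in product([0, 1], repeat=3)}),  # third-order factor
    ]
    n_vars = 4

    def unnormalized(y):
        val = 1.0
        for scope, table in factors:
            val *= table[tuple(y[i] for i in scope)]
        return val

    Z = sum(unnormalized(y) for y in product([0, 1], repeat=n_vars))
    print(Z, unnormalized((0, 1, 1, 1)) / Z)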
  • 16. Factor Graphs / Why factor graphs? The same set of variables can carry different factorizations, and factor graphs are explicit about the factorization. Hence they are easier to work with, and they are universal (just like MRFs and Bayesian networks).
  • 17. Factor Graphs / Capacity: A factor graph defines a family of distributions, and some families are larger than others; for example, a single factor over all four variables admits more distributions than pairwise factors alone.
  • 18-19. Factor Graphs / Four remaining pieces: 1. conditional distributions (CRFs); 2. parameterization; 3. test-time inference; 4. learning the model from training data.
  • 20-22. Factor Graphs / Conditional Distributions: We have discussed p(y); how do we define p(y|x)? The potentials become functions of x_{N(F)}, and the partition function then depends on x. This yields conditional random fields (CRFs): x is not part of the probability model, i.e. it is not treated as a random variable; the model is a conditional distribution:

    p(y) = (1/Z) ∏_{F∈F} ψ_F(y_{N(F)})   becomes   p(y|x) = (1/Z(x)) ∏_{F∈F} ψ_F(y_{N(F)}; x_{N(F)})
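A sketch of this change in code (weights, features, and potential values are invented for illustration): the only structural difference from the unconditional model is that potentials take the local observation as input, so Z(x) must be recomputed for every x:

    import math
    from itertools import product

    w = [0.8, -0.3]  # illustrative weights for a log-linear unary potential

    def psi_unary(y_i, x_i):
        # The potential is now a function of the local observation x_i.
        return math.exp(-(w[y_i] * x_i))

    def psi_pair(y_i, y_j):
        return 2.0 if y_i == y_j else 0.5

    def unnormalized(y, x):
        val = 1.0
        for i in range(len(y)):
            val *= psi_unary(y[i], x[i])
        for i in range(len(y) - 1):
            val *= psi_pair(y[i], y[i + 1])
        return val

    def p_cond(y, x):
        # Z(x): the partition function depends on the observation.
        Zx = sum(unnormalized(yy, x) for yy in product([0, 1], repeat=len(x)))
        return unnormalized(y, x) / Zx

    print(p_cond((0, 1, 1), (0.2, 1.5, 0.9)))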
  • 23-25. Factor Graphs / Potentials and Energy Functions: For each factor F ∈ F, let Y_F = ×_{i∈N(F)} Y_i and define an energy function E_F : Y_{N(F)} → R. Potentials and energies are related by (assuming ψ_F(y_F) > 0)

    ψ_F(y_F) = exp(−E_F(y_F))   and   E_F(y_F) = −log ψ_F(y_F)

Then p(y) can be written as

    p(Y = y) = (1/Z) ∏_{F∈F} ψ_F(y_F) = (1/Z) exp(−Σ_{F∈F} E_F(y_F))

Hence p(y) is completely determined by the energy E(y) = Σ_{F∈F} E_F(y_F).
  • 26. Factor Graphs / Energy Minimization:

    argmax_{y∈Y} p(Y = y) = argmax_{y∈Y} (1/Z) exp(−Σ_{F∈F} E_F(y_F))
                          = argmax_{y∈Y} exp(−Σ_{F∈F} E_F(y_F))
                          = argmax_{y∈Y} −Σ_{F∈F} E_F(y_F)
                          = argmin_{y∈Y} Σ_{F∈F} E_F(y_F)
                          = argmin_{y∈Y} E(y)

Energy minimization can therefore be interpreted as solving for the most likely state of some factor graph model.
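The practical consequence is that the most likely state can be found without ever evaluating Z. A brute-force sketch with invented toy energies:

    from itertools import product

    # Energies for a binary chain of length 3; E(y) = sum of factor energies.
    def E_unary(i, y_i):
        return [0.0, 1.2, 0.3][i] * y_i        # illustrative per-node costs

    def E_pair(y_i, y_j):
        return 0.0 if y_i == y_j else 0.7      # Potts-style smoothness cost

    def E(y):
        return (sum(E_unary(i, y[i]) for i in range(3))
                + sum(E_pair(y[i], y[i + 1]) for i in range(2)))

    # argmin_y E(y) equals argmax_y p(y); Z cancels and never appears.
    y_star = min(product([0, 1], repeat=3), key=E)
    print(y_star, E(y_star))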
  • 27-28. Factor Graphs / Parameterization: A factor graph defines a family of distributions. Parameterization means identifying individual members of the family by parameters w: each parameter vector w indexes one distribution p_w in the family.
  • 29. Factor Graphs / Example: Parameterization (image segmentation model): Pairwise "Potts" energy function E_F(y_i, y_j; w_1), E_F : {0,1} × {0,1} × R → R, with

    E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0,   E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1
  • 30. Factor Graphs / Example: Parameterization (cont): Unary energy function E_F(y_i; x, w), E_F : {0,1} × X × R^{{0,1}×D} → R, with

    E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩,   E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩

where ψ_F : X → R^D are features, e.g. image filter responses.
  • 31-32. Factor Graphs / Example: Parameterization (cont): [Figure: per-block unary energies ⟨w(0), ψ_F(x)⟩ and ⟨w(1), ψ_F(x)⟩ over the image, and the pairwise Potts table with entries 0, w_1, w_1, 0.] Total number of parameters: D + D + 1. Parameters are shared across factors, but the energies differ because the features ψ_F(x) differ. The general form is linear in w:

    E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩
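A sketch of this linear parameterization in code (feature dimension, weights, and feature values are all invented; in the segmentation model ψ_F would be image filter responses):

    import numpy as np

    D = 3                                        # feature dimension
    w_unary = {0: np.array([0.5, -0.2, 0.1]),    # w(0)
               1: np.array([-0.4, 0.3, 0.2])}    # w(1)
    w1 = 0.9                                     # Potts disagreement cost

    def psi(x_patch):
        """Feature map psi_F: X -> R^D (placeholder for image filters)."""
        return np.asarray(x_patch, dtype=float)

    def E_unary(y_i, x_patch):
        return float(w_unary[y_i] @ psi(x_patch))   # <w(y_i), psi_F(x)>

    def E_pair(y_i, y_j):
        return 0.0 if y_i == y_j else w1            # Potts energy

    # Parameters are shared across all factors: D + D + 1 numbers in total.
    print(E_unary(1, [0.2, 0.7, 0.1]), E_pair(0, 1))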
  • 33. Test-time Inference / Making Predictions: Making predictions means: given x ∈ X, predict y ∈ Y (or define a prediction function f : X → Y). How do we measure the quality of a prediction?
  • 34. Test-time Inference / Loss function: Define a loss function Δ : Y × Y → R^+, so that Δ(y, y*) measures the loss incurred by predicting y when y* is true. The loss function is application dependent.
  • 35-37. Test-time Inference: Loss function Δ : Y × Y → R, where Δ(y, f(x)) compares the correct label y with the prediction f(x). Let d(X, Y) be the true joint distribution with true conditional d(y|x), and let p(y|x) be the model distribution. The expected loss measures the quality of a prediction:

    R^Δ_f(x) = E_{y∼d(y|x)}[Δ(y, f(x))] = Σ_{y∈Y} d(y|x) Δ(y, f(x)) ≈ E_{y∼p(y|x;w)}[Δ(y, f(x))]

where the approximation assumes p(y|x; w) ≈ d(y|x).
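A sketch of the expected-loss computation under the model distribution, by exhaustive enumeration over Y (the toy table of conditional probabilities is invented for illustration):

    from itertools import product

    # Toy conditional distribution p(y|x) over two binary variables.
    P = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}

    def expected_loss(y_pred, loss):
        """E_{y ~ p(y|x)}[ Delta(y, y_pred) ] by enumeration, using the
        model distribution in place of the unknown d(y|x)."""
        return sum(P[y] * loss(y, y_pred) for y in product([0, 1], repeat=2))

    zero_one = lambda y, yp: float(y != yp)
    print(expected_loss((0, 0), zero_one))   # = 1 - p((0,0)|x) = 0.5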
  • 38-39. Test-time Inference / Example 1: 0/1 loss: The loss is 0 iff the prediction is perfect, 1 otherwise:

    Δ_{0/1}(y, y*) = I(y ≠ y*) = { 0 if y = y*,  1 otherwise }

Plugging it in:

    y* := argmin_{y∈Y} E_{y'∼p(y'|x)}[Δ_{0/1}(y', y)] = argmax_{y∈Y} p(y|x) = argmin_{y∈Y} E(y, x)

Minimizing the expected 0/1 loss gives the MAP prediction (energy minimization).
  • 40-41. Test-time Inference / Example 2: Hamming loss: Count the fraction of mislabeled variables:

    Δ_H(y, y*) = (1/|V|) Σ_{i∈V} I(y_i ≠ y_i*)

Plugging it in, the optimal prediction decomposes over variables:

    y_i* := argmax_{y_i∈Y_i} p(y_i | x)   for each i ∈ V

Minimizing the expected Hamming loss gives the maximum posterior marginal (MPM, max-marginal) prediction.
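A brute-force sketch of MPM prediction (toy unnormalized scores, invented for illustration): compute each variable's marginal by summing the joint, then maximize each marginal independently; in general this can differ from the joint MAP labeling:

    from itertools import product

    # Toy p(y|x) over three binary variables, from unnormalized scores.
    score = {y: 1.0 + y.count(1) * (0.5 if y[0] == 0 else 2.0)
             for y in product([0, 1], repeat=3)}
    Z = sum(score.values())
    p = {y: s / Z for y, s in score.items()}

    def marginal(i, v):
        """p(y_i = v | x): sum the joint over all other variables."""
        return sum(pr for y, pr in p.items() if y[i] == v)

    # MPM: maximize each variable marginal separately.
    y_mpm = tuple(max([0, 1], key=lambda v: marginal(i, v)) for i in range(3))
    # Joint MAP for comparison; the two need not coincide in general.
    y_map = max(p, key=p.get)
    print(y_mpm, y_map)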
  • 42-43. Test-time Inference / Example 3: Squared error: Assume a vector space on each Y_i (pixel intensities, optical flow vectors, etc.) and average the squared errors:

    Δ_Q(y, y*) = (1/|V|) Σ_{i∈V} ‖y_i − y_i*‖²

Plugging it in, the optimal prediction is the posterior mean of each variable:

    y* = ( Σ_{y_i∈Y_i} p(y_i | x) y_i )_{i∈V}

Minimizing the expected squared error gives the minimum mean squared error (MMSE) prediction.
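A sketch of MMSE prediction in the same brute-force style (toy real-valued states and scores, invented): the prediction is the posterior mean per variable, which need not coincide with any single high-probability state:

    from itertools import product

    # Toy p(y|x) over two variables with real-valued states.
    states = [0.0, 1.0, 2.0]
    score = {y: 1.0 / (1.0 + (y[0] - y[1]) ** 2)
             for y in product(states, repeat=2)}
    Z = sum(score.values())
    p = {y: s / Z for y, s in score.items()}

    def posterior_mean(i):
        """MMSE prediction for variable i: sum_{y} p(y|x) * y_i."""
        return sum(pr * y[i] for y, pr in p.items())

    y_mmse = tuple(posterior_mean(i) for i in range(2))
    print(y_mmse)   # note: need not be one of the discrete states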
  • 44. Test-time Inference / Inference Task: Maximum A Posteriori (MAP) Inference. Definition: given a factor graph, a parameterization, a weight vector w, and an observation x, find

    y* = argmax_{y∈Y} p(Y = y | x, w) = argmin_{y∈Y} E(y; x, w)
  • 45. Test-time Inference / Inference Task: Probabilistic Inference. Definition: given a factor graph, a parameterization, a weight vector w, and an observation x, find

    log Z(x, w) = log Σ_{y∈Y} exp(−E(y; x, w)),
    μ_F(y_F) = p(Y_F = y_F | x, w),   for all F ∈ F and all y_F ∈ Y_F

This typically includes the variable marginals μ_i(y_i) = p(y_i | x, w).
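A sketch of both quantities by exhaustive enumeration (toy energies, invented for illustration); real models require message passing or approximate inference instead:

    import math
    from itertools import product

    # Toy energies for a model with one pairwise factor F over (Y_0, Y_1).
    def E(y):
        return 0.4 * y[0] + 0.1 * y[1] + (0.0 if y[0] == y[1] else 0.8)

    ys = list(product([0, 1], repeat=2))
    log_Z = math.log(sum(math.exp(-E(y)) for y in ys))
    Z = math.exp(log_Z)

    # Factor marginals mu_F(y_F) = p(Y_F = y_F | x, w); here the pairwise
    # factor covers both variables, so mu_F is the full joint.
    mu_F = {y: math.exp(-E(y)) / Z for y in ys}
    # Variable marginal mu_0(y_0) = p(y_0 | x, w) by summing out y_1.
    mu_0 = {v: sum(mu_F[y] for y in ys if y[0] == v) for v in (0, 1)}
    print(log_Z, mu_F, mu_0)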
  • 46. Test-time Inference / Example: Man-made structure detection: [Figure: left, input image x; middle, ground-truth labeling on 16-by-16 pixel blocks; right, factor graph model with observation nodes X_i and factors ψ_i^1 (unary), ψ_i^2 (observation), ψ_{i,k}^3 (pairwise).] Features: gradient and color histograms. Model parameters are estimated from ≈ 60 training images.
  • 47. Test-time Inference / Example: Man-made structure detection (results): [Figure: left, input image x; middle (probabilistic inference), visualization of the variable marginals p(y_i = "man-made" | x, w); right (MAP inference), joint MAP labeling y* = argmax_{y∈Y} p(y | x, w).]
  • 48-49. Training / Training the Model: What can be learned? The model structure (the factors); the model variables (the observed variables are fixed, but we can add unobserved variables); and the factor energies (the parameters).
  • 50. Training / Training: Overview: Assume a fully observed, independent and identically distributed (iid) sample set {(x^n, y^n)}_{n=1,...,N} with (x^n, y^n) ∼ d(X, Y). Goal: predict well. Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss.
  • 51-52. Training / Probabilistic Learning. Problem (Probabilistic Parameter Learning): let d(y|x) be the (unknown) conditional distribution of labels for the problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the task of finding a point estimate w* of the parameter that makes p(y|x, w*) closest to d(y|x). We will discuss probabilistic parameter learning in detail.
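A sketch of probabilistic parameter learning on a tiny log-linear model (features, training data, and step size are all invented for illustration): minimize the negative conditional log-likelihood by gradient descent, where the gradient for each example is the model's feature expectation minus the observed features:

    import math
    from itertools import product

    def features(y, x):
        f1 = sum(yi * xi for yi, xi in zip(y, x))              # unary feature
        f2 = sum(y[i] == y[i + 1] for i in range(len(y) - 1))  # agreement feature
        return (f1, f2)

    def p_cond(y, x, w):
        ys = list(product([0, 1], repeat=len(x)))
        scores = {yy: math.exp(sum(wk * fk for wk, fk in zip(w, features(yy, x))))
                  for yy in ys}
        Z = sum(scores.values())
        return scores[y] / Z

    # Invented training pairs (x^n, y^n); in practice these are iid samples.
    data = [((1.0, 0.2, 0.8), (1, 0, 1)), ((0.1, 0.9, 0.3), (0, 1, 0))]

    w = [0.0, 0.0]
    for step in range(200):
        grad = [0.0, 0.0]
        for x, y in data:
            ys = list(product([0, 1], repeat=len(x)))
            # grad of -log p(y|x,w): model expectation minus observed features
            model_exp = [sum(p_cond(yy, x, w) * features(yy, x)[k] for yy in ys)
                         for k in range(2)]
            obs = features(y, x)
            for k in range(2):
                grad[k] += model_exp[k] - obs[k]
        w = [wk - 0.1 * gk for wk, gk in zip(w, grad)]
    print(w)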
  • 53-54. Training / Loss-Minimizing Parameter Learning. Problem (Loss-Minimizing Parameter Learning): let d(x, y) be the unknown distribution of data and labels, and let Δ : Y × Y → R be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk

    E_{(x,y)∼d(x,y)}[Δ(y, f_p(x))]

is as small as possible, where f_p(x) = argmax_{y∈Y} p(y|x, w*). This requires the loss function at training time and directly learns a prediction function f_p(x).