Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.




                                    Gaussians
                                      Andrew W. Moore
                                           Professor
                                  School of Computer Science
                                  Carnegie Mellon University
                                        www.cs.cmu.edu/~awm
                                          awm@cs.cmu.edu
                                            412-268-7599



Copyright © Andrew W. Moore                                                                                    Slide 1




                              Gaussians in Data Mining
    •     Why we should care
    •     The entropy of a PDF
    •     Univariate Gaussians
    •     Multivariate Gaussians
    •     Bayes Rule and Gaussians
    •     Maximum Likelihood and MAP using
          Gaussians


Copyright © Andrew W. Moore                                                                                    Slide 2




Why we should care
    • Gaussians are as natural as Orange Juice
      and Sunshine
    • We need them to understand Bayes Optimal
      Classifiers
    • We need them to understand regression
    • We need them to understand neural nets
    • We need them to understand mixture
      models
    •…
                                (You get the idea)
Copyright © Andrew W. Moore                                            Slide 3




                 The "box" distribution

                     p(x) = 1/w   if |x| ≤ w/2
                          = 0     if |x| > w/2

             [figure: uniform density of height 1/w on the interval (-w/2, w/2)]




Copyright © Andrew W. Moore                                            Slide 4




                 The "box" distribution

                     p(x) = 1/w   if |x| ≤ w/2
                          = 0     if |x| > w/2

             [figure: uniform density of height 1/w on the interval (-w/2, w/2)]

                     E[X] = 0        Var[X] = w²/12
Copyright © Andrew W. Moore                                                              Slide 5




                                  Entropy of a PDF

                                            ∞
             Entropy of X = H[X] = −        ∫ p(x) log p(x) dx        (natural log: ln, i.e. log base e)
                                          x=−∞

             The larger the entropy of a distribution…
             …the harder it is to predict
             …the harder it is to compress it
             …the less spiky the distribution
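The integral above is easy to approximate numerically. The sketch below (an illustration added to these notes, not part of the original deck) evaluates H[X] for the box distribution of width w, for which the integral works out to log w:

```python
import numpy as np

def differential_entropy(p, lo, hi, n=1_000_001):
    """Approximate H[X] = -integral of p(x) ln p(x) dx over [lo, hi] (natural log)."""
    x = np.linspace(lo, hi, n)
    px = p(x)
    safe = np.where(px > 0, px, 1.0)      # treat 0 * log 0 as 0
    return float(np.sum(-px * np.log(safe)) * (x[1] - x[0]))

w = 2.0
box = lambda x: np.where(np.abs(x) <= w / 2, 1.0 / w, 0.0)
H_box = differential_entropy(box, -w / 2, w / 2)
print(H_box, np.log(w))                    # both ≈ 0.6931
```

The same routine works for any density given on a grid, so the entropy comparisons on the following slides are easy to reproduce.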



Copyright © Andrew W. Moore                                                              Slide 6




                 The "box" distribution

                     p(x) = 1/w   if |x| ≤ w/2
                          = 0     if |x| > w/2

             [figure: uniform density of height 1/w on the interval (-w/2, w/2)]

                        ∞                          w/2
             H[X] = −   ∫ p(x) log p(x) dx = −     ∫ (1/w) log(1/w) dx = log w
                      x=−∞                       x=−w/2


Copyright © Andrew W. Moore                                                                     Slide 7




         Unit variance box distribution

                     p(x) = 1/w   if |x| ≤ w/2
                          = 0     if |x| > w/2

                     E[X] = 0        Var[X] = w²/12

             [figure: uniform density of height 1/(2√3) on the interval (-√3, √3)]

    If w = 2√3 then Var[X] = 1 and H[X] = log(2√3) ≈ 1.242
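Both numbers are easy to check (a sketch added here, not in the original slides): draw uniformly from (-√3, √3), compare the sample variance with 1, and evaluate log(2√3):

```python
import numpy as np

rng = np.random.default_rng(0)
w = 2 * np.sqrt(3)                         # the width that gives unit variance
samples = rng.uniform(-w / 2, w / 2, size=1_000_000)

var = samples.var()                        # should be ≈ w**2 / 12 = 1
H = np.log(w)                              # box entropy: H[X] = log w
print(var, H)                              # ≈ 1.0 and ≈ 1.2425
```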

Copyright © Andrew W. Moore                                                                     Slide 8




                 The "hat" distribution

                     p(x) = (w − |x|)/w²   if |x| ≤ w
                          = 0              if |x| > w

                     E[X] = 0        Var[X] = w²/6

             [figure: triangular density peaking at height 1/w at x = 0, with support (-w, w)]




Copyright © Andrew W. Moore                                         Slide 9




      Unit variance hat distribution

                     p(x) = (w − |x|)/w²   if |x| ≤ w
                          = 0              if |x| > w

                     E[X] = 0        Var[X] = w²/6

             [figure: triangular density peaking at height 1/√6 at x = 0, with support (-√6, √6)]

      If w = √6 then Var[X] = 1 and H[X] ≈ 1.396
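This too can be verified numerically (an added illustration). For the hat density the entropy integral evaluates in closed form to log w + 1/2, which at w = √6 gives ≈ 1.396:

```python
import numpy as np

w = np.sqrt(6)                             # the width that gives unit variance
x = np.linspace(-w, w, 1_000_001)
dx = x[1] - x[0]
p = (w - np.abs(x)) / w**2                 # hat density on (-w, w)

var = float(np.sum(x**2 * p) * dx)         # E[X] = 0, so Var = E[X^2] ≈ 1
mask = p > 0
H = float(np.sum(-p[mask] * np.log(p[mask])) * dx)
print(var, H)                              # ≈ 1.0 and ≈ 1.396
```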

Copyright © Andrew W. Moore                                        Slide 10




           The "2 spikes" distribution (Dirac deltas)

                     p(x) = ( δ(x+1) + δ(x−1) ) / 2

                     E[X] = 0        Var[X] = 1

             [figure: two spikes of weight 1/2 each, at x = -1 and x = 1]

                               ∞
                    H[X] = −   ∫ p(x) log p(x) dx = −∞
                             x=−∞


Copyright © Andrew W. Moore                                                                                 Slide 11




                      Entropies of unit-variance distributions

                              Distribution      Entropy
                              Box               1.242
                              Hat               1.396
                              2 spikes          −∞
                              ???               1.4189    ← largest possible entropy of
                                                            any unit-variance distribution



Copyright © Andrew W. Moore                                                                                 Slide 12




              Unit variance Gaussian

                     p(x) = (1/√(2π)) · exp(−x²/2)

                     E[X] = 0        Var[X] = 1

                               ∞
                    H[X] = −   ∫ p(x) log p(x) dx = 1.4189
                             x=−∞


Copyright © Andrew W. Moore                                                                 Slide 13
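The mystery value 1.4189 is ½·log(2πe), the closed-form entropy of the unit-variance Gaussian. The quick check below (an added illustration) confirms it both symbolically and by numeric integration:

```python
import numpy as np

H_closed = 0.5 * np.log(2 * np.pi * np.e)     # entropy of N(0,1) in closed form

x = np.linspace(-10, 10, 1_000_001)           # tails beyond ±10 are negligible
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
H_num = float(np.sum(-p * np.log(p)) * dx)

print(H_closed, H_num)                        # both ≈ 1.4189
```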




                      General Gaussian

                     p(x) = (1/(√(2π)·σ)) · exp( −(x − μ)² / (2σ²) )

                     E[X] = μ        Var[X] = σ²

             [figure: bell-shaped curve with μ = 100, σ = 15]
Copyright © Andrew W. Moore                                                                 Slide 14




                      General Gaussian
                      (also known as the normal distribution or bell-shaped curve)

                     p(x) = (1/(√(2π)·σ)) · exp( −(x − μ)² / (2σ²) )

                     E[X] = μ        Var[X] = σ²

             [figure: bell-shaped curve with μ = 100, σ = 15]

       Shorthand: We say X ~ N(μ,σ²) to mean "X is distributed as a Gaussian
       with parameters μ and σ²".
       In the above figure, X ~ N(100, 15²)
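The shorthand translates directly into code (an added sketch; μ = 100, σ = 15 are just the figure's example values). Note that numpy parameterizes by the standard deviation, not the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=100.0, scale=15.0, size=1_000_000)   # X ~ N(100, 15^2)

print(x.mean(), x.std())                                # ≈ 100 and ≈ 15
```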


Copyright © Andrew W. Moore                                                                    Slide 15




                                        The Error Function
             Assume X ~ N(0,1)
             Define ERF(x) = P(X<x) = Cumulative Distribution of X

                              x
             ERF(x) =         ∫ p(z) dz
                            z=−∞

                                          x
                      = (1/√(2π)) ·       ∫ exp(−z²/2) dz
                                        z=−∞




Copyright © Andrew W. Moore                                                                    Slide 16




                          Using The Error Function
             Assume X ~ N(μ,σ²)

             P(X<x | μ,σ²) = ERF( (x − μ) / σ )

             (We standardize by dividing by the standard deviation σ, not the variance σ².)
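In standard-library terms (an added sketch): what these slides call ERF is the standard normal CDF, usually written Φ, which relates to the textbook erf function by Φ(z) = ½(1 + erf(z/√2)). The general case standardizes by σ, the standard deviation:

```python
import math

def phi(z):
    """Standard normal CDF -- what these slides call ERF(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), standardizing by sigma (the std dev)."""
    return phi((x - mu) / sigma)

print(phi(0.0))                   # 0.5
print(normal_cdf(130, 100, 15))   # Φ(2) ≈ 0.9772
```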




Copyright © Andrew W. Moore                                   Slide 17




                   The Central Limit Theorem
    • If (X1, X2, …, Xn) are i.i.d. continuous random
      variables
                                                     n
    • Then define z = f(x1, x2, …, xn) = (1/n) ·     Σ xi
                                                    i=1

    • As n → ∞, p(z) approaches a Gaussian with mean
      E[Xi] and variance Var[Xi]/n

                       Somewhat of a justification for assuming
                                   Gaussian noise is common
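A quick simulation of the theorem (an added illustration; uniform(0,1) is an arbitrary choice of Xi): averages of n draws cluster around E[Xi] = 1/2 with variance Var[Xi]/n = 1/(12n).

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 100_000
z = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)   # sample means

print(z.mean(), z.var(), 1 / (12 * n))   # mean ≈ 0.5, var ≈ 1/(12n)
```

A histogram of z (not shown) looks strikingly bell-shaped even at modest n.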
Copyright © Andrew W. Moore                                   Slide 18




Other amazing facts about
                           Gaussians
    • Wouldn’t you like to know?




    • We will not examine them until we need to.




Copyright © Andrew W. Moore                                                                      Slide 19




                                  Bivariate Gaussians

           Write r.v. X = (X, Y)ᵀ. Then define X ~ N(μ, Σ) to mean

                     p(x) = (1 / (2π ||Σ||^(1/2))) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

         Where the Gaussian's parameters are…

                     μ = ( μx )        Σ = ( σ²x   σxy )
                         ( μy )            ( σxy   σ²y )

           Where we insist that Σ is symmetric non-negative definite




Copyright © Andrew W. Moore                                                                      Slide 20




                                  Bivariate Gaussians

           Write r.v. X = (X, Y)ᵀ. Then define X ~ N(μ, Σ) to mean

                     p(x) = (1 / (2π ||Σ||^(1/2))) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

         Where the Gaussian's parameters are…

                     μ = ( μx )        Σ = ( σ²x   σxy )
                         ( μy )            ( σxy   σ²y )

            Where we insist that Σ is symmetric non-negative definite

            It turns out that E[X] = μ and Cov[X] = Σ. (Note that this is a
            resulting property of Gaussians, not a definition)*

                                                                   *This note rates 7.4 on the pedanticness scale


Copyright © Andrew W. Moore                                                                                         Slide 21




    Evaluating p(x): Step 1

                     p(x) = (1 / (2π ||Σ||^(1/2))) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

       1.     Begin with vector x

              [figure: the point x and the mean μ in the plane]




Copyright © Andrew W. Moore                                                                                         Slide 22




    Evaluating p(x): Step 2

                     p(x) = (1 / (2π ||Σ||^(1/2))) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

       1.     Begin with vector x
       2.     Define δ = x − μ

              [figure: the vector δ drawn from μ to x]




Copyright © Andrew W. Moore                                                                               Slide 23




    Evaluating p(x): Step 3

                     p(x) = (1 / (2π ||Σ||^(1/2))) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

       1.     Begin with vector x
       2.     Define δ = x − μ
       3.     Count the number of contours crossed of the
              ellipsoids formed by Σ⁻¹.
              D = this count = sqrt(δᵀΣ⁻¹δ)
              = Mahalanobis distance between x and μ

              [figure: elliptical contours around μ, defined by sqrt(δᵀΣ⁻¹δ) = constant]




Copyright © Andrew W. Moore                                                                               Slide 24




    Evaluating p(x): Step 4

                     p(x) = (1 / (2π ||Σ||^(1/2))) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

       1.     Begin with vector x
       2.     Define δ = x − μ
       3.     Count the number of contours crossed of the
              ellipsoids formed by Σ⁻¹.
              D = this count = sqrt(δᵀΣ⁻¹δ)
              = Mahalanobis distance between x and μ
       4.     Define w = exp(−D²/2)

              [figure: the curve w = exp(−D²/2) plotted against D²]

              x close to μ in squared Mahalanobis space gets a large weight;
              far away gets a tiny weight

Copyright © Andrew W. Moore                                                                                     Slide 25




    Evaluating p(x): Step 5

                     p(x) = (1 / (2π ||Σ||^(1/2))) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

       1.     Begin with vector x
       2.     Define δ = x − μ
       3.     Count the number of contours crossed of the
              ellipsoids formed by Σ⁻¹.
              D = this count = sqrt(δᵀΣ⁻¹δ)
              = Mahalanobis distance between x and μ
       4.     Define w = exp(−D²/2)
       5.     Multiply w by 1/(2π ||Σ||^(1/2)) to ensure ∫ p(x) dx = 1
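The five steps translate directly into code (a sketch added here; the example μ and Σ are arbitrary). The normalizer is written in its general m-dimensional form (2π)^(m/2)·||Σ||^(1/2), which reduces to 2π·||Σ||^(1/2) in the bivariate case:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate Gaussian density following the slides' five steps."""
    delta = x - mu                                    # step 2: offset from the mean
    D2 = float(delta @ np.linalg.inv(sigma) @ delta)  # step 3: squared Mahalanobis distance
    w = np.exp(-D2 / 2)                               # step 4: unnormalized weight
    m = len(mu)
    norm = (2 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(sigma))
    return w / norm                                   # step 5: normalize so p integrates to 1

mu = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
print(gaussian_pdf(np.array([1.0, -0.5]), mu, sigma))
print(gaussian_pdf(mu, mu, sigma))       # the peak: 1 / (2π ||Σ||^(1/2))
```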




Copyright © Andrew W. Moore                                                                                     Slide 26




                               Example

     [figure: contour plot of a bivariate Gaussian]

     Observe: the mean, the principal axes, the implication of the
     off-diagonal covariance term, and the zone of maximum gradient of p(x).

     Common convention: show the contour corresponding to 2 standard
     deviations from the mean

Copyright © Andrew W. Moore                                            Slide 27




                               Example




Copyright © Andrew W. Moore                                            Slide 28




Example




       In this example, x and y are almost independent




Copyright © Andrew W. Moore                                       Slide 29




                                 Example




       In this example, x and “x+y” are clearly not independent




Copyright © Andrew W. Moore                                       Slide 30




Example




       In this example, x and “20x+y” are clearly not independent




Copyright © Andrew W. Moore                                                                                     Slide 31




                              Multivariate Gaussians

        Write r.v. X = (X1, X2, …, Xm)ᵀ. Then define X ~ N(μ, Σ) to mean

                     p(x) = (1 / ((2π)^(m/2) ||Σ||^(1/2))) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

           Where the Gaussian's parameters have…

                     μ = ( μ1 )        Σ = ( σ²1   σ12   …   σ1m )
                         ( μ2 )            ( σ12   σ²2   …   σ2m )
                         (  ⋮ )            (  ⋮     ⋮    ⋱    ⋮  )
                         ( μm )            ( σ1m   σ2m   …   σ²m )

       Where we insist that Σ is symmetric non-negative definite
       Again, E[X] = μ and Cov[X] = Σ. (Note that this is a resulting property of Gaussians, not a definition)
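The "resulting property" is easy to see empirically (an added sketch with an arbitrary 3-dimensional example): sample from N(μ, Σ) and compare the sample mean and covariance with the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
print(X.mean(axis=0))            # ≈ mu
print(np.cov(X, rowvar=False))   # ≈ Sigma
```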

Copyright © Andrew W. Moore                                                                                     Slide 32




                               General Gaussians

                     μ = ( μ1 )        Σ = ( σ²1   σ12   …   σ1m )
                         ( μ2 )            ( σ12   σ²2   …   σ2m )
                         (  ⋮ )            (  ⋮     ⋮    ⋱    ⋮  )
                         ( μm )            ( σ1m   σ2m   …   σ²m )

             [figure: tilted elliptical contours in the (x1, x2) plane]
Copyright © Andrew W. Moore                                                                    Slide 33




                              Axis-Aligned Gaussians

                     μ = (μ1, μ2, …, μm)ᵀ        Σ = diag( σ²1, σ²2, σ²3, …, σ²m−1, σ²m )

      Xi ⊥ Xj for i ≠ j   (the components are independent)

             [figure: axis-aligned elliptical contours in the (x1, x2) plane]
Copyright © Andrew W. Moore                                                                    Slide 34




Spherical Gaussians

$$\mu=\begin{pmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_m\end{pmatrix}\qquad
\Sigma=\sigma^2 I=\begin{pmatrix}\sigma^2 & 0 & \cdots & 0\\ 0 & \sigma^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma^2\end{pmatrix}$$

X_i ⊥ X_j for i ≠ j

[Figure: circular contours in the (x1, x2) plane]
Copyright © Andrew W. Moore                                                                        Slide 35




Degenerate Gaussians

$$\mu=\begin{pmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_m\end{pmatrix}\qquad |\Sigma|=0$$

(The covariance matrix is singular: all of the probability mass lies on a lower-dimensional subspace.)

“What’s so wrong with clipping one’s toenails in public?”

[Figure: contours collapsed onto a line in the (x1, x2) plane]
Copyright © Andrew W. Moore                                                                        Slide 36




Where are we now?
    • We’ve seen the formulae for Gaussians
    • We have an intuition of how they behave
    • We have some experience of “reading” a
      Gaussian’s covariance matrix

    • Coming next:
                                Some useful tricks with Gaussians


Copyright © Andrew W. Moore                                               Slide 37




Subsets of variables

Write

$$X=\begin{pmatrix}X_1\\ X_2\\ \vdots\\ X_m\end{pmatrix}\ \text{as}\ X=\begin{pmatrix}U\\ V\end{pmatrix}\ \text{where}\ U=\begin{pmatrix}X_1\\ \vdots\\ X_{m(u)}\end{pmatrix},\quad V=\begin{pmatrix}X_{m(u)+1}\\ \vdots\\ X_m\end{pmatrix}$$

This will be our standard notation for breaking an m-dimensional distribution into subsets of variables.
Copyright © Andrew W. Moore                                               Slide 38




Gaussian Marginals are Gaussian

[Operation: (U; V)  →[Marginalize]→  U]

Write X = (X₁, X₂, …, X_m)ᵀ as X = (U; V) where U = (X₁, …, X_{m(u)})ᵀ and V = (X_{m(u)+1}, …, X_m)ᵀ.

IF

$$\begin{pmatrix}U\\ V\end{pmatrix}\sim N\!\left(\begin{pmatrix}\mu_u\\ \mu_v\end{pmatrix},\ \begin{pmatrix}\Sigma_{uu} & \Sigma_{uv}\\ \Sigma_{uv}^T & \Sigma_{vv}\end{pmatrix}\right)$$

THEN U is also distributed as a Gaussian:

U ~ N(μu, Σuu)
Copyright © Andrew W. Moore                                                   Slide 39
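The marginalization fact can be checked empirically. This sketch (with made-up numbers, not from the slides) samples a 3-dimensional Gaussian and confirms that simply dropping the V coordinate leaves the remaining coordinates distributed as N(μu, Σuu):

```python
import numpy as np

# Hypothetical joint over (U, V): first two coordinates = U, last = V.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=200_000)

U = X[:, :2]                  # marginalize: just drop the V column
print(U.mean(axis=0))         # ~ [1, -2]       (= mu_u)
print(np.cov(U.T))            # ~ [[4, 1],
                              #    [1, 3]]      (= Sigma_uu)
```

Note that the marginal parameters are read straight off the joint ones; no integral needs to be computed.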




Gaussian Marginals are Gaussian (continued)

Same statement as the previous slide, with two annotations:
“This fact is not immediately obvious” … “Obvious, once we know it’s a Gaussian (why?)”
Copyright © Andrew W. Moore                                                   Slide 40




Gaussian Marginals are Gaussian (continued)

How would you prove this?

p(u) = ∫ᵥ p(u, v) dv = (snore…)

i.e. one can grind through the marginalization integral directly.
Copyright © Andrew W. Moore                                                              Slide 41




Linear Transforms remain Gaussian

[Operation: X  →[Multiply by matrix A]→  AX]

Assume X is an m-dimensional Gaussian r.v.:

X ~ N(μ, Σ)

Define Y to be a p-dimensional r.v. thusly (note p ≤ m):

Y = AX

…where A is a p × m matrix. Then…

Y ~ N(Aμ, AΣAᵀ)

Note: the “subset” result is a special case of this result.
Copyright © Andrew W. Moore                                                              Slide 42
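A quick numerical sanity check of the transform rule; the matrix A, the mean, and the covariance below are arbitrary illustrative values of mine, not from the slides:

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, -1.0, 0.0],      # p = 2, m = 3  (p <= m)
              [0.0,  2.0, 1.0]])

rng = np.random.default_rng(1)
Y = rng.multivariate_normal(mu, Sigma, size=200_000) @ A.T   # Y = A X

print(Y.mean(axis=0))     # ~ A @ mu = [-1, 7]
print(np.cov(Y.T))        # ~ A @ Sigma @ A.T
```

The empirical moments of Y match Aμ and AΣAᵀ up to sampling noise.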




                                                                                                    21
Adding samples of 2 independent Gaussians is Gaussian

[Operation: X, Y  →[+]→  X + Y]

if X ~ N(μx, Σx) and Y ~ N(μy, Σy) and X ⊥ Y
then X + Y ~ N(μx + μy, Σx + Σy)

Why doesn’t this hold if X and Y are dependent?
Which of the below statements is true?
• If X and Y are dependent, then X+Y is Gaussian but possibly with some other covariance
• If X and Y are dependent, then X+Y might be non-Gaussian
Copyright © Andrew W. Moore                                                             Slide 43
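An illustrative check of the addition rule in one dimension (the specific means and variances are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
X = rng.normal(loc=3.0, scale=2.0, size=n)     # N(3, 2^2)
Y = rng.normal(loc=-1.0, scale=1.5, size=n)    # N(-1, 1.5^2), drawn independently
Z = X + Y

print(Z.mean())   # ~ 3 + (-1)    = 2
print(Z.var())    # ~ 2^2 + 1.5^2 = 6.25
```

As a hint for the quiz above: the second statement is the true one. For instance, with X ~ N(0,1) and Y = S·X for an independent random sign S, Y is also N(0,1), but X+Y equals zero half the time and 2X otherwise, so it cannot be Gaussian.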




Conditional of Gaussian is Gaussian

[Operation: (U; V)  →[Conditionalize]→  U | V]

IF

$$\begin{pmatrix}U\\ V\end{pmatrix}\sim N\!\left(\begin{pmatrix}\mu_u\\ \mu_v\end{pmatrix},\ \begin{pmatrix}\Sigma_{uu} & \Sigma_{uv}\\ \Sigma_{uv}^T & \Sigma_{vv}\end{pmatrix}\right)$$

THEN U | V ~ N(μu|v, Σu|v) where

$$\mu_{u|v}=\mu_u+\Sigma_{uv}\Sigma_{vv}^{-1}(V-\mu_v)$$

$$\Sigma_{u|v}=\Sigma_{uu}-\Sigma_{uv}\Sigma_{vv}^{-1}\Sigma_{uv}^{T}$$
Copyright © Andrew W. Moore                                                             Slide 44
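The conditioning rule is easy to wrap in a small helper. This is a sketch, not part of the deck; the numeric check plugs in the (w, y) example that the following slides work through, where the conditional standard deviation comes out just under 808:

```python
import numpy as np

def condition_gaussian(mu_u, mu_v, Suu, Suv, Svv, v):
    """Conditioning a joint Gaussian:
    mu_{u|v}    = mu_u + Suv Svv^-1 (v - mu_v)
    Sigma_{u|v} = Suu  - Suv Svv^-1 Suv^T
    """
    K = Suv @ np.linalg.inv(Svv)        # regression coefficients of U on V
    return mu_u + K @ (v - mu_v), Suu - K @ Suv.T

# (w, y) ~ N((2977, 76), [[849^2, -967], [-967, 3.68^2]]), observe y = 80
mu_c, Sig_c = condition_gaussian(
    np.array([2977.0]), np.array([76.0]),
    np.array([[849.0 ** 2]]), np.array([[-967.0]]), np.array([[3.68 ** 2]]),
    np.array([80.0]))
print(mu_c, np.sqrt(Sig_c))
```

Note that only the conditional mean depends on the observed v; the conditional covariance is fixed in advance.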




The conditioning rule, applied to a concrete example:

IF

$$\begin{pmatrix}w\\ y\end{pmatrix}\sim N\!\left(\begin{pmatrix}2977\\ 76\end{pmatrix},\ \begin{pmatrix}849^2 & -967\\ -967 & 3.68^2\end{pmatrix}\right)$$

THEN w | y ~ N(μw|y, Σw|y) where

$$\mu_{w|y}=2977-\frac{967\,(y-76)}{3.68^2}$$

$$\Sigma_{w|y}=849^2-\frac{967^2}{3.68^2}\approx 808^2$$
Copyright © Andrew W. Moore                                                                                    Slide 45




The same example, with the resulting densities pictured:

[Figure: the marginal P(w) together with the conditionals P(w|m=76) and P(w|m=82)]
Copyright © Andrew W. Moore                                                                                    Slide 46




The same example once more, with observations:

Note: when the given value of v is μv, the conditional mean of u is μu.
Note: the conditional mean is a linear function of v.
Note: the conditional variance is independent of the given value of v.
Note: the conditional variance can only be equal to or smaller than the marginal variance.

[Figure: the marginal P(w) together with the conditionals P(w|m=76) and P(w|m=82)]
Copyright © Andrew W. Moore                                                                                  Slide 47




Gaussians and the chain rule

[Operation: U | V, V  →[Chain Rule]→  (U; V)]

Let A be a constant matrix.

IF U | V ~ N(AV, Σu|v) and V ~ N(μv, Σvv)

THEN (U; V) ~ N(μ, Σ), with

$$\mu=\begin{pmatrix}A\mu_v\\ \mu_v\end{pmatrix}\qquad
\Sigma=\begin{pmatrix}A\Sigma_{vv}A^T+\Sigma_{u|v} & A\Sigma_{vv}\\ (A\Sigma_{vv})^T & \Sigma_{vv}\end{pmatrix}$$
Copyright © Andrew W. Moore                                                                                  Slide 48
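The block matrix above can be sanity-checked by sampling. In this sketch (scalar case, with numbers of my own choosing) we draw V, then U given V, and compare the empirical joint covariance with the theoretical one:

```python
import numpy as np

rng = np.random.default_rng(3)
A, mu_v, Svv, Su_v = 2.0, 1.0, 4.0, 0.5     # V ~ N(1, 4); U | V ~ N(2V, 0.5)
V = rng.normal(mu_v, np.sqrt(Svv), size=200_000)
U = rng.normal(A * V, np.sqrt(Su_v))        # one conditional draw per sampled V

print(np.cov(U, V))
# theory: [[A^2*Svv + Su_v, A*Svv],
#          [A*Svv,          Svv  ]] = [[16.5, 8], [8, 4]]
```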




Available Gaussian tools

• Marginalize:  IF (U; V) ~ N((μu; μv), [[Σuu, Σuv], [Σuvᵀ, Σvv]])  THEN  U ~ N(μu, Σuu)

• Multiply:  IF X ~ N(μ, Σ) AND Y = AX  THEN  Y ~ N(Aμ, AΣAᵀ)

• Add:  if X ~ N(μx, Σx) and Y ~ N(μy, Σy) and X ⊥ Y  then  X + Y ~ N(μx + μy, Σx + Σy)

• Conditionalize:  IF (U; V) ~ N((μu; μv), [[Σuu, Σuv], [Σuvᵀ, Σvv]])  THEN  U | V ~ N(μu|v, Σu|v)
  where μu|v = μu + Σuv Σvv⁻¹ (V − μv)  and  Σu|v = Σuu − Σuv Σvv⁻¹ Σuvᵀ

• Chain Rule:  IF U | V ~ N(AV, Σu|v) and V ~ N(μv, Σvv)  THEN  (U; V) ~ N(μ, Σ)
  with μ = (Aμv; μv)  and  Σ = [[AΣvvAᵀ + Σu|v, AΣvv], [(AΣvv)ᵀ, Σvv]]
Copyright © Andrew W. Moore                                                                                        Slide 49




                                        Assume…
    • You are an intellectual snob
    • You have a child




Copyright © Andrew W. Moore                                                                                        Slide 50




Intellectual snobs with children
    • …are obsessed with IQ
• In the world as a whole, IQs are drawn from
  a Gaussian N(100, 15²)




Copyright © Andrew W. Moore                            Slide 51




                                    IQ tests
    • If you take an IQ test you’ll get a score that,
      on average (over many tests) will be your
      IQ
    • But because of noise on any one test the
      score will often be a few points lower or
      higher than your true IQ.

SCORE | IQ ~ N(IQ, 10²)


Copyright © Andrew W. Moore                            Slide 52




Assume…
    • You drag your kid off to get tested
    • She gets a score of 130
    • “Yippee” you screech and start deciding how
      to casually refer to her membership of the
      top 2% of IQs in your Christmas newsletter.

P(X<130 | μ=100, σ²=15²) =
P(X<2 | μ=0, σ²=1) =
Φ(2) ≈ 0.977




Copyright © Andrew W. Moore                                      Slide 53




Assume…
• You drag your kid off to get tested
• She gets a score of 130
• “Yippee” you screech and start deciding how to casually refer to her membership of the top 2% of IQs in your Christmas newsletter.

You are thinking: Well sure the test isn’t accurate, so she might have an IQ of 120 or she might have an IQ of 140, but the most likely IQ given the evidence “score=130” is, of course, 130.

P(X<130 | μ=100, σ²=15²) = P(X<2 | μ=0, σ²=1) = Φ(2) ≈ 0.977

Can we trust this reasoning?
Copyright © Andrew W. Moore                                      Slide 54




Maximum Likelihood IQ
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
• S = 130

IQ^mle = arg max_iq p(s = 130 | iq)


    • The MLE is the value of the hidden parameter that
      makes the observed data most likely
    • In this case
IQ^mle = 130

Copyright © Andrew W. Moore                                                                Slide 55




                                                  BUT….
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
• S = 130

IQ^mle = arg max_iq p(s = 130 | iq)


    • The MLE is the value of the hidden parameter that
      makes the observed data most likely
• In this case

IQ^mle = 130

This is not the same as “The most likely value of the parameter given the observed data”.

Copyright © Andrew W. Moore                                                                Slide 56




What we really want:
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
    • S=130

    • Question: What is
      IQ | (S=130)?



             Called the Posterior
              Distribution of IQ

Copyright © Andrew W. Moore                                            Slide 57




Which tool or tools?
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
• S = 130

• Question: What is IQ | (S=130)?

[The toolbox from Slide 49: Marginalize, Multiply, Add, Conditionalize, Chain Rule]
Copyright © Andrew W. Moore                                            Slide 58




Plan
• IQ ~ N(100, 15²)
• S | IQ ~ N(IQ, 10²)
• S = 130

• Question: What is IQ | (S=130)?

S | IQ, IQ  →[Chain Rule]→  (S; IQ)  →[Swap]→  (IQ; S)  →[Conditionalize]→  IQ | S

Copyright © Andrew W. Moore                                                                                        Slide 59




Working…

IQ ~ N(100, 15²);  S | IQ ~ N(IQ, 10²);  S = 130
Question: What is IQ | (S=130)?

Tools needed:

• Conditionalize:  IF (U; V) ~ N((μu; μv), [[Σuu, Σuv], [Σuvᵀ, Σvv]])
  THEN U | V is Gaussian with μu|v = μu + Σuv Σvv⁻¹ (V − μv)

• Chain Rule:  IF U | V ~ N(AV, Σu|v) and V ~ N(μv, Σvv)
  THEN (U; V) ~ N(μ, Σ) with Σ = [[AΣvvAᵀ + Σu|v, AΣvv], [(AΣvv)ᵀ, Σvv]]
Copyright © Andrew W. Moore                                                                                        Slide 60
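Carrying the two steps through with the slide's numbers gives the posterior in closed form. A minimal sketch in plain Python (variable names assumed), where the scalar case needs no matrices:

```python
import math

# From the slides: IQ ~ N(100, 15^2), S | IQ ~ N(IQ, 10^2), observed S = 130
mu0, var0 = 100.0, 15.0 ** 2     # prior mean and variance of IQ
var_noise = 10.0 ** 2            # test-score noise variance
s = 130.0                        # observed score

# Chain Rule with A = 1: the joint of (S, IQ) has
var_s = var0 + var_noise         # Var[S] = 225 + 100 = 325
cov = var0                       # Cov[S, IQ] = 225

# Swap, then Conditionalize (U = IQ, V = S):
mu_post = mu0 + (cov / var_s) * (s - mu0)   # ≈ 120.8
var_post = var0 - cov ** 2 / var_s          # ≈ 69.2

print(mu_post, math.sqrt(var_post))         # posterior mean and std dev
```

Note the posterior mean lands between the prior mean (100) and the observed score (130), weighted by the two variances.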




Your pride and joy’s posterior IQ
    • If you did the working, you now have
      p(IQ|S=130)
    • If you have to give a single most likely IQ
      given the score, you should give

                IQ_MAP = argmax_iq  p(iq | s = 130)

    • where MAP means “Maximum A Posteriori”

Copyright © Andrew W. Moore                                               Slide 61
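Since the posterior IQ | (S=130) is itself Gaussian, its mode — the MAP estimate — equals its posterior mean, so the answer is pulled from the raw score 130 back toward the prior mean 100. A small sketch (numbers assumed from the slides' IQ example) checks this by brute force:

```python
import math

# Posterior from the Conditionalize rule: IQ | S=130 ~ N(mu_post, sd_post^2)
mu_post = 100 + (225 / 325) * (130 - 100)
sd_post = math.sqrt(225 - 225 ** 2 / 325)

def log_posterior(iq):
    # Gaussian log-density of IQ | S=130, additive constants dropped
    return -0.5 * ((iq - mu_post) / sd_post) ** 2

# Brute-force argmax over a fine grid from 90 to 150 in steps of 0.01
iq_map = max((i / 100 for i in range(9000, 15001)), key=log_posterior)
print(round(iq_map, 2), round(mu_post, 2))   # both ≈ 120.77
```

So the MAP answer is about 120.8, not the maximum-likelihood answer of 130.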




                              What you should know
    • Know the Gaussian PDF formula off by heart
    • Understand the workings of the formula for
      a Gaussian
    • Be able to understand the Gaussian tools
      described so far
    • Have a rough idea of how you could prove
      them
    • Be happy with how you could use them

Copyright © Andrew W. Moore                                        Slide 62




