On adaptation for the posterior distribution under local and sup-norm

Judith Rousseau, Marc Hoffman and Johannes Schmidt-Hieber

     ENSAE - CREST and CEREMADE, Université Paris-Dauphine




                          January




Outline


   1   Bayesian nonparametrics : posterior concentration
          Generalities
          Idea of the proof
          Examples of loss functions where things become less nice

   2   Bayesian Upper and Lower bounds
          Lower bound
          The case of $\ell_\infty$ and adaptation

   3   Links with confidence bands

Generalities
     Model : $Y_1^n \mid \theta \sim p_\theta^n$ (density wrt $\mu$), $\theta \in \Theta$
     A priori : $\theta \sim \Pi$ : prior distribution
     $\longrightarrow$ posterior distribution
   $$ d\Pi(\theta \mid Y_1^n) = \frac{p_\theta^n(Y_1^n)\, d\Pi(\theta)}{m(Y_1^n)}, \qquad Y_1^n = (Y_1, \dots, Y_n) $$

     Posterior concentration : $d(\cdot,\cdot)$ = loss on $\Theta$ & $\theta_0 \in \Theta$ = true parameter
   $$ E_{\theta_0}\big(\Pi\,[\, U_{\epsilon_n} \mid Y_1^n \,]\big) = 1 + o(1), \qquad U_{\epsilon_n} = \{\theta \ ;\ d(\theta, \theta_0) \le \epsilon_n\}, \quad \epsilon_n \downarrow 0 $$

     Why should we care ?
   • Gives insight on some aspects of the prior
   • Gives some insight on inference : interpretation of posterior credible regions (loosely)
   • Helps understand the links between frequentist and Bayesian approaches

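
   As a toy illustration of this definition (not from the slides), one can check it numerically in a conjugate Gaussian mean model, where the posterior is available in closed form. A minimal sketch, assuming the rate $\epsilon_n = \sqrt{\log n / n}$ (slightly slower than parametric, so that the posterior mass of $U_{\epsilon_n}$ indeed tends to 1) and purely illustrative values for $\theta_0$, the prior variance and $M$:

```python
import numpy as np
from scipy.stats import norm

# Toy posterior concentration check: Y_i | theta ~ N(theta, 1), prior theta ~ N(0, tau2).
# The posterior is N(post_mean, post_var); we monitor Pi[ |theta - theta0| <= M * eps_n | Y ].
rng = np.random.default_rng(0)
theta0, tau2, M = 0.5, 4.0, 1.0          # illustrative choices

for n in [50, 500, 5000, 50000]:
    y = rng.normal(theta0, 1.0, size=n)
    post_var = 1.0 / (n + 1.0 / tau2)    # conjugate update
    post_mean = post_var * y.sum()
    eps_n = np.sqrt(np.log(n) / n)       # rate just above the parametric one
    lo, hi = theta0 - M * eps_n, theta0 + M * eps_n
    mass = norm.cdf(hi, post_mean, np.sqrt(post_var)) - norm.cdf(lo, post_mean, np.sqrt(post_var))
    print(f"n = {n:6d}   Pi[U_n | Y] = {mass:.4f}")
```
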
Minimax concentration rates on a class $\Theta_\alpha(L)$

   $$ \sup_{\theta_0 \in \Theta_\alpha(L)} E_{\theta_0}\, \Pi\big[\, U^c_{M \epsilon_n(\alpha)} \,\big|\, Y_1^n \,\big] = o(1), $$

where $\epsilon_n(\alpha)$ = minimax rate under $d(\cdot,\cdot)$ & over $\Theta_\alpha(L)$.

Examples of Models-losses for which nice results exist
     Density estimation : $Y_i \sim p_\theta$ i.i.d.
   $$ d(p_\theta, p_{\theta'})^2 = \int \big(\sqrt{p_\theta} - \sqrt{p_{\theta'}}\big)^2(x)\,dx, \qquad d(p_\theta, p_{\theta'}) = \int |p_\theta - p_{\theta'}|(x)\,dx $$

     Regression function
   $$ Y_i = f(x_i) + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2), \quad \theta = (f, \sigma) $$
   $$ d(p_\theta, p_{\theta'}) = \|f - f'\|_2, \qquad d(p_\theta, p_{\theta'}) = n^{-1} \sum_{i=1}^{n} H^2\big(p_\theta(y \mid X_i), p_{\theta'}(y \mid X_i)\big), \quad H = \text{Hellinger} $$

     White noise
   $$ dY(t) = f(t)\,dt + n^{-1/2} dW(t) \quad \Longleftrightarrow \quad Y_i = \theta_i + n^{-1/2} \epsilon_i, \quad i \in \mathbb{N} $$
   $$ d(p_\theta, p_{\theta'}) = \|f - f'\|_2 $$

Examples : functional classes
   $\Theta_\alpha(L)$ = Hölder ball $H(\alpha, L)$

   $$ \epsilon_n(\alpha) = n^{-\alpha/(2\alpha+1)} \quad \text{minimax rate over } H(\alpha, L) $$

     Density example : Hellinger loss
   Prior = DPM (Dirichlet process mixture)
   $$ f(x) = f_{P,\sigma}(x) = \int \phi_\sigma(x - \mu)\,dP(\mu), \qquad \sigma \sim I\Gamma(a, b), \quad P \sim DP(A, G_0) $$

   $$ \sup_{f_0 \in \Theta_\alpha(L)} E_{f_0}\, \Pi\big[\, U^c_{M (n/\log n)^{-\alpha/(2\alpha+1)}}(f_0) \,\big|\, Y_1^n \,\big] = o(1), $$

   $U_\epsilon(f_0) = \{f \ ;\ h(f_0, f) \le \epsilon\}$   [ is the $\log n$ term necessary ? ]

   $$ \Rightarrow \quad E_{f_0}\, h(\hat f, f_0)^2 \lesssim (n/\log n)^{-\alpha/(2\alpha+1)}, \qquad \hat f(x) = E^\pi[f(x) \mid Y^n] $$

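
   For intuition only (not from the slides), here is a minimal sketch of one draw from this DPM prior, using a truncated stick-breaking representation of $DP(A, G_0)$ with $G_0 = \mathcal{N}(0,1)$; the truncation level $K$ and the values of $A$, $a$, $b$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
A, a, b, K = 1.0, 2.0, 1.0, 200                 # DP mass, IGamma(a, b) parameters, truncation (assumed)

# P ~ DP(A, G0) via truncated stick-breaking, with base measure G0 = N(0, 1)
v = rng.beta(1.0, A, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))   # stick-breaking weights
mu = rng.normal(0.0, 1.0, size=K)                           # atom locations drawn from G0
sigma = 1.0 / rng.gamma(a, 1.0 / b)                         # sigma ~ IGamma(a, b)

def f_draw(x):
    """One prior draw f_{P,sigma}(x) = sum_k w_k * phi_sigma(x - mu_k)."""
    x = np.atleast_1d(x)[:, None]
    kernels = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return (w * kernels).sum(axis=1)

print(f_draw(np.linspace(-3.0, 3.0, 7)))
```
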
Outline of the proof : Tests and KL

   Let $U_n = U_{M(n/\log n)^{-\alpha/(2\alpha+1)}}$, $\ \ell_n(\theta) = \log p_\theta^n(Y_1^n)$ and $\bar\epsilon_n = (n/\log n)^{-\alpha/(2\alpha+1)}$.

   $$ \Pi\,[\, U_n^c \mid Y_1^n \,] = \frac{\int_{U_n^c} e^{\ell_n(\theta) - \ell_n(\theta_0)}\, d\Pi(\theta)}{\int_{\Theta} e^{\ell_n(\theta) - \ell_n(\theta_0)}\, d\Pi(\theta)} := \frac{N_n}{D_n} $$

   For tests $\phi_n = \phi_n(Y_1^n) \in [0, 1]$,
   $$ E_{\theta_0}\big(\Pi\,[\, U_n^c \mid Y_1^n \,]\big) \ \le\ E_{\theta_0}[\phi_n] + P_{\theta_0}\big(D_n < e^{-c n \bar\epsilon_n^2}\big) + e^{(c+\tau) n \bar\epsilon_n^2} \int_{U_n^c} E_\theta[1 - \phi_n]\, d\pi(\theta) $$

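
   A sketch of where the last term comes from (the standard change-of-measure step, stated here under the assumption that $p_\theta^n$ and $p_{\theta_0}^n$ are mutually absolutely continuous): by Fubini,
   $$ E_{\theta_0}\big[(1 - \phi_n)\, N_n\big] = \int_{U_n^c} E_{\theta_0}\Big[(1 - \phi_n)\, e^{\ell_n(\theta) - \ell_n(\theta_0)}\Big]\, d\Pi(\theta) = \int_{U_n^c} E_\theta\big[1 - \phi_n\big]\, d\Pi(\theta), $$
   since $e^{\ell_n(\theta) - \ell_n(\theta_0)} = dP_\theta^n / dP_{\theta_0}^n$; combined with the bound $\Pi[U_n^c \mid Y_1^n] \le e^{c n \bar\epsilon_n^2} (1-\phi_n) N_n$ on the event $\{D_n \ge e^{-c n \bar\epsilon_n^2}\}$, this yields the third term (up to the slack $\tau$).
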
Constraints

   $$ E_{\theta_0}[\phi_n] = o(1) \quad \& \quad \sup_{d(\theta, \theta_0) > M \bar\epsilon_n} E_\theta[1 - \phi_n] = o\big(e^{-c n \bar\epsilon_n^2}\big) \quad \to \ d(\cdot,\cdot) $$

   $$ P_{\theta_0}\big(D_n < e^{-c n \bar\epsilon_n^2}\big) = o(1). \quad \text{We need :} $$

   $$ D_n \ \ge\ \int_{S_n} e^{\ell_n(\theta) - \ell_n(\theta_0)}\, d\Pi(\theta) \ \ge\ e^{-2 n \bar\epsilon_n^2}\, \Pi\big( S_n \cap \{\ell_n(\theta) - \ell_n(\theta_0) > -2 n \bar\epsilon_n^2\} \big) $$

   Ok if $S_n = \{\theta \ ;\ KL(p_{\theta_0}^n, p_\theta^n) \le n \bar\epsilon_n^2 \ ;\ V(p_{\theta_0}^n, p_\theta^n) \le n \bar\epsilon_n^2\}$ and

   $$ \Pi(S_n) \ge e^{-c n \bar\epsilon_n^2} \quad \to \ \text{links } d(\cdot,\cdot) \text{ with } KL(\cdot,\cdot) $$

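
   Why the set $S_n$ makes the denominator bound work (a sketch of the usual Chebyshev argument, assuming $V$ denotes the centred second moment of the log-likelihood ratio): for $\theta \in S_n$,
   $$ P_{\theta_0}\Big( \ell_n(\theta) - \ell_n(\theta_0) \le -2 n \bar\epsilon_n^2 \Big) \ \le\ \frac{V(p_{\theta_0}^n, p_\theta^n)}{\big(2 n \bar\epsilon_n^2 - KL(p_{\theta_0}^n, p_\theta^n)\big)^2} \ \le\ \frac{1}{n \bar\epsilon_n^2} \ \longrightarrow\ 0, $$
   provided $n \bar\epsilon_n^2 \to \infty$; integrating over $S_n$ (Fubini) and using $\Pi(S_n) \ge e^{-c n \bar\epsilon_n^2}$ then gives the denominator bound, up to adjusting the constant in the exponent.
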
White noise model and pointwise or sup-norm loss
     White noise
   $$ dY(t) = f(t)\,dt + n^{-1/2} dW(t) \quad \Longleftrightarrow \quad Y_{jk} = \theta_{jk} + n^{-1/2} \epsilon_{jk} $$
     pointwise loss : $\ell(f, f_0) = (f(x_0) - f_0(x_0))^2$
     sup-norm loss : $\ell_\infty(f, f_0) = \sup_x |f(x) - f_0(x)|$
     Random truncation prior
   $$ J \sim P, \qquad \theta_{j,k} \sim g(\cdot) \ \ \forall k, \ \forall j \le J, \qquad \theta_{j,k} = 0 \ \ \forall k, \ \forall j > J $$
     $L_2$ concentration
   $$ \sup_{\alpha_1 \le \alpha \le \alpha_2} \ \sup_{f_0 \in H(\alpha, L)} E_{f_0}\, P^\pi\big[\, \|f - f_0\|_2 > M (n/\log n)^{-\alpha/(2\alpha+1)} \,\big|\, Y \,\big] = o(1) $$
     $\ell$ concentration : $\forall \alpha$, $\exists\, \epsilon_0 > 0$,
   $$ \sup_{f_0 \in H(\alpha, L)} E_{f_0}\, P^\pi\big[\, \ell(f, f_0) > \epsilon_0\, (n/\log n)^{-2\alpha^2/(2\alpha+1)^2} \,\big|\, Y \,\big] = 1 + o(1) $$

With a deterministic truncation prior

   $$ J := J_n(\alpha) : \quad 2^{J_n(\alpha)} = (n/\log n)^{1/(2\alpha+1)}, \qquad \theta_{j,k} \sim g(\cdot) \ \ \forall k, \ \forall j \le J_n(\alpha), \qquad \theta_{j,k} = 0 \ \ \forall k, \ \forall j > J_n(\alpha) $$

   $L_2$ concentration
   $$ \sup_{f_0 \in H(\alpha, L)} E_{f_0}\, P^\pi\big[\, \|f - f_0\|_2 > M (n/\log n)^{-\alpha/(2\alpha+1)} \,\big|\, Y \,\big] = o(1) $$

   $\ell$ concentration : $\forall \alpha$, $\exists\, M > 0$,
   $$ \sup_{f_0 \in H(\alpha, L)} E_{f_0}\, P^\pi\big[\, \ell(f, f_0) > M (n/\log n)^{-\alpha/(2\alpha+1)} \,\big|\, Y \,\big] = o(1) $$

• What does it mean ? Can we have adaptation with $\ell$ or $\ell_\infty$ ?

Why didn’t it work ?

   Same problem as in the frequentist literature (see M. Low’s papers) :
   $$ f_1 = 0, \qquad f_0 : \ \theta^0_{j,0} = \epsilon_n(\alpha) \quad \forall\, 2^j \le L (n/\log n)^{2\alpha/(2\alpha+1)^2}, \ j \ge 1, $$
   then
   $$ \sum_{j \ge 1,\, k} (\theta^0_{j,k})^2 \ \le\ \epsilon_n(\alpha)^2, \qquad \sum_{j \ge 1} 2^{j/2}\, \theta^0_{j,0} \ \asymp\ (n/\log n)^{-2\alpha^2/(2\alpha+1)^2}, $$
   and
   $$ P^\pi[J = 0 \mid Y] = 1 + o_{P_0}(1) : $$
   $f_0$ looks too much like $f_1 = 0$.

Giné & Nickl : posterior concentration rates for $\ell_\infty$ via tests

     At best they have
   $$ \sup_{f_0 \in H(\alpha, L)} E_{f_0}\, P^\pi\big[\, \ell_\infty(f, f_0) > M (n/\log n)^{-(\alpha - 1/2)/(2\alpha+1)} \,\big|\, Y \,\big] = o(1) $$

   Proof based on tests $\to$ suboptimal. Can we do better ?

Bayesian Lower bounds : white noise model

   Let $d(\theta, \theta')$ be a symmetric semi-metric, e.g. $d(\theta, \theta') = \ell(f_\theta, f_{\theta'})$ or $\ell_\infty(f_\theta, f_{\theta'})$.
     Dual of the modulus of continuity :
   $$ \varphi(\theta, \epsilon) = \inf_{\theta' \in \Theta} \big\{ \|\theta - \theta'\|_2 \ ;\ d(\theta, \theta') > \epsilon \big\}, \qquad \varphi(\epsilon) = \inf_\theta \varphi(\theta, \epsilon) $$

   Theorem. Let $C > 0$ and $\epsilon_n$ be such that
   $Q_n(C) := \{\theta \ ;\ \varphi(\theta, 2\epsilon_n) \le C\, \varphi(\epsilon_n)\} \neq \emptyset$. Then $\forall \theta_0 \in Q_n(C)$, $\forall \Pi$, $\exists K > 0$,
   $$ E_{\theta_0, n}\big[ P^\pi[\, d(\theta, \theta_0) \ge \epsilon_n \mid Y \,] \big] \ \ge\ e^{-K n \varphi(\epsilon_n)^2} $$

Consequences

     Consequence 1 : We obtain, as in Cai and Low, with
   $\epsilon_n = \inf\{\epsilon \ ;\ \varphi(\epsilon) > M/\sqrt{n}\}$ : for all $u_n = o(\epsilon_n)$,
   $$ E_{\theta_0, n}\big[ P^\pi[\, d(\theta, \theta_0) \ge u_n \mid Y \,] \big] \ \neq\ o(1) $$

     Consequence 2 : If $\varphi(\epsilon_n)^2 = o(\epsilon_n^2)$,
   $$ e^{-K n \varphi(\epsilon_n)^2} \ \gg\ e^{-K n \epsilon_n^2} $$

     $\to$ Proofs based on tests will lead to suboptimal concentration rates. Set
   $$ \bar\epsilon_n = \inf\{\epsilon \ ;\ \varphi(\epsilon) > M \epsilon_n\}. $$

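
   A sketch of how Consequence 1 follows from the Theorem (my reading; the slide does not spell it out): for any $\epsilon \in [u_n, \epsilon_n)$ we have $\varphi(\epsilon) \le M/\sqrt{n}$ by definition of $\epsilon_n$, so, whenever the Theorem applies at radius $\epsilon$,
   $$ E_{\theta_0, n}\big[ P^\pi[\, d(\theta, \theta_0) \ge u_n \mid Y \,] \big] \ \ge\ E_{\theta_0, n}\big[ P^\pi[\, d(\theta, \theta_0) \ge \epsilon \mid Y \,] \big] \ \ge\ e^{-K n \varphi(\epsilon)^2} \ \ge\ e^{-K M^2}, $$
   a fixed positive constant: the posterior cannot concentrate at any rate $u_n = o(\epsilon_n)$.
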
The case of $\ell_\infty$

   $$ Y_{j,k} = \theta_{j,k} + n^{-1/2} \epsilon_{j,k}, \qquad \epsilon_{j,k} \sim \mathcal{N}(0, 1) \ \text{i.i.d.} $$
   $$ \ell_\infty(f_\theta, f_{\theta'}) = \sum_j \max_k\, 2^{j/2}\, |\theta_{j,k} - \theta'_{j,k}| $$
   $$ \varphi\big(\epsilon_n(\beta)\big) = O\Big( \sqrt{\tfrac{\log n}{n}} \Big), \qquad \epsilon_n(\beta) = (n/\log n)^{-\beta/(2\beta+1)}, \qquad \Theta = H(\beta, L) $$

   Theorem. There is a prior $\Pi$ s.t. $\forall C < 1/2$,
   $$ \sup_{\beta_1 \le \beta \le \beta_2} \ \sup_{\theta_0 \in H(\beta, L)} E_{\theta_0}\big( P^\pi[\, \ell_\infty(\theta_0, \theta) > M \epsilon_n(\beta) \mid Y \,] \big) \ \le\ e^{-C \log n}, $$
   with $\epsilon_n(\beta) := (n/\log n)^{-\beta/(2\beta+1)}$.
     Examples : Sieve prior (discrete prior) ; Spike and slab

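
   Where the bound on $\varphi(\epsilon_n(\beta))$ comes from (a back-of-the-envelope computation, assuming the usual wavelet characterisation $|\theta_{j,k}| \le L\, 2^{-j(\beta + 1/2)}$ of $H(\beta, L)$; not taken verbatim from the slides): perturbing a single coefficient at resolution $j$ by $\delta$ moves $\ell_\infty$ by $2^{j/2}|\delta|$ at $L_2$ cost $|\delta|$, and staying in $H(\beta, L)$ forces $2^{-j\beta} \gtrsim \epsilon$ when $2^{j/2}|\delta| = \epsilon$, so
   $$ \varphi(\epsilon) \ \asymp\ \inf_{2^{j} \lesssim \epsilon^{-1/\beta}} \epsilon\, 2^{-j/2} \ \asymp\ \epsilon^{(2\beta+1)/(2\beta)}, \qquad \varphi\big(\epsilon_n(\beta)\big) \ \asymp\ \Big(\frac{\log n}{n}\Big)^{1/2}. $$
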
Spike and slab prior

   $\forall j \le J_n$ with $2^{J_n} \approx n$, $\forall k$ :
   $$ \theta_{j,k} \sim \Big(1 - \frac{1}{n}\Big)\, \delta_0 + \frac{1}{n}\, g(\cdot) $$
   with $\log g$ smooth (Laplace, Gaussian, Student).
   Adaptive posterior concentration in $L_2$ (losing a $\log n$ factor) and in $\ell_\infty$, at rate
   $$ (n/\log n)^{-\alpha/(2\alpha+1)}. $$

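
   A minimal sketch of one draw from this prior in the sequence model (illustrative only: g is taken to be a standard Laplace density, and the values of n and J_n are placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1024
Jn = int(np.log2(n))                     # 2^{Jn} ≈ n

theta = {}
for j in range(Jn + 1):
    for k in range(2 ** j):
        # spike and slab: point mass at 0 with probability 1 - 1/n, slab g with probability 1/n
        theta[(j, k)] = rng.laplace(0.0, 1.0) if rng.random() < 1.0 / n else 0.0

active = {jk: round(v, 3) for jk, v in theta.items() if v != 0.0}
print(f"{len(active)} nonzero coefficients out of {len(theta)}:", active)
```

   Under an independent product prior like this one, the posterior in the white noise sequence model factorises across coefficients, which is one reason coefficient-wise (and hence $\ell_\infty$) control is tractable.
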
Some connections with confidence sets

     Adaptive confidence sets $C_n$ :
   $$ \inf_\theta P_\theta(\theta \in C_n) \ge 1 - \alpha \qquad \text{and} \qquad \sup_{\beta_1 \le \beta \le \beta_2} \ \sup_{\theta_0 \in H(\beta, L)} \epsilon_n(\beta)^{-1}\, E_{\theta_0}\big[|C_n|\big] < +\infty, $$
   with $|C_n| = \sup_{\theta, \theta' \in C_n} d(\theta, \theta')$.
     If $d(\cdot,\cdot) = \ell_\infty$ : such sets do not exist (M. Low).
     Hoffman and Nickl : on $H(\beta_1, L) \cup H(\beta_2, L)$ with $\beta_2 > \beta_1$, let
   $$ \tilde\Theta_n = H(\beta_2, L) \cup \{\theta \in H(\beta_1, L) \ ;\ \ell_\infty(\theta, H(\beta_2, L)) > M \epsilon_n(\beta_1)\}. $$
   Then an adaptive confidence set exists over $\tilde\Theta_n$.

1st Bayesian perspective
   $H(\beta_1, L) \cup H(\beta_2, L)$ with $\beta_2 > \beta_1$. If
   $$ \sup_{\beta \in \{\beta_1, \beta_2\}} \ \sup_{\theta_0 \in H(\beta, L)} E_{\theta_0}\big( P^\pi[\, \ell_\infty(\theta_0, \theta) > M \epsilon_n(\beta) \mid Y \,] \big) \ \le\ e^{-C \log n}, $$
   set $C_n = \{\theta_0 \ ;\ P^\pi[\, \ell_\infty(\theta_0, \theta) > M \epsilon_n(\beta, \theta_0) \mid Y \,] < n^{-C}/\alpha\}$. Then
   $$ P_{\theta_0}\big[\theta_0 \in C_n^c\big] \ \le\ \alpha n^{C}\, E_{\theta_0}\big[ P^\pi[\, \ell_\infty(\theta_0, \theta) > M \epsilon_n(\beta, \theta_0) \mid Y \,] \big] \ \le\ \alpha $$

     Problem : control of $E_\theta[|C_n|]$ :
   $$ \sup_{\theta \in H(\beta_1, L)} E_\theta\big[|C_n|\big] \ \asymp\ \epsilon_n(\beta_1) \quad \to \quad \text{OK} $$
   $$ \sup_{\theta \in H(\beta_2, L)} E_\theta\big[|C_n|\big] \ \asymp\ \epsilon_n(\beta_1) \quad \to \quad \text{BAD} $$

   But on $\tilde\Theta := H(\beta_2, L) \cup \{\theta \in H(\beta_1, L) \ ;\ \ell_\infty(\theta, H(\beta_2, L)) > \epsilon_n(\beta_1)\}$,
   $\tilde C_n = C_n \cap \tilde\Theta$ is an adaptive confidence set.

A better ( ?) Bayesian perspective : back to basics

   If
   $$ \sup_{\beta \in [\beta_1, \beta_2]} \ \sup_{\theta_0 \in H(\beta, L)} E_{\theta_0}\big( P^\pi[\, \ell_\infty(\theta_0, \theta) > M \epsilon_n(\beta) \mid Y \,] \big) \ \le\ e^{-C \log n} $$
   and
   $$ \sup_{\beta \in [\beta_1, \beta_2]} \ \sup_{\theta_0 \in H(\beta, L)} E_{\theta_0}\big[ \ell_\infty(\theta_0, \hat\theta) \big]\, \epsilon_n(\beta)^{-1} \ <\ +\infty, $$
   take
   $$ C_n = \{\theta \ ;\ \ell_\infty(\theta, \hat\theta) \le k_n(\alpha_n)\}, \qquad P^\pi[\theta \in C_n \mid Y] \ge 1 - \alpha_n. $$
   Then
   $$ \int_\Theta P_\theta[\theta \in C_n]\, d\pi(\theta) \ \ge\ 1 - \alpha_n, \qquad \sup_{\theta \in H(\beta, L)} E_\theta\big[|C_n|\big] \ \le\ 2 M \epsilon_n(\beta), $$
   if $\Theta$ is bounded.

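
   The first conclusion is simply Fubini applied to the joint distribution of $(\theta, Y)$ (sketch): since $P^\pi[\theta \in C_n \mid Y] \ge 1 - \alpha_n$ for every $Y$,
   $$ \int_\Theta P_\theta[\theta \in C_n]\, d\pi(\theta) \ =\ E_m\big[\, P^\pi[\theta \in C_n \mid Y]\, \big] \ \ge\ 1 - \alpha_n, $$
   where $E_m$ is the expectation under the marginal law of $Y$: credible sets have at least the nominal coverage on average over the prior.
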
Conclusion


   • Bayesian methods work well for risks related to Kullback–Leibler : $L_2$ in regression, Hellinger or $L_1$ in density estimation, etc.
   • How to understand some specific features of these big models ? More tricky.
   • Good nonparametric priors : have good properties for a wide range of loss functions.
   • Why should we care ? $\to$ interpretation of credible bands !?
   • Extension to models other than white noise. [Done]
   • Can we go further than the 2nd Bayesian interpretation (confidence sets) ?

THANK YOU




