Statistical Modeling of Extreme
  Values: Basic Theory and Its
Implementation in Open Source
  Programming Environment R
              Nader Tajvidi
   Department of Mathematical Statistics
       Lund Institute of Technology
                 Box 118
             SE-22100 Lund
                  Sweden

              August 6, 2010




             Khon Kaen University
Outline


• Some examples of application of extreme value
  theory

• Univariate extreme value distributions

• Characterisation of multivariate extreme value
  distributions

• Bivariate extreme value distributions

• Parametric models for the dependence function

• Parametric and nonparametric estimation of the
  dependence function

• Monte Carlo approximations to mean integrated
  squared errors of parametric and nonparametric
  estimators

• Application to Australian temperature data

Annual maximum sea levels at Port Pirie, South Australia

[Figure: annual maximum sea level (meters, axis from 3.6 to 4.6) against year, 1930 to 1980.]

Breaking strengths of glass fibers

[Figure, two panels: histogram of breaking strengths of glass fibers (percent of total against breaking strength) and density plot of breaking strengths of glass fibers (density against breaking strength).]

Annual maximum sea levels at Fremantle, Western Australia

[Figure: annual maximum sea level (meters, axis from 1.2 to 1.8) against year, 1900 to 1980.]

Annual maximum sea levels at Fremantle, Western Australia, versus mean annual value of Southern Oscillation Index

[Figure: annual maximum sea level (meters, axis from 1.2 to 1.8) against SOI (axis from -1 to 2).]

Comparing Port Pirie and Fremantle datasets

[Figure, two panels: annual maximum sea level (meters) against year; upper panel with axis from 3.6 to 4.6 over 1930 to 1980 (the Port Pirie range), lower panel with axis from 1.2 to 1.8 over 1900 to 1980 (the Fremantle range).]

Daily closing prices of the Dow Jones Index


[Figure: series dowjones, index level (axis from 5000 to 11000) against year, quarterly ticks from 1995 to 2001.]

Log-daily returns of the Dow Jones Index


[Figure: series log.daily.return (axis from -0.06 to 0.04) against year, quarterly ticks from 1995 to 2001.]

Dow Jones Index data


[Figure, two panels side by side: dowjones (axis from 5000 to 11000) and log.daily.return (axis from -0.06 to 0.04), each against year.]

Windstorm loss data

• Windstorm losses of the Swedish insurance group
  Länsförsäkringar during the period 1982 to 1993

• The database contains:
    – The individual amounts of all claims
    – The place and time of the claims
    – The type of the claim

• 46 storm events, with a total claimed amount of
  510 million Swedish crowns (MSEK)

• Farm insurance comprises approximately 65% of
  the total amount

• All values were corrected for inflation

• No adjustments for portfolio changes




Windstorm losses 1982-1993




[Figure: windstorm losses 1982-1993; the labelled events are Jan 83, Jan 84, Dec 88, Feb 92 and Jan 93.]

Questions:

• How can we predict the size of the next very severe
  storm?

• How much reinsurance does a company need to
  buy?

Windstorm losses which exceed the level u = 0.9 MSEK, for 1982–1993

[Figure: storm loss (in MSEK, axis from 0 to 120) against storm number (0 to 40); labelled events: Jan 83, Jan 84, Dec 88, Feb 92, Jan 93.]

Australian temperature data

• A very large dataset on annual maximum and
  minimum average daily temperatures at 224 stations
  across Australia



[Figure: the stations by state; legend: Queensland, New South Wales, Victoria, South Australia, West Australia, Northern Territory, Tasmania.]

Annual maximum temperatures in
              Victoria, Australia



• The maximum value, over all 34 weather stations
  that were operating in the state of Victoria from
  1910 to 1993, of annual temperatures (in degrees
  Celsius) during this period.




[Figure, four panels: temperature (axis from 31 to 39) against year, 1920 to 1980; and the estimates $\hat\mu$, $\hat\sigma$ and $\hat\gamma$, each against t in [0, 1].]

Average annual maximum temperature


• The average annual maximum is derived by taking
  the mean of maximum annual temperature readings
  at 224 weather stations across Australia in the
  period 1890–1993.



[Figure: the stations by state; legend: Queensland, New South Wales, Victoria, South Australia, West Australia, Northern Territory, Tasmania.]

Location and scale estimates with Gaussian fit
[Figure, four panels: temperature (axis from 26.0 to 29.0) against year, 1900 to 1980; $\hat\mu$ against t in [0, 1]; $\hat\sigma$ against t in [0, 1]; and a panel labelled C1 (axis from -136.7 to -136.3) against h (0.12 to 0.24).]

Another application to Australian
                       temperature data


• Maximum annual values of average daily
  temperature measurements at two meteorological
  stations, Leonora (latitude 28.53, longitude 121.19)
  and Menzies (latitude 29.42, longitude 121.02), in
  Western Australia during the period 1898–1993.




[Figure: Menzies (axis from 24 to 27) against Leonora (axis from 25 to 28).]

Annual Maximum Wind Speeds in 1944-1983

[Figure: annual maximum wind speed (knots) at Hartford (CT), axis from 50 to 80, against annual maximum wind speed (knots) at Albany (NY), axis from 40 to 65.]

Concurrent measurements of wave and surge height in south west England

[Figure: surge (m), axis from -0.2 to 0.8, against wave height (m), axis from 0 to 10.]

The framework
1. A proper mathematical model has to be chosen in
   each case.
    • parametric; best if the model is correct
    • nonparametric; cannot be used for extrapolation
      outside the observed values
    • semiparametric; very flexible (main subject of
      this talk)

2. Parameters in each model have to be estimated
   based on the historical data. Which method should
   be used?

3. These estimates are our “best guesses” of the
   process which is being analyzed. How should the
   uncertainty in the estimates be specified?

4. Goodness of fit. Does the model give a good
   representation of the historical data?

5. How can we reduce the uncertainties in our models?
   How can extra information be incorporated in the
   models?

Univariate Extreme Value Distributions


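$X_1, X_2, \ldots, X_n$ iid, $X \sim F(x)$

\[ M_n = \max(X_1, X_2, \ldots, X_n), \quad n \in \mathbb{N} \]

$a_n > 0$ and $b_n \in \mathbb{R}$

\[ \lim_{n\to\infty} P\left(\frac{M_n - b_n}{a_n} \le x\right) = \lim_{n\to\infty} F^n(a_n x + b_n) = G(x) \]

$G(x)$ non-degenerate

\[ F \in D(G) \]

$F(x)$ belongs to the domain of attraction of $G(x)$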
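
As a concrete instance of the limit above, take F to be the standard exponential distribution, for which the classical normalisation a_n = 1, b_n = log n leads to the limit G(x) = exp(-e^{-x}). The sketch below is illustrative only (Python standard library; the function names are not from the talk): it evaluates F^n(a_n x + b_n) exactly and measures its largest distance to G on a grid.

```python
import math

def normalized_max_cdf(x, n):
    """F^n(a_n x + b_n) for F = Exp(1) with a_n = 1 and b_n = log(n)."""
    y = x + math.log(n)
    if y <= 0.0:
        return 0.0  # the exponential cdf vanishes on the negative half-line
    return (1.0 - math.exp(-y)) ** n

def gumbel_cdf(x):
    """Limit distribution G(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

def max_gap(n):
    """Largest absolute difference between F^n(a_n x + b_n) and G(x) on a grid in [-3, 6]."""
    grid = [k / 10.0 for k in range(-30, 61)]
    return max(abs(normalized_max_cdf(x, n) - gumbel_cdf(x)) for x in grid)

if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(n, max_gap(n))
```

The printed gap shrinks as n grows, which is the statement F ∈ D(G) for this particular F.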




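Type I:

\[ \Phi_\alpha(x) = \begin{cases} 0, & x < 0 \\ \exp(-x^{-\alpha}), & x \ge 0 \end{cases} \]

Type II:

\[ \Psi_\alpha(x) = \begin{cases} \exp(-(-x)^{\alpha}), & x < 0 \\ 1, & x \ge 0 \end{cases} \]

Type III:

\[ \Lambda(x) = \exp(-e^{-x}), \quad x \in \mathbb{R} \]

Generalised Extreme Value Distribution

\[ G(x; \gamma, \mu, \sigma) = \exp\left\{ -\left(1 - \gamma\,\frac{x - \mu}{\sigma}\right)_+^{1/\gamma} \right\} \]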
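
The GEV formula above can be evaluated directly. The following is a minimal sketch in Python (standard library only; the function name `gev_cdf` and the tolerance used to switch to the γ → 0 limit are assumptions made for illustration). It follows the sign convention of the slide, 1 - γ(x - μ)/σ, under which γ > 0 gives a distribution bounded above and γ < 0 one bounded below.

```python
import math

def gev_cdf(x, gamma, mu=0.0, sigma=1.0):
    """G(x; gamma, mu, sigma) = exp{-(1 - gamma (x - mu)/sigma)_+^(1/gamma)}."""
    z = (x - mu) / sigma
    if abs(gamma) < 1e-12:
        # Limit gamma -> 0: (1 - gamma z)^(1/gamma) -> exp(-z), the Lambda (Gumbel-type) case
        return math.exp(-math.exp(-z))
    base = 1.0 - gamma * z
    if base <= 0.0:
        # Outside the support: above the upper endpoint if gamma > 0,
        # below the lower endpoint if gamma < 0
        return 1.0 if gamma > 0.0 else 0.0
    return math.exp(-base ** (1.0 / gamma))

if __name__ == "__main__":
    for g in (-0.3, 0.0, 0.3):
        print(g, [round(gev_cdf(x, g), 4) for x in (-1.0, 0.0, 1.0, 2.0)])
```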


Multivariate Extreme Value Distributions

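\[ \{X_n, n \ge 1\} = \{(X_n^{(1)}, \ldots, X_n^{(d)}), n \ge 1\} \]

$X \sim F(x)$ iid

\[ M_n = (M_n^{(1)}, \ldots, M_n^{(d)}) = \Big(\bigvee_{j=1}^{n} X_j^{(1)}, \ldots, \bigvee_{j=1}^{n} X_j^{(d)}\Big) \]

$\sigma_n^{(i)} > 0$, $u_n^{(i)} \in \mathbb{R}$

\[ P\big[(M_n^{(i)} - u_n^{(i)})/\sigma_n^{(i)} \le x^{(i)},\ 1 \le i \le d\big] = F^n\big(\sigma_n^{(1)} x^{(1)} + u_n^{(1)}, \ldots, \sigma_n^{(d)} x^{(d)} + u_n^{(d)}\big) \to G(x) \]

marginals $G_i$ of $G$ non-degenerate

\[ F \in D(G) \]

$F(x)$ belongs to the domain of attraction of $G(x)$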

Characterisation of Multivariate Extreme Value Distributions

\[ P\big[(M_n^{(i)} - u_n^{(i)})/\sigma_n^{(i)} \le x^{(i)},\ 1 \le i \le d\big] = F^n\big(\sigma_n^{(1)} x^{(1)} + u_n^{(1)}, \ldots, \sigma_n^{(d)} x^{(d)} + u_n^{(d)}\big) \to G(x) \]

Definition. A df $G$ in $\mathbb{R}^d$ is called max-stable if for
every $t > 0$

\[ G^t(x) = G\big(\alpha^{(1)}(t) x^{(1)} + \beta^{(1)}(t), \ldots, \alpha^{(d)}(t) x^{(d)} + \beta^{(d)}(t)\big). \]

Definition. A df $F$ in $\mathbb{R}^d$ is called max-infinitely
divisible (max-id) if $F^t(x_1, \ldots, x_d)$ is a df for every
$t > 0$.

\[ G_*(\infty, \infty, \ldots, x_i, \ldots, \infty) = \Phi_1(x_i) = \exp(-x_i^{-1}) \]

$G_*(x)$ is a MEVD with $\Phi_1$ marginals



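Characterisation of Max-id and Max-Stable Distributions

$F$ max-id iff for a Radon measure $\mu$ on
$E := [k, \infty] \setminus \{k\}$, $k \in [-\infty, \infty)^d$,

\[ F(y) = \begin{cases} \exp\{-\mu([-\infty, y]^c)\}, & y \ge k \\ 0, & \text{otherwise} \end{cases} \]

The measure $\mu$ is called an exponent measure.

\[ G_*(\infty, \infty, \ldots, x_i, \ldots, \infty) = \Phi_1(x_i) = \exp(-x_i^{-1}) \]

$G_*(x)$ is a MEVD with $\Phi_1$ marginals if for a finite
measure $S$ on

\[ \aleph = \{y : \|y\| = 1\} \]

\[ G_*(x) = \exp\Big\{ -\int_{\aleph} \bigvee_{i=1}^{d} \frac{a^{(i)}}{x^{(i)}}\, S(da) \Big\} \]

\[ \int_{\aleph} a^{(i)}\, S(da) = 1, \quad 1 \le i \le d \]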



Bivariate Extreme Value Distributions



\[ G_*(x, y) = e^{-\mu_*([0,(x,y)]^c)} \]

\[ \mu_*([0,(x,y)]^c) = \Big(\frac{1}{x} + \frac{1}{y}\Big)\, A\Big(\frac{x}{x+y}\Big) \]

\[ A(w) = \int_0^1 \max\{q(1-w), (1-q)w\}\, S(dq) \]

$A(w)$ is called the dependence function.

\[ \int_0^1 q\, S(dq) = \int_0^1 (1-q)\, S(dq) = 1 \]

• $A(0) = A(1) = 1$

• $\max\{w, 1-w\} \le A(w) \le 1$

• $A(w)$ is convex for $w \in [0, 1]$
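
Two boundary cases make the integral representation concrete. Both follow directly from the formula for A(w) and the moment constraints on S; the particular choices of S below are written out for illustration and are not taken from the slides.

```latex
% Independence: S puts unit mass at q = 0 and at q = 1.
S = \delta_0 + \delta_1: \quad
\int_0^1 q\,S(dq) = \int_0^1 (1-q)\,S(dq) = 1, \qquad
A(w) = \max\{0,\, w\} + \max\{1-w,\, 0\} = 1, \qquad
G_*(x, y) = e^{-1/x}\, e^{-1/y}.

% Complete dependence: S puts mass 2 at q = 1/2.
S = 2\,\delta_{1/2}: \quad
\int_0^1 q\,S(dq) = \int_0^1 (1-q)\,S(dq) = 1, \qquad
A(w) = 2\max\{(1-w)/2,\; w/2\} = \max\{w,\, 1-w\}, \qquad
G_*(x, y) = e^{-1/\min(x,\, y)}.
```

These two cases attain the upper and lower bounds in max{w, 1 − w} ≤ A(w) ≤ 1.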


Some examples of the dependence function

[Figure: A(w) (axis from 0.5 to 1.0) against w in [0, 1] for three models: mixed, generalised mixed and asymmetric mixed.]

Parametric Models for the Dependence Function

1. The mixed model

\[ \mu_*([0,(x,y)]^c) = \frac{1}{x} + \frac{1}{y} - \frac{\theta}{x+y}, \quad 0 \le \theta \le 1 \]

\[ A(w) = \theta w^2 - \theta w + 1, \quad 0 \le \theta \le 1 \]
    • θ = 0 gives the independent case
    • Complete dependence is not possible

2. The logistic model

\[ \mu_*([0,(x,y)]^c) = (x^{-r} + y^{-r})^{1/r}, \quad r \ge 1 \]

\[ A(w) = \{(1-w)^r + w^r\}^{1/r}, \quad r \ge 1 \]
    • r = 1 gives the independent case
    • r = +∞ gives complete dependence
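
The two models above, and the three properties required of any dependence function, can be checked numerically. This is a rough sketch in Python (standard library only); the function names, the grid size and the tolerance are assumptions chosen for illustration, and the grid check is a necessary-condition test rather than a proof of validity.

```python
import math

def a_mixed(w, theta):
    """Mixed model: A(w) = theta w^2 - theta w + 1."""
    return theta * w * w - theta * w + 1.0

def a_logistic(w, r):
    """Logistic model: A(w) = {(1-w)^r + w^r}^(1/r)."""
    return ((1.0 - w) ** r + w ** r) ** (1.0 / r)

def g_star(x, y, a):
    """G_*(x, y) = exp{-(1/x + 1/y) A(x/(x+y))} for x, y > 0."""
    return math.exp(-(1.0 / x + 1.0 / y) * a(x / (x + y)))

def is_valid_dependence(a, m=200, tol=1e-9):
    """Check A(0) = A(1) = 1, max{w, 1-w} <= A(w) <= 1 and convexity on a grid."""
    ws = [k / m for k in range(m + 1)]
    vals = [a(w) for w in ws]
    ends = abs(vals[0] - 1.0) < tol and abs(vals[-1] - 1.0) < tol
    bounds = all(max(w, 1.0 - w) - tol <= v <= 1.0 + tol for w, v in zip(ws, vals))
    # Convexity via non-negative second differences
    convex = all(vals[k - 1] - 2.0 * vals[k] + vals[k + 1] >= -tol for k in range(1, m))
    return ends and bounds and convex

if __name__ == "__main__":
    print(is_valid_dependence(lambda w: a_mixed(w, 0.7)))     # theta inside [0, 1]
    print(is_valid_dependence(lambda w: a_mixed(w, 1.5)))     # theta outside [0, 1]
    print(is_valid_dependence(lambda w: a_logistic(w, 2.5)))
```

With θ = 1.5 the mixed model falls below the lower bound max{w, 1 − w} near w = 0, which illustrates why the constraint 0 ≤ θ ≤ 1 is needed.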

The Generalised Symmetric Mixed Model



\[ \mu_*([0,(x,y)]^c) = \frac{1}{x} + \frac{1}{y} - k\Big(\frac{1}{x^p + y^p}\Big)^{1/p}, \quad (0 \le k \le 1,\ p \ge 0) \]

\[ A(w) = 1 - \frac{k}{\{(1-w)^{-p} + w^{-p}\}^{1/p}} \]

• Independence for k = 0 or p = 0

• Complete dependence can be obtained with k = 1 and p = ∞ (not
  possible in the symmetric or asymmetric mixed model)
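
A short numerical check of the generalised symmetric mixed model, as a sketch in Python (standard library only; the function names are illustrative). It uses the algebraically equivalent form {lo^{-p} + hi^{-p}}^{-1/p} = lo {1 + (lo/hi)^p}^{-1/p}, with lo = min(w, 1 − w) and hi = max(w, 1 − w), which is an implementation choice made here to avoid overflow for large p. The case p = 0 is treated as its limit, independence.

```python
def a_gen_mixed(w, k, p):
    """A(w) = 1 - k / {(1-w)^(-p) + w^(-p)}^(1/p), with A(0) = A(1) = 1."""
    if w <= 0.0 or w >= 1.0 or k == 0.0 or p == 0.0:
        return 1.0
    lo, hi = min(w, 1.0 - w), max(w, 1.0 - w)
    # Stable rewrite of {(1-w)^(-p) + w^(-p)}^(-1/p)
    return 1.0 - k * lo * (1.0 + (lo / hi) ** p) ** (-1.0 / p)

def mu_gen_mixed(x, y, k, p):
    """Exponent measure: 1/x + 1/y - k (1/(x^p + y^p))^(1/p), for p > 0."""
    return 1.0 / x + 1.0 / y - k * (x ** p + y ** p) ** (-1.0 / p)

if __name__ == "__main__":
    x, y, k, p = 2.0, 5.0, 0.6, 3.0
    # The two expressions on the slide should agree: mu = (1/x + 1/y) A(x/(x+y))
    print(mu_gen_mixed(x, y, k, p), (1.0 / x + 1.0 / y) * a_gen_mixed(x / (x + y), k, p))
    # k = 1 with large p approaches complete dependence, A(w) = max{w, 1-w}
    print(a_gen_mixed(0.3, 1.0, 500.0))
```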

The Generalised Symmetric Logistic Model



\[ \mu_*([0,(x,y)]^c) = \Big(\frac{1}{x^p} + \frac{1}{y^p} + \frac{k}{(xy)^{p/2}}\Big)^{1/p}, \quad (0 < k \le 2(p-1),\ p \ge 2) \]

\[ A(w) = \big\{(1-w)^p + w^p + k\,((1-w)w)^{p/2}\big\}^{1/p} \]

• k = 2 gives the symmetric logistic model

• Independence corresponds to p = 2 and k = 2

• Complete dependence for k = 2 and p = +∞
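
The first two bullets can be verified numerically. For k = 2 the term inside the braces is the perfect square {(1 − w)^{p/2} + w^{p/2}}², so A(w) coincides with the logistic dependence function with r = p/2. The sketch below (Python, standard library only; function names are illustrative) checks this on a few points.

```python
def a_gen_logistic(w, k, p):
    """A(w) = {(1-w)^p + w^p + k ((1-w) w)^(p/2)}^(1/p)."""
    return ((1.0 - w) ** p + w ** p + k * ((1.0 - w) * w) ** (p / 2.0)) ** (1.0 / p)

def a_logistic(w, r):
    """Logistic model: A(w) = {(1-w)^r + w^r}^(1/r)."""
    return ((1.0 - w) ** r + w ** r) ** (1.0 / r)

if __name__ == "__main__":
    p = 3.0
    for w in (0.0, 0.1, 0.5, 0.8, 1.0):
        # k = 2: generalised model with parameter p equals logistic with r = p/2
        print(w, a_gen_logistic(w, 2.0, p), a_logistic(w, p / 2.0))
    # p = 2 and k = 2: independence, A(w) = 1
    print(a_gen_logistic(0.37, 2.0, 2.0))
```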

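The Parameter Region for the Generalised Symmetric Logistic Model

[Figure: parameter region in the (p, k) plane, with p on the horizontal axis (ticks at 2, 4, 6) and k on the vertical axis (ticks at 0 and 2); labelled regions: "logistic model" and "equivalent models".]
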
The Asymmetric Mixed Model



\[ \mu_*([0,(x,y)]^c) = \frac{x^3 + 3x^2y - 2\phi x^2 y - \theta x^2 y + 3xy^2 - \phi x y^2 - \theta x y^2 + y^3}{x\,y\,(x+y)^2} \]

\[ A(w) = \phi w^3 + \theta w^2 - (\theta + \phi) w + 1, \quad (\theta \ge 0,\ \theta + 2\phi \le 1,\ \theta + 3\phi \ge 0) \]

• Symmetric mixed model for φ = 0

• Independent case for θ = φ = 0 (complete dependence is not possible)

• The parameter φ controls the asymmetry of the model


The Asymmetric Logistic Model



\[ \mu_*([0,(x,y)]^c) = \frac{(1-\phi)\,x + (1-\theta)\,y + x\Big(\frac{\phi^r x^r + \theta^r y^r}{(x+y)^r}\Big)^{1/r} + y\Big(\frac{\phi^r x^r + \theta^r y^r}{(x+y)^r}\Big)^{1/r}}{x\,y} \]

\[ A(w) = \{(\theta(1-w))^r + (\phi w)^r\}^{1/r} + (\theta - \phi) w + 1 - \theta, \quad (0 \le \theta, \phi \le 1,\ r \ge 1) \]

• For θ = φ = 1 this model reduces to the corresponding symmetric logistic
  model which gives the diagonal case for r = +∞.

• Independence is obtained for θ = 0 and for φ = 0 or r = 1.
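
The special cases listed for the asymmetric logistic model can be confirmed by direct evaluation. This is a small sketch in Python (standard library only; the function names are chosen for illustration).

```python
def a_asym_logistic(w, theta, phi, r):
    """A(w) = {(theta (1-w))^r + (phi w)^r}^(1/r) + (theta - phi) w + 1 - theta."""
    core = ((theta * (1.0 - w)) ** r + (phi * w) ** r) ** (1.0 / r)
    return core + (theta - phi) * w + 1.0 - theta

def a_logistic(w, r):
    """Symmetric logistic model: A(w) = {(1-w)^r + w^r}^(1/r)."""
    return ((1.0 - w) ** r + w ** r) ** (1.0 / r)

if __name__ == "__main__":
    grid = (0.0, 0.2, 0.5, 0.9, 1.0)
    # theta = phi = 1: symmetric logistic model
    print([a_asym_logistic(w, 1.0, 1.0, 3.0) - a_logistic(w, 3.0) for w in grid])
    # theta = 0, phi = 0 or r = 1: independence, A(w) = 1
    print([a_asym_logistic(w, 0.0, 0.6, 3.0) for w in grid])
    print([a_asym_logistic(w, 0.6, 0.0, 3.0) for w in grid])
    print([a_asym_logistic(w, 0.9, 0.4, 1.0) for w in grid])
    # theta != phi: A(w) and A(1 - w) differ, so the model is asymmetric
    print(a_asym_logistic(0.3, 0.9, 0.4, 2.0), a_asym_logistic(0.7, 0.9, 0.4, 2.0))
```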



Estimation of the dependence function



• Nonparametric methods
   1. Pickands estimator (1981)
   2. Capéraà, Fougères and Genest’s estimator (1997)

• Maximum likelihood based on parametric models

                   New Nonparametric Methods:

1. Convex hull of modified Pickands estimator

2. Constrained smoothing splines




Pickands estimator

• Suppose $(X, Y)$ has a bivariate extreme value
  distribution with exponential margins.

• $\min\{X/(1-w), Y/w\}$ has an exponential
  distribution with mean $1/A(w)$.

• The maximum likelihood estimator of $A(w)$ is

\[ A_n(w) = n \Big[ \sum_{i=1}^{n} \min\{X_i/(1-w),\, Y_i/w\} \Big]^{-1} \]

• For each $0 \le w \le 1$, $1/A_n(w)$ is an unbiased and
  strongly consistent estimator of $1/A(w)$.

• $\delta_n(w) = n^{1/2}\{1/A_n(w) - 1/A(w)\}$ satisfies the
  central limit theorem in $(C(0,1), \mathcal{B})$; see Deheuvels,
  P. (1991).
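
The estimator above is simple to compute. The following is a minimal sketch in Python (standard library only). The function names, the seed and the two test cases are assumptions made for illustration: independent unit exponentials correspond to A(w) = 1, and the degenerate case Y = X corresponds to A(w) = max{w, 1 − w}. At the endpoints the convention Y/0 = +∞ is used, so the minimum reduces to X at w = 0 and to Y at w = 1.

```python
import random

def pickands_estimator(xs, ys):
    """Return the function w -> A_n(w) = n / sum_i min{X_i/(1-w), Y_i/w}."""
    n = len(xs)

    def a_n(w):
        if w <= 0.0:
            total = sum(xs)  # Y/w is +inf, so the minimum is X
        elif w >= 1.0:
            total = sum(ys)  # X/(1-w) is +inf, so the minimum is Y
        else:
            total = sum(min(x / (1.0 - w), y / w) for x, y in zip(xs, ys))
        return n / total

    return a_n

def simulate_independent(n, seed=0):
    """Independent unit-exponential margins, for which A(w) = 1."""
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0) for _ in range(n)]
    ys = [rng.expovariate(1.0) for _ in range(n)]
    return xs, ys

if __name__ == "__main__":
    xs, ys = simulate_independent(20000, seed=1)
    a_ind = pickands_estimator(xs, ys)
    a_dep = pickands_estimator(xs, xs)  # complete dependence: Y = X
    for w in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(w, a_ind(w), a_dep(w))
```

Note that A_n(0) = n / ΣX_i is close to, but in general not exactly, 1, so the raw estimate need not satisfy the boundary conditions of a dependence function.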


Pickands estimates for 100 simulated data from Logistic model with r = 1.1 and r = 1.3

[Figure, two panels: A(w) against w in [0, 1], showing the Pickands estimate and the logistic dependence function, for r = 1.1 and for r = 1.3.]

Pickands estimates for 100 simulated data from Logistic model with r = 1.6 and r = 2

[Figure, two panels: A(w) (axis from 0.5 to 1.0) against w in [0, 1], showing the Pickands estimate and the logistic dependence function, for r = 1.6 and for r = 2.]

Capéraà, Fougères and Genest’s estimator

• Copula for a bivariate extreme value distribution
  with marginals $F(x)$ and $G(y)$

\[ C(u, v) = P\{F(X) \le u,\ G(Y) \le v\} = \exp\Big[\log(uv)\, A\Big(\frac{\log(u)}{\log(uv)}\Big)\Big] \]

• $(U_i, V_i) \equiv \{F(X_i), G(Y_i)\}$ $(1 \le i \le n)$

• Pseudo-observations $Z_i = \log(U_i)/\log(U_i V_i)$ $(1 \le i \le n)$

• $H(z) = P(Z_i \le z) = z + z(1-z)D(z)$, where
  $D(z) = A'(z)/A(z)$ for all $0 \le z \le 1$

• $A(t) = \exp\Big\{\int_0^t \frac{H(z)-z}{z(1-z)}\,dz\Big\} = \exp\Big\{-\int_t^1 \frac{H(z)-z}{z(1-z)}\,dz\Big\}$

   With $H_n$ the empirical distribution function of the $Z_i$:

   1. $A_n^0(t) = \exp\Big\{\int_0^t \frac{H_n(z)-z}{z(1-z)}\,dz\Big\}$

   2. $A_n^1(t) = \exp\Big\{-\int_t^1 \frac{H_n(z)-z}{z(1-z)}\,dz\Big\}$

• $\log A_n(t) = p(t) \log A_n^0(t) + \{1 - p(t)\} \log A_n^1(t)$

Definition of the estimator:

Denote the ordered values of $Z_i$ by $Z_{(1)}, \ldots, Z_{(n)}$ and
define

\[ Q_i = \Big\{ \prod_{k=1}^{i} Z_{(k)}/(1 - Z_{(k)}) \Big\}^{1/n} \quad (1 \le i \le n). \]

Then $A_n$ can be written as

\[ A_n(t) = \begin{cases}
(1-t)\, Q_n^{1-p(t)}, & 0 \le t \le Z_{(1)} \\
t^{i/n} (1-t)^{1-i/n}\, Q_n^{1-p(t)}\, Q_i^{-1}, & Z_{(i)} \le t \le Z_{(i+1)} \\
t\, Q_n^{-p(t)}, & Z_{(n)} \le t \le 1
\end{cases} \]

• $A_n(0) = A_n(1) = 1$ if $p(0) = 1 - p(1) = 1$.
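
The closed form above translates almost line by line into code. Below is a minimal sketch in Python (standard library only). The function name, the default weight p(t) = 1 − t (which satisfies p(0) = 1 − p(1) = 1) and the independent-uniform test data are assumptions made for illustration. The inputs u and v are assumed to lie strictly inside (0, 1), and the computation is carried out on the log scale.

```python
import bisect
import math
import random

def cfg_estimator(u, v, p=lambda t: 1.0 - t):
    """Return t -> A_n(t) built from Z_i = log(U_i) / log(U_i V_i)."""
    z = sorted(math.log(ui) / (math.log(ui) + math.log(vi)) for ui, vi in zip(u, v))
    n = len(z)
    # log_q[i] = log Q_i = (1/n) * sum_{k <= i} log(Z_(k) / (1 - Z_(k))), with log_q[0] = 0
    log_q = [0.0]
    for zk in z:
        log_q.append(log_q[-1] + math.log(zk / (1.0 - zk)) / n)

    def a_n(t):
        i = bisect.bisect_right(z, t)  # number of ordered Z values <= t
        # log A_n^0(t) = (i/n) log t + (1 - i/n) log(1 - t) - log Q_i
        log_a0 = -log_q[i]
        if i > 0:
            log_a0 += (i / n) * math.log(t)
        if i < n:
            log_a0 += (1.0 - i / n) * math.log(1.0 - t)
        # A_n(t) = A_n^0(t) * Q_n^(1 - p(t))
        return math.exp(log_a0 + (1.0 - p(t)) * log_q[n])

    return a_n

if __name__ == "__main__":
    rng = random.Random(2)
    n = 5000
    # Independent uniforms: independence copula, so A(t) = 1
    u = [rng.uniform(1e-12, 1.0 - 1e-12) for _ in range(n)]
    v = [rng.uniform(1e-12, 1.0 - 1e-12) for _ in range(n)]
    a = cfg_estimator(u, v)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, a(t))
```

With this weight the estimate equals 1 at both endpoints by construction, in line with the last bullet above.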




Cap´ra`’s estimates for 100 simulated data from Logistic
   e a
             model with r = 1.1 and r = 1.3



         1.0
                                                                              1.0



         0.9
                                                                              0.9




         0.8                                                                  0.8




  A(w)
                                                                       A(w)
         0.7                                                                  0.7




         0.6                                                                  0.6


                 Caperaa                                                              Caperaa
         0.5     Logistic, r = 1.1                                            0.5     Logistic, r = 1.3


               0.0                   0.2   0.4       0.6   0.8   1.0                0.0                   0.2   0.4       0.6   0.8       1.0

                                                 w                                                                    w




Capéraà's estimates for 100 simulated data from Logistic model with r = 1.6 and r = 2

[Figure: two panels (r = 1.6 and r = 2) plotting A(w) against w. Legend: Capéraà; Logistic.]

Modified Pickands estimator

• Let $Y_i = (Y_i^{(1)}, Y_i^{(2)})$ for $1 \le i \le n$ be independent
  and identically extreme value distributed random
  vectors with exponential margins.

• Put $\bar Y^{(\ell)} = n^{-1} \sum_i Y_i^{(\ell)}$ and $\hat Y_i^{(\ell)} = Y_i^{(\ell)} / \bar Y^{(\ell)}$ for $\ell = 1, 2$.

• $\hat B(u) \equiv n^{-1} \sum_{i=1}^{n} \min\{\hat Y_i^{(1)}/(1 - u),\ \hat Y_i^{(2)}/u\}$ is
  uniformly root-n consistent for $B(u) \equiv A(u)^{-1}$.

1. The estimator of the dependence function passes
   through the points (0, 1) and (1, 1), and has
   gradients −1 and 1 at these respective points.

2. $\hat B(u) \le \min\{1/(1 - u), 1/u\}$, so $\hat A \equiv \hat B^{-1}$ lies above
   the lower boundary of the triangular area.

3. The greatest convex minorant, $\tilde A$, of $\hat A$ satisfies all
   necessary conditions for a dependence function.
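
The three steps above (rescale each margin by its mean, average the minima, take the greatest convex minorant) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's code: the names `modified_pickands` and `lower_convex_hull` are hypothetical, the inputs are assumed to have exponential margins already, and the convex minorant is only approximated on a finite grid of points.

```python
def modified_pickands(y1, y2):
    """Return u -> A_hat(u) from paired samples assumed to have exponential margins."""
    n = len(y1)
    mean1 = sum(y1) / n
    mean2 = sum(y2) / n
    # Rescale each margin by its sample mean so that both have sample mean one.
    s1 = [a / mean1 for a in y1]
    s2 = [b / mean2 for b in y2]

    def a_hat(u):
        # At the endpoints one argument of min() is infinite, so B_hat reduces
        # to the mean of a rescaled margin, which is 1 by construction.
        if u <= 0.0 or u >= 1.0:
            return 1.0
        b_hat = sum(min(a / (1.0 - u), b / u) for a, b in zip(s1, s2)) / n
        return 1.0 / b_hat

    return a_hat


def lower_convex_hull(points):
    """Vertices of the lower convex hull of (x, y) points sorted by increasing x."""
    hull = []
    for x, y in points:
        # Drop the last vertex while it lies on or above the chord to the new point.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) <= 0.0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull
```

Joining the hull vertices by straight lines gives a grid approximation to $\tilde A$; for example `lower_convex_hull([(k / 100, a_hat(k / 100)) for k in range(101)])`.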


Modified Pickands estimates for 100 simulated data from Logistic model with r = 1.1 and r = 1.3

[Figure: two panels (r = 1.1 and r = 1.3) plotting A(w) against w. Legend: convex hull of modified Pickands; Modified Pickands; Logistic.]
Modified Pickands estimates for 100 simulated data from Logistic model with r = 1.6 and r = 2

[Figure: two panels (r = 1.6 and r = 2) plotting A(w) against w. Legend: convex hull of modified Pickands; Modified Pickands; Logistic.]

Constrained smoothing splines


• $\hat A$ may be approximated by a spline that is
  constrained to satisfy all the necessary conditions
  on the dependence function.

• Choose regularly spaced points $0 = t_0 < \ldots < t_m = 1$
  in the interval $[0, 1]$.

• Given a smoothing parameter $s > 0$, take $\tilde A_s$ to be
  a polynomial smoothing spline of degree 3 or more
  which minimises

  $$\sum_{j=1}^{m} \{\hat A(t_j) - \tilde A_s(t_j)\}^2 + s \int_0^1 \tilde A_s''(t)^2\,dt,$$

  subject to
   1. $\tilde A_s(0) = \tilde A_s(1) = 1$
   2. $\tilde A_s'(0) \ge -1$ and $\tilde A_s'(1) \le 1$
   3. $\tilde A_s'' \ge 0$ on $[0, 1]$.
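
To make the criterion and its constraints concrete, the sketch below evaluates both numerically for an arbitrary candidate function. It does not solve the constrained minimisation (that needs a quadratic programming routine); it only scores a given candidate. The names `penalised_criterion` and `satisfies_constraints`, the finite-difference approximations and the tolerances are all assumptions made for illustration.

```python
def penalised_criterion(a_hat_values, grid, candidate, s, n_quad=1000):
    """Sum of squared deviations plus s times the integral of candidate''(t)^2 over [0, 1]."""
    fit = sum((ah - candidate(t)) ** 2 for ah, t in zip(a_hat_values, grid))
    h = 1.0 / n_quad
    d = 0.5 * h  # half-cell step keeps t - d and t + d inside [0, 1]
    rough = 0.0
    for k in range(n_quad):
        t = (k + 0.5) * h  # midpoint of the k-th quadrature cell
        second = (candidate(t + d) - 2.0 * candidate(t) + candidate(t - d)) / d ** 2
        rough += second ** 2 * h
    return fit + s * rough


def satisfies_constraints(candidate, n_grid=200, tol=1e-6):
    """Check conditions 1 to 3 numerically with finite differences on a regular grid."""
    d = 1.0 / n_grid
    ends_ok = abs(candidate(0.0) - 1.0) <= tol and abs(candidate(1.0) - 1.0) <= tol
    slope0 = (candidate(d) - candidate(0.0)) / d        # forward difference at 0
    slope1 = (candidate(1.0) - candidate(1.0 - d)) / d  # backward difference at 1
    slopes_ok = slope0 >= -1.0 - tol and slope1 <= 1.0 + tol
    convex_ok = all(
        candidate((k + 1) * d) - 2.0 * candidate(k * d) + candidate((k - 1) * d)
        >= -tol * d ** 2
        for k in range(1, n_grid)
    )
    return ends_ok and slopes_ok and convex_ok
```

For a quadratic candidate such as $1 - w/2 + w^2/2$ the second derivative is constant, so the penalty term is simply $s$ times its square; this gives a quick sanity check of the quadrature.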


Smoothed spline of modified Pickands estimates for 100 simulated data from Logistic model with r = 1.1 and r = 1.3

[Figure: two panels (r = 1.1 and r = 1.3) plotting A(w) against w. Legend: Smoothed spline; Modified Pickands; Logistic.]
Smoothed spline of modified Pickands estimates for 100 simulated data from Logistic model with r = 1.6 and r = 2

[Figure: two panels (r = 1.6 and r = 2) plotting A(w) against w. Legend: Smoothed spline; Modified Pickands; Logistic.]

Which model to use in practice?


• maximum likelihood of parametric models, e.g.
   1.   symmetric mixed model
   2.   symmetric logistic model
   3.   asymmetric mixed model
   4.   asymmetric logistic model
   5.   generalised symmetric logistic model
   6.   generalised asymmetric mixed model

• Nonparametric methods including
   1. the Pickands (1981, 1989) estimator
   2. the convex hull of Pickands’ estimator
   3. the estimator proposed by Capéraà, Fougères and
      Genest (1997)
   4. the convex hull of the latter
   5. our modification of Pickands’ estimator
   6. the convex hull of the latter

• constrained smoothing splines fitted to any of these
  nonparametric estimators
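
For reference when comparing candidates, the two simplest parametric families can be written down directly. The forms below are the standard textbook parameterisations of the symmetric logistic and symmetric mixed dependence functions and are an assumption here; the parameterisation used on the earlier slides may differ. The function names are hypothetical.

```python
def logistic_dependence(w, r):
    """Symmetric logistic model, A(w) = {(1 - w)^r + w^r}^(1/r), assumed for r >= 1."""
    return ((1.0 - w) ** r + w ** r) ** (1.0 / r)


def mixed_dependence(w, theta):
    """Symmetric mixed model, A(w) = 1 - theta*w + theta*w^2, assumed for 0 <= theta <= 1."""
    return 1.0 - theta * w + theta * w * w
```

Under this parameterisation $r = 1$ gives $A \equiv 1$ (independence) and large $r$ approaches $\max(w, 1 - w)$ (complete dependence), while the mixed model cannot go below $A(1/2) = 3/4$.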

[Figures: four plots, one each for the logistic model with r = 1.1, 1.3, 1.6 and 2, showing A(w) against w for all fitted estimators. Legend: Smoothed spline; Logistic; Mixed; Generalised logistic; Generalised mixed; Asym. logistic; Asym. mixed; Pickands; convex hull of modified Pickands; Capéraà; Modified Pickands; true Logistic curve.]
Monte Carlo approximations to mean integrated squared errors, multiplied by 10^5

                                          n = 25            n = 50            n = 100
method                                 r=1   r=2   r=3   r=1   r=2   r=3   r=1   r=2   r=3
logistic                               197    64    14   110    34     8    42    14     4
Pickands                              5614  2829  3331  2034  1547  1261  1172   712   567
convex hull of Pickands               7229  2611  2775  2588  1388  1049  1430   671   477
Capéraà et al.                         889   102    35   568    49    20   307    29    10
convex hull of Capéraà et al.         1188    95    41   666    57    25   373    32    12
modified Pickands                     1351   138    33   614    77    18   366    37    11
convex hull of modified Pickands      1861   139    46   815    70    24   453    38    15
smoothed spline of
  Pickands                             784   919  1020   396   728   487   215   334   220
  convex hull of Pickands              525   769  1055   282   637   490   135   327   230
  Capéraà et al.                       303    82    21   177    37    12    97    24     8
  convex hull of Capéraà et al.        286    66    22   167    39    14    97    26    11
  modified Pickands                    447   104    21   240    62    12   130    31     9
  convex hull of modified Pickands     401    73    24   232    49    16   107    30    15
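
A Monte Carlo approximation of this kind can be sketched for the simplest case. The code below assumes independence ($A \equiv 1$, which corresponds to $r = 1$ under the standard logistic parameterisation), simulates independent unit exponential pairs, and averages the integrated squared error of the modified Pickands estimate over replications. The function names, the number of replications, the grid and the seed are all arbitrary illustrative choices, so the result is not expected to reproduce the table entries exactly.

```python
import random


def modified_pickands(y1, y2):
    """u -> A_hat(u) for paired samples assumed to have exponential margins."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    s1 = [a / m1 for a in y1]
    s2 = [b / m2 for b in y2]

    def a_hat(u):
        if u <= 0.0 or u >= 1.0:
            return 1.0
        return n / sum(min(a / (1.0 - u), b / u) for a, b in zip(s1, s2))

    return a_hat


def mise_independent(n, n_rep=200, n_grid=50, seed=1):
    """Monte Carlo estimate of E integral (A_hat - A)^2 when the true A is identically 1."""
    rng = random.Random(seed)
    grid = [(k + 0.5) / n_grid for k in range(n_grid)]  # midpoint rule on [0, 1]
    total = 0.0
    for _ in range(n_rep):
        y1 = [rng.expovariate(1.0) for _ in range(n)]
        y2 = [rng.expovariate(1.0) for _ in range(n)]
        a_hat = modified_pickands(y1, y2)
        total += sum((a_hat(u) - 1.0) ** 2 for u in grid) / n_grid
    return total / n_rep
```

Calling `mise_independent(25)` and `mise_independent(100)` and multiplying by 10^5 gives figures on the same scale as the table; the error should shrink roughly in proportion to $1/n$.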




Australian temperature data

• A very large dataset on annual maximum and
  minimum average daily temperatures at 224 stations
  across Australia



[Figure: station locations by state. Legend: Queensland; New South Wales; Victoria; South Australia; West Australia; Northern Territory; Tasmania.]
Application to Australian temperature data


• maximum annual values of average daily
  temperature measurements at two meteorological
  stations, Leonora (latitude 28.53, longitude 121.19)
  and Menzies (latitude 29.42, longitude 121.02), in
  Western Australia during the period 1898–1993.




[Figure: scatter plot of the annual maxima, Menzies (vertical axis, 24 to 27) against Leonora (horizontal axis, 25 to 28).]

Logistic models for the dependence function fitted by maximum likelihood to the temperature data

[Figure: A(w) against w. Legend: Logistic; Generalised logistic; Asym. logistic; Modified Pickands.]

Mixed models for the dependence function fitted by maximum likelihood to the temperature data

[Figure: A(w) against w. Legend: Mixed; Generalised mixed; Asym. mixed; Modified Pickands.]

Estimating a bivariate extreme-value distribution function

• Let $X = (X^{(1)}, X^{(2)})$ have a bivariate extreme-value
  distribution $F$.

• There exist monotone increasing transformations
  $T_j = T_j(\cdot \mid \theta_j)$ such that $(T_1(X^{(1)}), T_2(X^{(2)}))$ has
  distribution function $G_0$.

• Given a sample $\{X_i = (X_i^{(1)}, X_i^{(2)}),\ 1 \le i \le n\}$,
  compute a root-n consistent estimator $\hat\theta_j$ of $\theta_j$ from
  the marginal data $X_i^{(j)}$, $1 \le i \le n$.

• Put $\hat T_j = T_j(\cdot \mid \hat\theta_j)$ and

  $$\hat Y_j^{(\ell)} = \hat T_\ell(X_j^{(\ell)}) \Big/ \left\{ n^{-1} \sum_{i=1}^{n} \hat T_\ell(X_i^{(\ell)}) \right\}.$$

• $\hat F(x^{(1)}, x^{(2)}) = G_0\{T_1(x^{(1)} \mid \hat\theta_1),\ T_2(x^{(2)} \mid \hat\theta_2)\}$ is
  root-n consistent for $F$.

Distribution function estimate and semi-infinite prediction regions corresponding to nominal levels α = 0.9, 0.95 and 0.99

[Figure: (a) surface plot over Leonora and Menzies, vertical axis labelled fs; second panel showing Menzies (26.5 to 28.5) against Leonora (27.0 to 30.0) with region boundaries labelled 0.9, 0.95 and 0.99.]

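Compact bivariate prediction regions

Construct compact prediction regions by profiling the estimator

  $$\begin{aligned}
  \tilde f_s(x) &= \frac{\partial^2}{\partial x^{(1)}\,\partial x^{(2)}}\, G_1\big(\hat t^{(1)}, \hat t^{(2)}\big) \\
  &= T_1'\big(x^{(1)} \mid \hat\theta_1\big)\, T_2'\big(x^{(2)} \mid \hat\theta_2\big)\, G_1\big(\hat t^{(1)}, \hat t^{(2)}\big) \\
  &\quad \times \Bigg[ \left\{ \tilde A_s(\hat w) + \frac{\hat t^{(1)}}{\hat t^{(1)} + \hat t^{(2)}}\, \tilde A_s'(\hat w) \right\}
  \left\{ \tilde A_s(\hat w) - \frac{\hat t^{(2)}}{\hat t^{(1)} + \hat t^{(2)}}\, \tilde A_s'(\hat w) \right\} \\
  &\qquad\quad + \frac{\hat t^{(1)}\, \hat t^{(2)}}{\big(\hat t^{(1)} + \hat t^{(2)}\big)^3}\, \tilde A_s''(\hat w) \Bigg]
  \end{aligned}$$

of the density, $f$, of $X$, where $\hat w$ abbreviates $\hat t^{(2)} / (\hat t^{(1)} + \hat t^{(2)})$.
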
How to choose s


• $CV(s) = \int \tilde f_s(x)^2\,dx - 2 n^{-1} \sum_{i=1}^{n} \tilde f_{-i,s}(X_i)$.

• $CV(s)$ is an almost-unbiased approximation to
  $E\big(\int \tilde f_s^2 - 2 \int \tilde f_s f\big)$.

• The value of $s$ that results from minimising $CV(s)$
  will asymptotically minimise $E \int (\tilde f_s - f)^2$.

• To construct prediction regions, define

  $$\tilde R(u) \equiv \{x : \tilde f_s(x) \ge u\}, \qquad \beta(u) = \int_{\tilde R(u)} \tilde f_s(x)\,dx.$$

• Given a prediction level $\alpha$, let $u = \tilde u_\alpha$ denote the
  solution of $\beta(u) = \alpha$. Then $\tilde R(\tilde u_\alpha)$ is a nominal
  $\alpha$-level prediction region for a future value of $X$.
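
Solving $\beta(u) = \alpha$ is straightforward once the density estimate has been tabulated on a grid: sort the cell values in decreasing order and accumulate mass until it reaches $\alpha$. The sketch below assumes a regular grid with equal cell areas; the name `region_threshold` is hypothetical, and the illustration uses a standard bivariate normal density as a stand-in, not the spline-smoothed estimate $\tilde f_s$ from the slides.

```python
import math


def region_threshold(density_values, cell_area, alpha):
    """Return u such that the grid cells with density >= u carry mass of about alpha."""
    mass = 0.0
    for f in sorted(density_values, reverse=True):
        mass += f * cell_area
        if mass >= alpha:
            return f
    raise ValueError("the grid carries less than mass alpha")


# Illustration with a stand-in density (standard bivariate normal),
# evaluated at the midpoints of a 240 x 240 grid on [-6, 6] x [-6, 6].
step = 0.05
mids = [-6.0 + (k + 0.5) * step for k in range(240)]
values = [math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)
          for x in mids for y in mids]
u_90 = region_threshold(values, step * step, 0.90)
```

For this stand-in density the exact solution is $u_\alpha = (1 - \alpha)/(2\pi)$, so `u_90` should be close to $0.1/(2\pi)$, which gives a check on the grid resolution. The region itself is the set of grid cells whose density is at least the returned threshold.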




Cross-validation criterion CV(s) and spline-smoothed dependence function estimate $\tilde A_s$ for s = 0.05, with the unsmoothed, modified Pickands estimate

[Figure: (a) CV against s for s between 0.0 and 0.5, with CV values between about -0.242 and -0.228; second panel showing A(w) against w. Legend: Smoothed spline; convex hull of modified Pickands; Modified Pickands.]

Plot of spline-smoothed density estimate $\tilde f_s$

[Figure: surface plot of the density estimate over Leonora and Menzies.]

Compact bootstrap calibrated prediction regions with nominal levels α = 0.85 and 0.90

[Figure: region boundaries labelled 0.85 and 0.90, Menzies (25.0 to 28.0) against Leonora (26 to 29).]

Bootstrap calibration


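• Take $\bar A = \tilde A$ or $\tilde A_\lambda$ in

  $$\bar F(x) = \exp\left\{ -\left( \frac{1}{x^{(1)}} + \frac{1}{x^{(2)}} \right) \bar A\left( \frac{x^{(1)}}{x^{(1)} + x^{(2)}} \right) \right\}.$$

• Compute the chosen region $\bar R_\alpha$, with nominal
  coverage $\alpha$, from the data $\mathcal{X} = \{X_1, \ldots, X_n\}$.

• By resampling from $\bar F$ conditional on $\mathcal{X}$, compute
  a new dataset $\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$, and from it
  calculate the analogue $\bar F^*$ of $\bar F$, and then the
  analogue $\bar R_\alpha^*$ of $\bar R_\alpha$.

• Let $\gamma(\alpha)$ equal the probability, conditional on the
  data $\mathcal{X}$, that a random 2-vector drawn from $\bar F$ lies
  in $\bar R_\alpha^*$.

• Let $a = \hat a(\alpha)$ be the solution of $\gamma(a) = \alpha$. Then
  $\bar R_{\hat a(\alpha)}$ is the bootstrap-calibrated form of $\bar R_\alpha$.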
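
The fitted distribution function $\bar F$ is simple to evaluate once a dependence function has been chosen. The sketch below is only an illustration of the formula in the first bullet: the name `bev_cdf` is hypothetical, and any callable on $[0, 1]$ can be passed as the dependence function. It does not implement the resampling or calibration steps.

```python
import math


def bev_cdf(x1, x2, dep):
    """F(x) = exp{-(1/x1 + 1/x2) * A(x1 / (x1 + x2))} for x1, x2 > 0."""
    return math.exp(-(1.0 / x1 + 1.0 / x2) * dep(x1 / (x1 + x2)))
```

Two limiting cases give a quick check. With $A \equiv 1$ the function factorises into the product of two unit Fréchet distribution functions, and with $A(w) = \max(w, 1 - w)$ it reduces to $\exp\{-1/\min(x^{(1)}, x^{(2)})\}$.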


Theoretical Properties

• $\hat A$ and its greatest convex minorant, $\tilde A$, are uniformly root-n consistent
  for $A$:

  $$\sup_{0 \le u \le 1} |\hat A(u) - A(u)| + \sup_{0 \le u \le 1} |\tilde A(u) - A(u)| = O_p(n^{-1/2}).$$

• if the distribution $H$ of $Y^{(1)}/(Y^{(1)} + Y^{(2)})$ has a bounded density then,
  for each $\epsilon \in (0, \tfrac{1}{2}]$,

  $$\sup_{\epsilon \le u \le 1 - \epsilon} |\hat A'(u) - A'(u)| + \sup_{\epsilon \le u \le 1 - \epsilon} |\tilde A'(u) - A'(u)| = O_p(n^{-1/2}).$$

• if $A$ has three bounded derivatives then the biases of $\hat A$, $\hat A'$, $\tilde A$, $\tilde A'$ are
  $O(n^{-1})$.

Shape constrained smoothing using smoothing splines

Given data $\{(t_i, y_i)\}$, $t_i \in [a, b]$ for $i = 1, \ldots, n$, what
is the behaviour of the solution $\hat g$ of the following
minimisation problem?

  $$\text{minimise} \quad \sum_{i=1}^{n} \{y_i - g(t_i)\}^2 + \lambda \int_a^b \{g^{(m)}(u)\}^2\,du, \qquad (1a)$$

  $$\text{where} \quad g^{(r)}(t) \ge 0, \quad t \in [a, b]. \qquad (1b)$$




                              References

 [1] Mammen, E. and Thomas-Agnan, C. (1999),
     Smoothing splines and shape restrictions,
     Scandinavian Journal of Statistics, 26, 239–252.

Proposed Estimator for m = 2 and r ≤ 2

   For m = 2, the piecewise polynomial representation of a natural cubic
                             C 2-spline g is:


                           n
                  g(t) =         I[ti,ti+1)(t)Si(t),                              (2a)
                           i=0

where            Si(t) = ai + bi(t − ti) + ci(t − ti)2 + di(t − ti)3,   1 ≤ i ≤ n − 1,
                                                                                (2b)
                S0(t) = a1 + b1(t − t1) and Sn(t) = Sn−1(tn) + Sn−1(tn)(t − tn).


The coefficients in (2b) have to fulfill the following equations for g to be a

Khon Kaen University                                                       August 6, 2010
natural cubic C 2-spline:

                       Si−1(ti) = Si(ti)    for i = 1, . . . , n
                       Si−1(ti) = Si(ti)    for i = 1, . . . , n               (3)
                       Si−1(ti) = Si (ti) for i = 1, . . . , n

  A direct implementation would lead to an unnecessarily large quadratic
 programming problem and we propose to use the value-second derivative
  representation (see Green and Silverman, 1994, chapter2)for the actual
                             implementation.
For i = 1, . . . , n, define gi = g(ti) and γi = g (ti). By definition, a natural
  cubic C 2-spline has γ1 = γn = 0. Let g denote the vector (g1, . . . , gn)T
 and γ = (γ2, . . . , γn−1 )T . Note that for notational simplicity later on the
 entries of γ are numbered in a non-standard way, starting at i = 2. The
       vectors g and γ specify the natural cubic spline g completely.

However, not all possible vectors $\mathbf{g}$ and $\gamma$ represent natural cubic splines. To derive sufficient (and necessary) conditions for $\mathbf{g}$ and $\gamma$ to represent a cubic spline, we define the following matrices $Q$ and $R$. Define $h_i = t_{i+1} - t_i$ for $i = 1, \dots, n - 1$. Let $Q$ be the $n \times (n - 2)$ matrix with entries $q_{i,j}$, for $i = 1, \dots, n$ and $j = 2, \dots, n - 1$, given by

$$q_{j-1,j} = h_{j-1}^{-1}, \qquad q_{j,j} = -h_{j-1}^{-1} - h_j^{-1}, \qquad\text{and}\qquad q_{j+1,j} = h_j^{-1},$$

for $j = 2, \dots, n - 1$, and $q_{i,j} = 0$ for $|i - j| \ge 2$. Note that the columns of $Q$ are numbered in the same non-standard way as the entries of $\gamma$.
The $(n - 2) \times (n - 2)$ matrix $R$ is symmetric with elements $\{r_{i,j}\}_{i,j=2}^{n-1}$ given by

$$r_{i,i} = \tfrac{1}{3}(h_{i-1} + h_i) \quad\text{for } i = 2, \dots, n - 1,$$

$$r_{i,i+1} = r_{i+1,i} = \tfrac{1}{6} h_i \quad\text{for } i = 2, \dots, n - 2,$$

and $r_{i,j} = 0$ for $|i - j| \ge 2$. Note that $R$ is strictly diagonally dominant and hence, by standard arguments in numerical linear algebra, strictly positive definite.
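The band structure of $Q$ and $R$ is easy to get wrong by one index, so a small sketch may help. It is written in plain Python rather than R to stay dependency-free; the function names (`build_q_r`, `cholesky`, `is_strictly_diagonally_dominant`) and the example knots are illustrative assumptions, not part of the talk. Each column $j$ of $Q$ has its three non-zero entries in rows $j-1$, $j$ and $j+1$, the only placement that fits an $n \times (n-2)$ matrix.

```python
import math

def build_q_r(t):
    """Build Q (n x (n-2)) and R ((n-2) x (n-2)) for knots t_1 < ... < t_n.

    Storage is 0-based: t[k] holds t_{k+1}, h[k] holds h_{k+1}, and
    column c of Q (row/column c of R) corresponds to the index j = c + 2.
    """
    n = len(t)
    h = [t[k + 1] - t[k] for k in range(n - 1)]
    Q = [[0.0] * (n - 2) for _ in range(n)]
    R = [[0.0] * (n - 2) for _ in range(n - 2)]
    for c in range(n - 2):
        Q[c][c] = 1.0 / h[c]                        # row j-1: 1/h_{j-1}
        Q[c + 1][c] = -1.0 / h[c] - 1.0 / h[c + 1]  # row j: -1/h_{j-1} - 1/h_j
        Q[c + 2][c] = 1.0 / h[c + 1]                # row j+1: 1/h_j
        R[c][c] = (h[c] + h[c + 1]) / 3.0           # r_{j,j}
        if c + 1 < n - 2:
            R[c][c + 1] = R[c + 1][c] = h[c + 1] / 6.0  # r_{j,j+1} = r_{j+1,j}
    return Q, R

def is_strictly_diagonally_dominant(M):
    """True if every diagonal entry exceeds the sum of |off-diagonals| in its row."""
    return all(abs(row[i]) > sum(abs(v) for k, v in enumerate(row) if k != i)
               for i, row in enumerate(M))

def cholesky(M):
    """Lower-triangular Cholesky factor; raises ValueError if M is not positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

# Hypothetical, unequally spaced knots on [0, 1].
knots = [0.0, 0.1, 0.3, 0.4, 0.7, 1.0]
Q, R = build_q_r(knots)
dominant = is_strictly_diagonally_dominant(R)
L = cholesky(R)  # succeeds only because R is positive definite
print("R strictly diagonally dominant:", dominant)
```

The successful Cholesky factorisation is a numerical check of positive definiteness for this particular set of knots, not a proof of the general statement.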
We are now able to state the following key result.

Proposition. The vectors $\mathbf{g}$ and $\gamma$ specify a natural cubic spline $g$ if and only if the condition

$$Q^T \mathbf{g} = R\gamma \tag{4}$$

is satisfied. If (4) is satisfied then we have

$$\int_a^b \{g''(t)\}^2 \, dt = \gamma^T R \gamma. \tag{5}$$

For a proof see Green and Silverman (1994, section 2.5).
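A numerical check can make the proposition concrete. The sketch below (plain Python, with hypothetical helper names and made-up data) takes values at the knots, solves condition (4) for the second derivatives, and then verifies two things: the first derivative of the resulting piecewise cubic has no jumps at the interior knots, and the roughness integral in (5) equals the quadratic form. It relies on two standard facts about cubic splines that are assumed here rather than taken from the slides: $g''$ is piecewise linear between knots, and the one-sided derivatives at a knot have the closed forms used in the code.

```python
def build_q_r(t):
    """Q and R as defined in the text; 0-based column c stands for j = c + 2."""
    n = len(t)
    h = [t[k + 1] - t[k] for k in range(n - 1)]
    Q = [[0.0] * (n - 2) for _ in range(n)]
    R = [[0.0] * (n - 2) for _ in range(n - 2)]
    for c in range(n - 2):
        Q[c][c] = 1.0 / h[c]
        Q[c + 1][c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2][c] = 1.0 / h[c + 1]
        R[c][c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < n - 2:
            R[c][c + 1] = R[c + 1][c] = h[c + 1] / 6.0
    return Q, R

def solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x

# Hypothetical knots and values (shaped roughly like a dependence function).
t = [0.0, 0.1, 0.3, 0.4, 0.7, 1.0]
g = [1.0, 0.93, 0.82, 0.80, 0.85, 1.0]
n = len(t)
h = [t[k + 1] - t[k] for k in range(n - 1)]
Q, R = build_q_r(t)

# Condition (4): R gamma = Q^T g determines the interior second derivatives.
qtg = [sum(Q[i][c] * g[i] for i in range(n)) for c in range(n - 2)]
gamma_inner = solve(R, qtg)
gamma = [0.0] + gamma_inner + [0.0]  # natural spline: gamma_1 = gamma_n = 0

def left_derivative(j):
    """g'(t_j) computed from the cubic piece on [t_{j-1}, t_j]."""
    return (g[j] - g[j - 1]) / h[j - 1] + h[j - 1] * (gamma[j - 1] + 2.0 * gamma[j]) / 6.0

def right_derivative(j):
    """g'(t_j) computed from the cubic piece on [t_j, t_{j+1}]."""
    return (g[j + 1] - g[j]) / h[j] - h[j] * (2.0 * gamma[j] + gamma[j + 1]) / 6.0

max_jump = max(abs(left_derivative(j) - right_derivative(j)) for j in range(1, n - 1))

# Identity (5): exact integral of the squared piecewise-linear g'' ...
roughness = sum(h[k] * (gamma[k] ** 2 + gamma[k] * gamma[k + 1] + gamma[k + 1] ** 2) / 3.0
                for k in range(n - 1))
# ... against the quadratic form gamma^T R gamma.
quad_form = sum(gamma_inner[i] * R[i][j] * gamma_inner[j]
                for i in range(n - 2) for j in range(n - 2))

print("largest jump in g':", max_jump)
print("integral:", roughness, "quadratic form:", quad_form)
```

Read this way, condition (4) is exactly the requirement that the first derivative be continuous at every interior knot once values and second derivatives are prescribed.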

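This result allows us to state problem (1a) as a quadratic programming problem. Let $\mathbf{y}$ denote the $(2n - 2)$-vector $(y_1, \dots, y_n, 0, \dots, 0)^T$, $\tilde{\mathbf{g}}$ the $(2n - 2)$-vector $(\mathbf{g}^T, \gamma^T)^T$, $A$ the $(2n - 2) \times (n - 2)$ matrix

$$A = \begin{pmatrix} Q \\ -R^T \end{pmatrix},$$

$I_n$ the $n \times n$ unit matrix and

$$D = \begin{pmatrix} I_n & 0 \\ 0 & \lambda R \end{pmatrix}. \tag{6}$$

Then the solution of (1a) is given by the solution of the following quadratic program:

$$\text{minimise} \quad -\mathbf{y}^T \tilde{\mathbf{g}} + \tfrac{1}{2}\, \tilde{\mathbf{g}}^T D\, \tilde{\mathbf{g}}, \tag{7a}$$

$$\text{subject to} \quad A^T \tilde{\mathbf{g}} = 0. \tag{7b}$$

We propose to use the algorithm of Goldfarb and Idnani (1982, 1983) to solve (7).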
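Because (7) has only an equality constraint, its solution can also be found from the first-order (KKT) conditions, a linear system in the stacked vector and the Lagrange multipliers. The sketch below does this in plain Python as a way to see the structure of $D$ and $A$. It is not the Goldfarb-Idnani algorithm proposed in the talk, and it does not handle the shape constraint (1b), which would add inequality constraints. The function names, the data and the value of $\lambda$ are illustrative assumptions.

```python
def build_q_r(t):
    """Q and R as defined in the text; 0-based column c stands for j = c + 2."""
    n = len(t)
    h = [t[k + 1] - t[k] for k in range(n - 1)]
    Q = [[0.0] * (n - 2) for _ in range(n)]
    R = [[0.0] * (n - 2) for _ in range(n - 2)]
    for c in range(n - 2):
        Q[c][c] = 1.0 / h[c]
        Q[c + 1][c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2][c] = 1.0 / h[c + 1]
        R[c][c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < n - 2:
            R[c][c + 1] = R[c + 1][c] = h[c + 1] / 6.0
    return Q, R

def solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x

def smoothing_spline_qp(t, y, lam):
    """Solve (7) through its KKT system; returns (fitted values, interior gammas)."""
    n = len(t)
    Q, R = build_q_r(t)
    m, p = 2 * n - 2, n - 2                      # stacked length, number of constraints
    A = Q + [[-v for v in row] for row in R]     # Q stacked on top of -R (R is symmetric)
    size = m + p
    K = [[0.0] * size for _ in range(size)]
    for i in range(n):
        K[i][i] = 1.0                            # identity block of D
    for i in range(p):
        for j in range(p):
            K[n + i][n + j] = lam * R[i][j]      # lambda * R block of D
    for i in range(m):
        for j in range(p):
            K[i][m + j] = A[i][j]                # A
            K[m + j][i] = A[i][j]                # A^T
    rhs = list(y) + [0.0] * (2 * p)              # (y, 0) for stationarity, 0 for A^T g = 0
    sol = solve(K, rhs)
    return sol[:n], sol[n:m]

# Hypothetical noisy observations at unequally spaced knots.
t = [0.0, 0.1, 0.3, 0.4, 0.7, 1.0]
y = [1.0, 0.93, 0.82, 0.80, 0.85, 1.0]
lam = 0.01
g_hat, gamma_hat = smoothing_spline_qp(t, y, lam)
print("fitted values:", [round(v, 4) for v in g_hat])
```

Eliminating the multipliers from the KKT system gives fitted values equal to the data minus $\lambda Q \gamma$, so the fitted values sum to the same total as the data, and data lying exactly on a straight line are reproduced with zero second derivatives.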

Statistical Modeling of Extreme Values

  • 1. Statistical Moeling of Extreme Values: Basic Theory and Its Implementation in Open Source Programing Environment R Nader Tajvidi Department of Mathematical Statistics Lund Institute of Technology Box 118 SE-22100 Lund Sweden August 6, 2010 Khon Kaen University
  • 2. Outline • Some examples of application of extreme value theory • Univariate extreme value distributions • Characterisation of multivariate extreme value distributions • Bivariate extreme value distributions • Parametric models for the dependence function • Parametric and nonparametric estimation of the dependence function • Monte Carlo approximations to mean integrated squared errors of parametric and nonparametric estimators • Application to Australian temperature data Khon Kaen University August 6, 2010
  • 3. Annual maximum sea levels at Port Pirie, South Australia 4.6 4.4 4.2 4.0 Sea−Level (meters) 3.8 3.6 1930 1940 1950 1960 1970 1980 Year Khon Kaen University August 6, 2010
  • 4. Breaking strengths of glass fibers Histogram of breaking strengths of glass fibers Percent of Total 30 20 10 0 0.5 1.0 1.5 2.0 Breaking Strength Density plot of breaking strengths of glass fibers 1.5 Density 1.0 0.5 0.0 0.5 1.0 1.5 2.0 2.5 Breaking Strength Khon Kaen University August 6, 2010
  • 5. Annual maximum sea levels at Fremantle, Western Australia 1.8 1.6 Sea−Level (meters) 1.4 1.2 1900 1920 1940 1960 1980 Year Khon Kaen University August 6, 2010
  • 6. Annual maximum sea levels at Fremantle, Western Australia, versus mean annual value of Southern Oscillation Index 1.8 1.6 Sea−Level (meters) 1.4 1.2 −1 0 1 2 SOI Khon Kaen University August 6, 2010
  • 7. Comparing Port Pirie and Fremantle datasets 4.6 4.4 4.2 4.0 3.8 Sea−Level (meters) 3.6 1930 1940 1950 1960 1970 1980 Year 1.8 1.6 1.4 Sea−Level (meters) 1.2 1900 1920 1940 1960 1980 Year Khon Kaen University August 6, 2010
  • 8. Daily closing prices of the Dow Jones Index dowjones 11000 9000 Index 7000 5000 Q1 Q3 Q1 Q3 Q1 Q3 Q1 Q3 Q1 Q3 Q1 Q3 Q1 1995 1996 1997 1998 1999 2000 2001 Year Khon Kaen University August 6, 2010
  • 9. Log-daily returns of the Dow Jones Index log.daily.return 0.04 0.02 0.00 −0.02 Index −0.04 −0.06 Q1 Q3 Q1 Q3 Q1 Q3 Q1 Q3 Q1 Q3 Q1 Q3 Q1 1995 1996 1997 1998 1999 2000 2001 Year Khon Kaen University August 6, 2010
  • 10. Dow Jones Index data dowjones log.daily.return 0.04 11000 0.02 0.00 9000 −0.02 Index Index 7000 −0.04 −0.06 5000 Q1 Q1 Q1 Q1 Q1 Q1 Q1 Q1 1995 1999 1995 1999 Year Year Khon Kaen University August 6, 2010
  • 11. Windstorm loss data • Windstorm losses of the Swedish insurance group L¨nsf¨rs¨kringar during the period 1982 to 1993 a o a • The database contains: – The individual amounts of all claims – The place and time of the claims – The type of the claim • 46 storm events, with a total claimed amount of 510 million Swedish crowns (MSEK) • Farm insurance comprising of approximately 65% of the total amount • All values were corrected for inflation • No adjustments for portfolio changes Khon Kaen University August 6, 2010
  • 12. Windstorm losses 1982-1993 Feb 92 Dec 88 4 n8 Ja Ja 83 n 93 Jan Questions: • How can we predict the size of the next very severe storm? • How much reinsurance does a company need to buy? Khon Kaen University August 6, 2010
  • 13. Windstorm losses which exceed the level u = 0.9 MSEK, for 1982 – 1993 Jan 93 120 100 storm loss (in MSEK) 80 60 Jan 83 Jan 84 40 Dec 88 Feb 92 20 0 0 10 20 30 40 storm number Khon Kaen University August 6, 2010
  • 14. Australian temperature data • A very large dataset on annual maximum and minimum average daily temperatures at 224 stations across Australia Queensland New South Wales Victoria South Australia West Australia Northern Territory Tasmania Khon Kaen University August 6, 2010
  • 15. Annual maximum temperatures in Victoria, Australia • The maximum value, over all 34 weather stations that were operating in the state of Victoria from 1910 to 1993, of annual temperatures (in degrees Celsius) during this period. Khon Kaen University August 6, 2010
  • 16. 39 33.6 33.5 35 33.4 ˆ μ 33 33.3 Temperature 31 33.2 1920 1940 1960 1980 0.0 0.2 0.4 0.6 0.8 1.0 Year t 0.70 -0.1 0.65 -0.2 0.60 ˆ σ -0.3 ˆ γ 0.55 -0.4 0.50 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 t t Khon Kaen University August 6, 2010
  • 17. Average annual maximum temperature • The average annual maximum is derived by taking the mean of maximum annual temperature readings at 224 weather stations across Australia in the period 1890–1993. Queensland New South Wales Victoria South Australia West Australia Northern Territory Tasmania Khon Kaen University August 6, 2010
  • 18. Location and scale estimates with Gaussian fit 29.0 28.5 28.5 28.0 28.0 27.5 ˆ μ 27.0 27.5 Temperature 26.5 26.0 27.0 1900 1940 1980 0.0 0.2 0.4 0.6 0.8 1.0 Year t -136.3 0.50 j 0.45 0.40 -136.5 ˆ σ 0.35 C1 0.30 0.25 -136.7 0.0 0.2 0.4 0.6 0.8 1.0 0.12 0.16 0.20 0.24 t h Khon Kaen University August 6, 2010
  • 19. Another application to Australian temperature data • maximum annual values of average daily temperature measurements at two meteorological stations, Leonora (latitude 28.53, longitude 121.19) and Menzies (latitude 29.42, longitude 121.02), in Western Australia during the period 1898–1993. 27 Menzies 26 25 24 25 26 27 28 Leonora Khon Kaen University August 6, 2010
  • 20. Annual Maximum Wind Speeds in 1944-1983 80 70 60 50 Annual Maximum Wind Speed (konts) at Hartford (CT) 40 45 50 55 60 65 Annual Maximum Wind Speed (konts) at Albany (NY) Khon Kaen University August 6, 2010
  • 21. Concurrent measurements of wave and surge height in south west England 0.8 0.6 0.4 0.2 Surge (m) 0.0 −0.2 0 2 4 6 8 10 Wave Height (m) Khon Kaen University August 6, 2010
  • 22. The framework 1. A proper mathematical model has to be chosen in each case. • parametric; best if the model is correct • non parametric; can not be used for extrapolation outside the observed values • semi parametric; very flexible (main subject of this talk) 2. Parameters in each model have to be estimated based on the historical data. Which method should be used? 3. These estimates are our “best guesses” of the process which is being analyzed. How to specify uncertainty in the estimates?. 4. Goodness of fit. Does the model give a good representation of the historical data? 5. How can we reduce the uncertainties in our models? How can extra information be incorporated in the models? Khon Kaen University August 6, 2010
  • 23. Univariate Extreme Value Distributions X1, X2, . . ., Xn, iid X ∼ F (x) Mn = max(X1, X2, . . ., Xn), n ∈ N an > 0 and bn ∈ R Mn − bn lim P ( ≤ x) = lim F n(anx + bn) = G(x) n→∞ an n→∞ G(x) non-degenerate F ∈ D(G) F (x) belongs to domain of attraction of G(x) Khon Kaen University August 6, 2010
  • 24. Type I: 0 x<0 Φα(x) = exp(−x−α) x ≥ 0 Type II: exp(−(−xα)) x < 0 Ψα(x) = 1 x≥0 Type III: Λ(x) = exp(−e−x) x∈R Generalised Extreme Value Distribution x−μ γ 1 G(x; γ, μ, σ) = exp{−(1 − γ )+ } σ Khon Kaen University August 6, 2010
  • 25. Multivariate Extreme Value Distributions (1) (d) {Xn, n ≥ 1} = {(Xn , . . . , Xn ), n ≥ 1} X ∼ F (x) iid n n (1) (d) (1) (d) Mn = (Mn , . . . , Mn ) = ( Xj , . . . , Xj ) j=1 j=1 (i) (i) σn > 0, un ∈ R P [(Mn − u(i))/σn ≤ x(i), 1 ≤ i ≤ d] = (i) n (i) F n(σn x(1) + u(1), . . . , σn x(d) + u(d)) → G(x) (1) n (d) n marginal Gi of G non-degenerate F ∈ D(G) F (x) belongs to domain of attraction of G(x) Khon Kaen University August 6, 2010
  • 26. Characterisation of Multivariate Extreme Value Distributions P [(Mn − u(i))/σn ≤ x(i), 1 ≤ i ≤ d] = (i) n (i) F n(σn x(1) + u(1), . . . , σn x(d) + u(d)) → G(x) (1) n (d) n Definition. A df G in Rd is called max-stable if for every t > 0 Gt(x) = G(α(1)(t)x(1)+β (1)(t), . . . , α(d)(t)x(d)+β (d)(t)). Definition. A df G in Rd is called max-infinitely divisible (max-id) if F t(x1, . . . , xd) is a df for every t > 0. G(∞, ∞, . . . , xi, . . . , ∞) = Φ1(xi) = exp(−x−1 ) i G∗(x) is a MEVD with Φ1 marginals Khon Kaen University August 6, 2010
  • 27. Characterisation of Max-id and Max-Stable Distributions F max-id iff for a Radon measure μ on E := [k, ∞] {k}, k ∈ [−∞, ∞)d exp{−μ[−∞, y]c} y ≥ k F (y) = 0 otherwise The measure μ is called an exponent measure. G(∞, ∞, . . . , xi, . . . , ∞) = Φ1(xi) = exp(−x−1 ) i G∗(x) is a MEVD with Φ1 marginals if for a finite measure S on ℵ = {y : y = 1} d a(i) G∗(x) = exp − (i) S(da) ℵ i=1 x a(i)S(da) = 1, 1 ≤ i ≤ d ℵ Khon Kaen University August 6, 2010
  • 28. Bivariate Extreme Value Distributions −μ∗ [0,(x,y)]c G∗(x, y) = e 1 1 x μ∗[0, (x, y)] = ( + )A( c ) x y x+y 1 A(w) = max{q(1 − w), (1 − q)w}S(dq) 0 A(w) is called dependence function. 1 1 qS(dq) = (1 − q)S(dq) = 1 0 0 • A(0) = A(1) = 1 • max{w, 1 − w} ≤ A(w) ≤ 1 • A(w) is convex for w ∈ [0, 1] Khon Kaen University August 6, 2010
  • 29. Some examples of the dependence function 1.0 0.9 0.8 A(w) 0.7 0.6 Mixed Generalised mixed 0.5 Asym. mixed 0.0 0.2 0.4 0.6 0.8 1.0 w Khon Kaen University August 6, 2010
  • 30. Parametric Models for the Dependence Function 1. The mixed model 1 1 θ μ∗([0, (x, y)] ) = + − c , 0≤θ≤1 x y x+y A(w) = θw2 − θw + 1, 0≤θ≤1 • θ = 0 gives independent case • Complete dependence is not possible 2. The logistic model μ∗([0, (x, y)]c) = (x−r + y −r )1/r , r≥1 A(w) = {(1 − w)r + wr }1/r , r≥1 • r = 1 gives independent case • r = +∞ gives complete dependence Khon Kaen University August 6, 2010
  • 31. The Generalised Symmetric Mixed Model 1 1 c 1 μ∗([0, (x, y)] ) = + − k( p )1/p, (0 ≤ k ≤ 1, p ≥ 0) x y x + yp k A(w) = 1 − 1 −p p (1 − w) + w−p • Independence for k = 0 or p = 0 • Complete dependence can be obtained with k = 1 and p = ∞ (Not possible in the symmetric or asymmetric mixed model) Khon Kaen University August 6, 2010
  • 32. The Generalised Symmetric Logistic Model 1 1 k 1 c p, μ∗([0, (x, y)] ) = ( p + p + p/2 ) (0 < k ≤ 2(p − 1), p ≥ 2) x y (xy) 1 p p p 2 A(w) = (1 − w) + wp + k ((1 − w) w) • k = 2 gives the symmetric logistic model • Independence corresponds to p = 2 and k = 2 • Complete dependence for k = 2 and p = +∞ Khon Kaen University August 6, 2010
  • 33. The Parameter Region for the Generalised Symmetric Logistic Model k logistic model 2 equivalent models 0 2 4 6 p Khon Kaen University August 6, 2010
  • 34. The Asymmetric Mixed Model c x3 + 3 x2 y − 2 φ x2 y − θ x2 y + 3 x y 2 − φ x y 2 − θ x y 2 + y 3 μ∗([0, (x, y)] ) = x y (x + y)2 A(w) = φw3 + θw2 − (θ + φ)w + 1, (θ ≥ 0, θ + 2φ ≤ 1, θ + 3φ ≥ 0) • Symmetric mixed model for φ = 0 • Independent case for θ = φ = 0 (Complete dependence is not possible) • The parameter φ stands for non-symmetry in the model Khon Kaen University August 6, 2010
  • 35. The Asymmetric Logistic Model φr xr +θ r y r 1 φr xr +θ r y r 1 (1 − φ) x + (1 − θ) y + x ( (x+y)r ) r + y ( (x+y)r ) r c μ∗([0, (x, y)] ) = xy A(w) = {(θ(1 − w))r + (φw)r }1/r + (θ − φ)w + 1 − θ, (0 ≤ θ, φ ≤ 1, r ≥ 1) • For θ = φ = 1 this model reduces to the corresponding symmetric logistic model which gives the diagonal case for r = +∞. • Independence is obtained for θ = 0 and for φ = 0 or r = 1. Khon Kaen University August 6, 2010
  • 36. Estimation of the dependence function • Nonparametric methods 1. Pickands estimator (1981) 2. Cap´ra`, Foug`res and Genest’s estimator (1997) e a e • Maximum likelihood based on parametirc models New Nonparametric Methods: 1. Convex hull of modified Pickands estimator 2. Constrained smoothing splines Khon Kaen University August 6, 2010
  • 37. Pickands estimator • Suppose (X, Y ) has a bivariate extreme value distribution with exponential margins. • min{X/(1 − w), Y /w} has an exponential distribution with mean 1/A(w). • the maximum likelihood estimator of A(w) is n −1 An(w) = n min {Xi/(1 − w), Yi/w} i=1 • For each 0 ≤ w ≤ 1, 1/An(w) is an unbiased and strongly consistent estimator of 1/A(w). • δn(w) = n1/2 1/An(w) − 1/A(w) satisfies the central limit theorem in C(0, 1), B ; see Deheuvels, P. (1991). Khon Kaen University August 6, 2010
  • 38. Pickands estimates for 100 simulated data from Logistic model with r = 1.1 and r = 1.3 1.2 1.0 1.0 A(w) A(w) 0.8 0.8 0.6 0.6 Pickands Pickands Logistic, r = 1.1 Logistic, r = 1.3 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 w w Khon Kaen University August 6, 2010
  • 39. Pickands estimates for 100 simulated data from Logistic model with r = 1.6 and r = 2 1.0 1.0 0.9 0.9 0.8 0.8 A(w) A(w) 0.7 0.7 0.6 0.6 Pickands Pickands 0.5 Logistic, r = 1.6 0.5 Logistic, r = 2 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 w w Khon Kaen University August 6, 2010
  • 40. Cap´ra`, Foug`res and Genest’s e a e estimator • Copula for a bivariate extreme value distribution with marginals F (x) and G(y) C(u, v) = P {F (x) ≤ u, G(y) ≤ v} log(u) = exp log(uv)A log(uv) • Ui, Vi ≡ {F (Xi), G(Yi)}(1 ≤ i ≤ n) log(Ui) • Pseudo-observations Zi = log(UiVi ) (1 ≤ i ≤ n) • H(z) = P (Zi ≤ z) = z + z(1 − z)D(z) where D(z) = A (z)/A(z) for all 0 ≤ z ≤ 1 t H(z)−z 1 H(z)−z • A(t) = exp 0 z(1−z) dz = exp − t z(1−z) dz t Hn (z)−z 1. A0 (t) = exp n 0 z(1−z) dz Khon Kaen University August 6, 2010
  • 41. 1 Hn(z)−z 2. A1 (t) = exp − n t z(1−z) dz • log An(t) = p(t) log A0 (t) + {1 − p(t)} log A1 (t) n n definition of the estimator: Denote the ordered values of Zi by Z(1), . . . , Z(n) and define i 1/n Qi = Z(k)/(1 − Z(k)) (1 ≤ i ≤ n). k=1 Then An can be written as ⎧ ⎪ (1 − t)Q1−p(t) ⎨ n 0 ≤ t ≤ Z(1) 1−p(t) −1 An(t) = ti/n(1 − t)1−i/nQn Qi Z(i) ≤ t ≤ Z(i+1) ⎪ ⎩ −p(t) tQn Z(n) ≤ t ≤ 1 • An(0) = An(1) = 1 if p(0) = 1 − p(1) = 1. Khon Kaen University August 6, 2010
  • 42. Cap´ra`’s estimates for 100 simulated data from Logistic e a model with r = 1.1 and r = 1.3 1.0 1.0 0.9 0.9 0.8 0.8 A(w) A(w) 0.7 0.7 0.6 0.6 Caperaa Caperaa 0.5 Logistic, r = 1.1 0.5 Logistic, r = 1.3 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 w w Khon Kaen University August 6, 2010
  • 43. Cap´ra`’s estimates for 100 simulated data from Logistic e a model with r = 1.6 and r = 2 1.0 1.0 0.9 0.9 0.8 0.8 A(w) A(w) 0.7 0.7 0.6 0.6 Caperaa Caperaa 0.5 Logistic, r = 1.6 0.5 Logistic, r = 2 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 w w Khon Kaen University August 6, 2010
  • 44. Modified Pickands estimator (1) (2) • Let Yi = (Yi , Yi ) for 1 ≤ i ≤ n be independent and identically extreme value distributed random variables with exponential margins. ¯ • Put Y ( ) = n−1 ( ) Yi and Yi ( ) ( )¯ = Yi /Y ( ) for i = 1, 2. (1) (2) • B(u) ≡ n−1 i=1 min Yi /(1 − u), Yi /u is n uniformly root-n consistent for B(u) ≡ A(u)−1. 1. The estimator of the dependence function passes through the points (0, 1) and (1, 1), and has gradients −1 and 1 at these respective points. ˆ 2. B(u) ≤ min{1/(1−u), 1/u} so A ≡ B −1 lies above the lower boundary of the trianglur area. ˜ ˆ 3. The greatest convex minorant, A, of A satisfies all necessary conditions for a dependence function. Khon Kaen University August 6, 2010
  • 45. Modified Pickands estimates for 100 simulated data from Logistic model with r = 1.1 and r = 1.3 1.0 1.0 0.9 0.9 0.8 0.8 A(w) A(w) 0.7 0.7 0.6 0.6 chull of modified Pickand chull of modified Pickand Modified Pickands Modified Pickands 0.5 Logistic, r = 1.1 0.5 Logistic, r = 1.3 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 w w Khon Kaen University August 6, 2010
  • 46. Modified Pickands estimates for 100 simulated data from Logistic model with r = 1.6 and r = 2 1.0 1.0 0.9 0.9 0.8 0.8 A(w) A(w) 0.7 0.7 0.6 0.6 chull of modified Pickand chull of modified Pickand Modified Pickands Modified Pickands 0.5 Logistic, r = 1.6 0.5 Logistic, r = 2 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 w w Khon Kaen University August 6, 2010
  • 47. Constrained smoothing splines ˆ • A may be approximated by a spline that is constrained to satisfy all the necessary conditions on the dependence function. • Choose regularly spaced points 0 = t0 < . . . < tm = 1 in the interval [0, 1]. ˜ • Given a smoothing parameter s > 0, take As to be a polynomial smoothing spline of degree 3 or more which minimises m 1 ˆ ˜ {A(tj ) − As(tj )}2 + s ˜ As (t)2 dt , j=1 0 subject to 1. ˜ ˜ As(0) = As(1) = 1 2. ˜ ˜ As(0) ≥ −1 and As(1) ≤ 1 3. ˜ As ≥ 0 on [0, 1]. Khon Kaen University August 6, 2010
  • 48. Smoothed spline of modified Pickands estimates for 100 simulated data from Logistic model with r = 1.1 and r = 1.3 1.0 1.0 0.9 0.9 0.8 0.8 A(w) 0.7 A(w) 0.7 0.6 0.6 Smoothed spline Smoothed spline Modified Pickands Modified Pickands 0.5 Logistic, r = 1.1 0.5 Logistic, r = 1.3 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 w w Khon Kaen University August 6, 2010
  • 49. Smoothed spline of modified Pickands estimates for 100 simulated data from Logistic model with r = 1.6 and r=2 1.0 1.0 0.9 0.9 0.8 0.8 A(w) 0.7 A(w) 0.7 0.6 0.6 Smoothed spline Smoothed spline Modified Pickands Modified Pickands 0.5 Logistic, r = 1.6 0.5 Logistic, r = 2 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 w w Khon Kaen University August 6, 2010
  • 50. Which model to use in practice? • maximum likelihood of parametric models, e.g. 1. symmetric mixed model 2. symmetric logistic model 3. asymmetric mixed model 4. asymmetric logistic model 5. generalised symmetric logistic model 6. generalised asymmetric mixed model • Nonparametirc methods including 1. the Pickands (1981, 1989) estimator 2. the convex hull of Pickands’ estimator 3. the estimator proposed by Cap´ra`, Foug`res and e a e Genest (1997) 4. the convex hull of the latter 5. our modification of Pickands’ estimator 6. the convex hull of the latter • constrained smoothing splines fitted to any of these nonparametric estimators Khon Kaen University August 6, 2010
  • 51. Khon Kaen University 1.0 0.9 0.8 A(w) Smoothed spline Logistic Mixed 0.7 Generalised logistic Generalised mixed Asym. logistic Asym. mixed 0.6 Pickands chull of modified Pickand Caperaa Modified Pickands 0.5 Logistic, r = 1.1 0.0 0.2 0.4 0.6 0.8 1.0 w August 6, 2010
  • 52. Khon Kaen University 1.0 0.9 0.8 A(w) Smoothed spline Logistic Mixed 0.7 Generalised logistic Generalised mixed Asym. logistic Asym. mixed 0.6 Pickands chull of modified Pickand Caperaa Modified Pickands 0.5 Logistic, r = 1.3 0.0 0.2 0.4 0.6 0.8 1.0 w August 6, 2010
  • 53. 1.0 Khon Kaen University 0.9 0.8 A(w) Smoothed spline Logistic 0.7 Mixed Generalised logistic Generalised mixed Asym. logistic Asym. mixed 0.6 Pickands chull of modified Pickand Caperaa Modified Pickands 0.5 Logistic, r = 1.6 0.0 0.2 0.4 0.6 0.8 1.0 w August 6, 2010
  • 54. 1.0 Khon Kaen University 0.9 0.8 A(w) Smoothed spline Logistic 0.7 Mixed Generalised logistic Generalised mixed Asym. logistic Asym. mixed 0.6 Pickands chull of modified Pickand Caperaa Modified Pickands 0.5 Logistic, r = 2 0.0 0.2 0.4 0.6 0.8 1.0 w August 6, 2010
  • 55. Monte Carlo approximations to mean integrated squared errors, multiplied by 105 n = 25 n = 50 n = 100 method r=1 r=2 r=3 r=1 r=2 r=3 r=1 r=2 r=3 logistic 197 64 14 110 34 8 42 14 4 Pickands 5614 2829 3331 2034 1547 1261 1172 712 567 convex hull of Pickands 7229 2611 2775 2588 1388 1049 1430 671 477 Cap´ra` et. al. e a 889 102 35 568 49 20 307 29 10 convex hull of Cap´ra` et. al. e a 1188 95 41 666 57 25 373 32 12 modified Pickands 1351 138 33 614 77 18 366 37 11 convex hull of modified Pickands 1861 139 46 815 70 24 453 38 15 smoothed spline of Pickands 784 919 1020 396 728 487 215 334 220 convex hull of Pickands 525 769 1055 282 637 490 135 327 230 Cap´ra` et. al. e a 303 82 21 177 37 12 97 24 8 convex hull of Cap´ra` et. al. e a 286 66 22 167 39 14 97 26 11 modified Pickands 447 104 21 240 62 12 130 31 9 convex hull of modified Pickands 401 73 24 232 49 16 107 30 15 Khon Kaen University August 6, 2010
  • 56. Australian temperature data • A very large dataset on annual maximum and minimum average daily temperatures at 224 stations across Australia Queensland New South Wales Victoria South Australia West Australia Northern Territory Tasmania Khon Kaen University August 6, 2010
  • 57. Application to Australian temperature data • maximum annual values of average daily temperature measurements at two meteorological stations, Leonora (latitude 28.53, longitude 121.19) and Menzies (latitude 29.42, longitude 121.02), in Western Australia during the period 1898–1993. 27 Menzies 26 25 24 25 26 27 28 Leonora Khon Kaen University August 6, 2010
  • 58. Logistic models for the dependence function fitted by maximum likelihood to the temperature data 1.0 0.9 0.8 A(w) 0.7 0.6 Logistic Generalised logistic Asym. logistic 0.5 Modified Pickands 0.0 0.2 0.4 0.6 0.8 1.0 w Khon Kaen University August 6, 2010
  • 59. Mixed models for the dependence function fitted by maximum likelihood to the temperature data 1.0 0.9 0.8 A(w) 0.7 0.6 Mixed Generalised mixed Asym. mixed 0.5 Modified Pickands 0.0 0.2 0.4 0.6 0.8 1.0 w Khon Kaen University August 6, 2010
  • 60. Estimating a bivariate extreme-value distribution function • Let X = (X (1), X (2)) have a bivariate extreme- value distribution F . • There exist monotone increasing transformations Tj = Tj (·|θj ) such that (T1(X (1)), T2(X (2))) has distribution function G0. (1) (2) • Given a sample {Xi = (Xi , Xi ), 1 ≤ i ≤ n}, ˆ compute a root-n consistent estimator θj of θj from (j) the marginal data Xi , 1 ≤ i ≤ n. ˆ • Put Tj = Tj (·|θj ) and n ( ) ( ) ( ) Yj = T (Xj ) n−1 T (Xi ) . i=1 ˆ ˆ ˆ • F x(1), x(2)) = G0 T1 x(1) θ1 , T2 x(2) θ2 is root-n consistent for F . Khon Kaen University August 6, 2010
  • 61. Distribution function estimate and semi-infinite prediction regions corresponding to nominal levels α = 0.9, 0.95 and 0.99 (a) 28.5 28.0 0.99 fs Menzies 27.5 0.95 0.9 27.0 26.5 Menzies Leonora 27.0 27.5 28.0 28.5 29.0 29.5 30.0 Leonora Khon Kaen University August 6, 2010
  • 62. Compact bivariate prediction regions Construct compact prediction regions by profiling the estimator ˜ ∂2 ˆ ˆ fs(x) = G1 t(1), t(2) ∂x(1) ∂x(2) = ˆ ˆ ˆ ˆ T1 x(1) θ1) T2 x(2) θ2) G1 t(1), t(2) ˆ t(2) ˆ t(1) ˆ t(2) × ˜ As (1) + (1) ˜ A ˆ t + t(2)ˆ ˆ t +t ˆ ˆ ˆ (2) s t(1) + t(2) ˆ t(2) ˆ t(2) ˆ t(2) × ˜s A (1) − (1) ˜s A (1) ˆ t + t(2)ˆ t ˆ + t(2)ˆ ˆ t + t(2)ˆ ˆ ˆ t(1) t(2) ˆ t(2) + A ˜ ˆ (t ˆ (1) + t(2) )3 s ˆ ˆ t(1) + t(2) of the density, f , of X. Khon Kaen University August 6, 2010
  • 63. How to choose s • CV (s) = ˜ fs(x)2 dx − 2 n−1 n ˜ f−i,s(Xi). i=1 • CV (s) is an almost-unbiased approximation to ˜2 ˜ E(fs − 2fsf ). • The value of s that results from minimising CV (s) ˜ will asymptotically minimise E(fs − f )2. • To construct prediction regions, define ˜ R(u) ≡ x : fs(x) ≥ u , β(u) = ˜ fs(x) dx . e R(u) • Given a prediction level α, let u = uα denote the ˜ solution of β(u) = α. Then, R(˜α) is a nominal u α-level prediction region for a future value of X. Khon Kaen University August 6, 2010
  • 64. Cross-validation criterion CV (s) and spline-smoothed ˜ dependence function estimate As for s = 0.05, with the unsmoothed, modified Pickands estimate (a) -0.228 1.0 -0.230 0.9 -0.232 -0.234 0.8 A(w) CV -0.236 0.7 -0.238 -0.240 0.6 Smoothed spline -0.242 chull of modified Pickand 0.5 Modified Pickands 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.2 0.4 0.6 0.8 1.0 s w Khon Kaen University August 6, 2010
  • 65. Plot of spline-smoothed density estimate $\tilde f_s$
    [Figure: surface of $\tilde f_s$ over the Leonora and Menzies axes.]
  • 66. Compact bootstrap calibrated prediction regions with nominal levels α = 0.85 and 0.90
    [Figure: regions labelled 0.85 and 0.90, plotted against Leonora (horizontal axis) and Menzies (vertical axis).]
  • 67. Bootstrap calibration
    • Take $\bar A = \tilde A$ or $\tilde A_\lambda$ in
      $$\bar F(x) = \exp\Bigl\{ -\Bigl(\frac{1}{x^{(1)}} + \frac{1}{x^{(2)}}\Bigr)\, \bar A\Bigl(\frac{x^{(1)}}{x^{(1)} + x^{(2)}}\Bigr) \Bigr\}.$$
    • Compute the chosen region $\bar R_\alpha$, with nominal coverage $\alpha$, from the data $\mathcal{X} = \{X_1, \ldots, X_n\}$.
    • By resampling from $\bar F$ conditional on $\mathcal{X}$, compute a new dataset $\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$, and from it calculate the analogue $\bar F^*$ of $\bar F$, and then the analogue $\bar R_\alpha^*$ of $\bar R_\alpha$.
    • Let $\gamma(\alpha)$ equal the probability, conditional on the data $\mathcal{X}$, that a random 2-vector drawn from $\bar F$ lies in $\bar R_\alpha^*$.
    • Let $\hat a = \hat a(\alpha)$ be the solution of $\gamma(a) = \alpha$. Then $\bar R_{\hat a(\alpha)}$ is the bootstrap-calibrated form of $\bar R_\alpha$.
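The final step of slide 67, solving $\gamma(a) = \alpha$, can be sketched as a one-dimensional root search. The code below assumes that $\gamma$ is increasing in the nominal level and that it is available as a callable; in practice it would be a Monte Carlo estimate obtained from the bootstrap resamples. The function name and the toy coverage curve are hypothetical.

```python
def calibrate_level(gamma, alpha, lo=0.0, hi=1.0, tol=1e-8):
    """Solve gamma(a) = alpha by bisection, assuming gamma is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma(mid) < alpha:
            lo = mid   # coverage too small: raise the nominal level
        else:
            hi = mid   # coverage large enough: lower the nominal level
    return 0.5 * (lo + hi)

# Toy coverage curve for illustration: the region under-covers, gamma(a) = a^2.
toy_gamma = lambda a: a ** 2
a_hat = calibrate_level(toy_gamma, 0.81)
```

With this toy curve, a target coverage of 0.81 requires a nominal level close to 0.9, so the calibrated region is the one computed at the larger nominal level.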
  • 68. Theoretical Properties
    • $\hat A$ and its greatest convex minorant, $\tilde A$, are uniformly root-$n$ consistent for $A$:
      $$\sup_{0 \le u \le 1} |\hat A(u) - A(u)| + \sup_{0 \le u \le 1} |\tilde A(u) - A(u)| = O_p\bigl(n^{-1/2}\bigr).$$
    • If the distribution $H$ of $Y^{(1)}/(Y^{(1)} + Y^{(2)})$ has a bounded density then, for each $\epsilon \in (0, \tfrac12]$,
      $$\sup_{\epsilon \le u \le 1-\epsilon} |\hat A'(u) - A'(u)| + \sup_{\epsilon \le u \le 1-\epsilon} |\tilde A'(u) - A'(u)| = O_p\bigl(n^{-1/2}\bigr).$$
    • If $A$ has three bounded derivatives then the biases of $\hat A$, $\hat A'$, $\tilde A$, $\tilde A'$ are $O(n^{-1})$.
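The greatest convex minorant mentioned on slide 68 can be computed on a grid as the lower convex hull of the points $(u_i, \hat A(u_i))$. The sketch below uses a standard monotone-chain scan and is not taken from the slides; the function names and the sample values are hypothetical, and the code enforces convexity only, not the other constraints on a dependence function.

```python
def greatest_convex_minorant(u, a):
    """Vertices of the lower convex hull of the points (u[i], a[i]); u must be increasing."""
    hull = []
    for point in zip(u, a):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Cross product <= 0 means the middle vertex lies on or above the chord.
            cross = (x2 - x1) * (point[1] - y1) - (y2 - y1) * (point[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(point)
    return hull

def evaluate(hull, x):
    """Piecewise-linear interpolation of the minorant at x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the range of the hull")

# Made-up, non-convex estimate on a coarse grid.
u_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
a_hat_values = [1.0, 0.80, 0.85, 0.78, 1.0]
hull = greatest_convex_minorant(u_grid, a_hat_values)
minorant_at_half = evaluate(hull, 0.5)
```

Here the point at u = 0.5 lies above the chord joining its neighbours, so it is dropped and the minorant interpolates linearly across it.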
  • 69. Shape constrained smoothing using smoothing splines
    Given data $\{(t_i, y_i)\}$, $t_i \in [a, b]$ for $i = 1, \ldots, n$, what is the behaviour of the solution $\hat g$ of the following minimisation problem?
    $$\text{minimise} \quad \sum_{i=1}^{n} \{y_i - g(t_i)\}^2 + \lambda \int_a^b \bigl\{g^{(m)}(u)\bigr\}^2\,du, \tag{1a}$$
    $$\text{where} \quad g^{(r)}(t) \ge 0, \quad t \in [a, b]. \tag{1b}$$
    References
    [1] Mammen, E. and Thomas-Agnan, C. (1999), Smoothing splines and shape restrictions, Scandinavian Journal of Statistics, 26, 239–252.
  • 70. Proposed Estimator for m = 2 and r ≤ 2
    For $m = 2$, the piecewise polynomial representation of a natural cubic $C^2$-spline $g$ is:
    $$g(t) = \sum_{i=0}^{n} I_{[t_i, t_{i+1})}(t)\, S_i(t), \tag{2a}$$
    where
    $$S_i(t) = a_i + b_i (t - t_i) + c_i (t - t_i)^2 + d_i (t - t_i)^3, \quad 1 \le i \le n - 1, \tag{2b}$$
    $S_0(t) = a_1 + b_1 (t - t_1)$ and $S_n(t) = S_{n-1}(t_n) + S_{n-1}'(t_n)(t - t_n)$. The coefficients in (2b) have to fulfill the following equations for $g$ to be a
  • 71. natural cubic $C^2$-spline:
    $$\begin{aligned}
    S_{i-1}(t_i) &= S_i(t_i) && \text{for } i = 1, \ldots, n, \\
    S_{i-1}'(t_i) &= S_i'(t_i) && \text{for } i = 1, \ldots, n, \\
    S_{i-1}''(t_i) &= S_i''(t_i) && \text{for } i = 1, \ldots, n.
    \end{aligned} \tag{3}$$
    A direct implementation would lead to an unnecessarily large quadratic programming problem, and we propose to use the value-second derivative representation (see Green and Silverman, 1994, chapter 2) for the actual implementation.
    For $i = 1, \ldots, n$, define $g_i = g(t_i)$ and $\gamma_i = g''(t_i)$. By definition, a natural cubic $C^2$-spline has $\gamma_1 = \gamma_n = 0$. Let $\mathbf{g}$ denote the vector $(g_1, \ldots, g_n)^T$ and $\boldsymbol\gamma = (\gamma_2, \ldots, \gamma_{n-1})^T$. Note that, for notational simplicity later on, the entries of $\boldsymbol\gamma$ are numbered in a non-standard way, starting at $i = 2$. The vectors $\mathbf{g}$ and $\boldsymbol\gamma$ specify the natural cubic spline $g$ completely.
  • 72. However, not all possible vectors $\mathbf{g}$ and $\boldsymbol\gamma$ represent natural cubic splines. To derive sufficient (and necessary) conditions for $\mathbf{g}$ and $\boldsymbol\gamma$ to represent a cubic spline we define the following matrices $Q$ and $R$.
    Define $h_i = t_{i+1} - t_i$ for $i = 1, \ldots, n - 1$. Let $Q$ be the $n \times (n - 2)$ matrix with entries $q_{i,j}$, for $i = 1, \ldots, n$ and $j = 2, \ldots, n - 1$, given by
    $$q_{j-1,j} = h_{j-1}^{-1}, \qquad q_{j,j} = -h_{j-1}^{-1} - h_j^{-1}, \qquad \text{and} \qquad q_{j+1,j} = h_j^{-1},$$
    for $j = 2, \ldots, n - 1$, and $q_{i,j} = 0$ for $|i - j| \ge 2$. Note that the columns of $Q$ are numbered in the same non-standard way as the entries of $\boldsymbol\gamma$.
    The $(n - 2) \times (n - 2)$ matrix $R$ is symmetric with elements $\{r_{i,j}\}_{i,j=2}^{n-1}$ given by
    $$r_{i,i} = \tfrac13 (h_{i-1} + h_i) \quad \text{for } i = 2, \ldots, n - 1, \qquad r_{i,i+1} = r_{i+1,i} = \tfrac16 h_i \quad \text{for } i = 2, \ldots, n - 2,$$
  • 73. and $r_{i,j} = 0$ for $|i - j| \ge 2$. Note that $R$ is strictly diagonally dominant and, hence, it follows from standard arguments in numerical linear algebra that $R$ is strictly positive-definite.
    We are now able to state the following key result.
    Proposition. The vectors $\mathbf{g}$ and $\boldsymbol\gamma$ specify a natural cubic spline $g$ if and only if the condition
    $$Q^T \mathbf{g} = R \boldsymbol\gamma \tag{4}$$
    is satisfied. If (4) is satisfied then we have
    $$\int_a^b \{g''(t)\}^2\,dt = \boldsymbol\gamma^T R \boldsymbol\gamma. \tag{5}$$
    For a proof see Green and Silverman (1994, section 2.5).
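The band matrices $Q$ and $R$ are easy to build directly from the knots. The sketch below uses 0-based indices, so column `c` corresponds to $j = c + 2$ in the slide numbering, and it follows the Green and Silverman (1994) convention in which the third band of $Q$ is $q_{j+1,j} = h_j^{-1}$. It then checks two consequences of the proposition: $Q^T \mathbf{g} = 0$ for linear data (so $\boldsymbol\gamma = 0$), and identity (5) against the exact integral of the squared piecewise-linear second derivative. All function names and the example knots are hypothetical.

```python
def build_q_r(t):
    """Return Q (n x (n-2)), R ((n-2) x (n-2)) and the knot spacings h."""
    n = len(t)
    h = [t[i + 1] - t[i] for i in range(n - 1)]
    Q = [[0.0] * (n - 2) for _ in range(n)]
    R = [[0.0] * (n - 2) for _ in range(n - 2)]
    for c in range(n - 2):                               # c = j - 2
        Q[c][c] = 1.0 / h[c]                             # q_{j-1,j} = 1/h_{j-1}
        Q[c + 1][c] = -1.0 / h[c] - 1.0 / h[c + 1]       # q_{j,j}
        Q[c + 2][c] = 1.0 / h[c + 1]                     # q_{j+1,j} = 1/h_j
        R[c][c] = (h[c] + h[c + 1]) / 3.0                # r_{i,i}
        if c + 1 < n - 2:
            R[c][c + 1] = R[c + 1][c] = h[c + 1] / 6.0   # r_{i,i+1} = r_{i+1,i}
    return Q, R, h

def qt_times(Q, g):
    """Compute Q^T g."""
    return [sum(Q[i][c] * g[i] for i in range(len(g))) for c in range(len(Q[0]))]

def quad_form(R, gamma):
    """Compute gamma^T R gamma."""
    m = len(gamma)
    return sum(gamma[i] * R[i][j] * gamma[j] for i in range(m) for j in range(m))

def roughness_piecewise(h, gamma_full):
    """Exact integral of the squared piecewise-linear interpolant of gamma_full."""
    return sum(h[i] / 3.0 * (gamma_full[i] ** 2 + gamma_full[i] * gamma_full[i + 1]
                             + gamma_full[i + 1] ** 2) for i in range(len(h)))

knots = [0.0, 0.5, 1.5, 2.0, 3.0]            # made-up, unequally spaced knots
Q, R, h = build_q_r(knots)
linear_values = [2.0 * x + 1.0 for x in knots]
residual = qt_times(Q, linear_values)        # should vanish for linear data
gamma_inner = [0.4, -1.2, 0.7]               # arbitrary interior second derivatives
penalty = quad_form(R, gamma_inner)
integral = roughness_piecewise(h, [0.0] + gamma_inner + [0.0])
```

The agreement between `penalty` and `integral` reflects that the second derivative of a natural cubic spline is piecewise linear between knots and zero at both ends.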
  • 74. This result allows us to state problem (1a) as a quadratic programming problem. Let $\mathbf{y}$ denote the $(2n - 2)$-vector $(y_1, \ldots, y_n, 0, \ldots, 0)^T$, $\mathbf{g}$ the $(2n - 2)$-vector $(\mathbf{g}^T, \boldsymbol\gamma^T)^T$, $A$ the $(2n - 2) \times (n - 2)$-matrix $\begin{pmatrix} Q \\ -R^T \end{pmatrix}$, $I_n$ the $n \times n$ unit matrix and
    $$D = \begin{pmatrix} I_n & 0 \\ 0 & \lambda R \end{pmatrix}. \tag{6}$$
    Then the solution of (1a) is given by the solution of the following quadratic program:
    $$\text{minimise} \quad -\mathbf{y}^T \mathbf{g} + \tfrac12\, \mathbf{g}^T D\, \mathbf{g}, \tag{7a}$$
    $$\text{where} \quad A^T \mathbf{g} = 0. \tag{7b}$$
    We propose to use the algorithm of Goldfarb and Idnani (1982, 1983) to solve (7).
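With only the equality constraint (7b), the quadratic program (7) reduces to a linear system: the stationarity condition $D\mathbf{g} + A\boldsymbol\mu = \mathbf{y}$ together with $A^T\mathbf{g} = 0$, where $\boldsymbol\mu$ is a vector of Lagrange multipliers. The sketch below solves that system by plain Gaussian elimination. It is not the Goldfarb and Idnani algorithm and it does not handle the shape constraint (1b), which would add inequality constraints and call for a genuine QP solver. All function names and the example data are hypothetical.

```python
def build_q_r(t):
    """Band matrices Q (n x (n-2)) and R ((n-2) x (n-2)), 0-based indices."""
    n = len(t)
    h = [t[i + 1] - t[i] for i in range(n - 1)]
    Q = [[0.0] * (n - 2) for _ in range(n)]
    R = [[0.0] * (n - 2) for _ in range(n - 2)]
    for c in range(n - 2):
        Q[c][c] = 1.0 / h[c]
        Q[c + 1][c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2][c] = 1.0 / h[c + 1]
        R[c][c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < n - 2:
            R[c][c + 1] = R[c + 1][c] = h[c + 1] / 6.0
    return Q, R

def solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def smoothing_spline_kkt(t, y, lam):
    """Solve (7a)-(7b) through the KKT system [[D, A], [A^T, 0]]; returns (g, gamma)."""
    n, m = len(t), len(t) - 2
    dim = 2 * n - 2
    Q, R = build_q_r(t)
    K = [[0.0] * (dim + m) for _ in range(dim + m)]
    for i in range(n):
        K[i][i] = 1.0                                          # I_n block of D
        for j in range(m):
            K[i][dim + j] = K[dim + j][i] = Q[i][j]            # Q block of A
    for i in range(m):
        for j in range(m):
            K[n + i][n + j] = lam * R[i][j]                    # lambda R block of D
            K[n + i][dim + j] = K[dim + j][n + i] = -R[j][i]   # -R^T block of A
    solution = solve(K, list(y) + [0.0] * (2 * m))
    return solution[:n], solution[n:dim]

knots = [0.0, 0.5, 1.5, 2.0, 3.0]                 # made-up knots
line = [2.0 * x + 1.0 for x in knots]
g_line, gamma_line = smoothing_spline_kkt(knots, line, lam=1.0)
noisy = [1.0, 2.4, 3.6, 5.5, 6.8]                 # made-up responses
g_fit, gamma_fit = smoothing_spline_kkt(knots, noisy, lam=0.5)
```

Linear data are reproduced exactly with zero second derivatives, since a straight line already satisfies (4) with $\boldsymbol\gamma = 0$ and attains the minimum of (1a).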