The aim of this presentation is to review functional regression models with scalar response (linear, nonlinear and semilinear) and their extension to the more general case where the response belongs to the exponential family (binomial, Poisson, gamma, ...). This extension makes it possible to develop new functional classification methods based on these regression models. Some examples, along with code implementation in R, are provided during the talk. Lecturer: Manuel Febrero Bande, Univ. de Santiago de Compostela, Spain.
2. Linear Models Non Linear and Semi Linear Models Generalized Models Examples
Table of Contents
1 Linear Models
Basis representation
Principal Components
Partial Least Squares
Examples
2 Non Linear and Semi Linear Models
Non Linear
Semi Linear Model
3 Generalized Models
Generalized Linear Models
Generalized Additive Models
4 Examples
Tecator
Introduction
Suppose that $X \in L_2(T)$ and $y \in \mathbb{R}$. Assume also that $E[X(t)] = 0$ for all $t \in [0, T]$ and $E[y] = 0$.
The functional linear regression model states that
$$y = \langle X, \beta \rangle + \epsilon = \int_T X(t)\beta(t)\,dt + \epsilon$$
where $\beta \in L_2(T)$ and $\epsilon$ is the error term.
One way of estimating $\beta$ is to represent the parameter (and optionally $X_i$) in an $L_2$-basis as follows:
$$\beta(t) = \sum_k \beta_k \theta_k(t), \qquad X_i(t) = \sum_k c_{i,k}\,\psi_k(t)$$
Representation in a basis
Given the sample $\{(X_1, y_1), \dots, (X_n, y_n)\}$, we can approximate $X_i$ and $\beta$ using a finite sum of basis elements:
$$X_i(t) \approx \sum_{k=1}^{K_x} c_{ik}\,\psi_k(t), \qquad \beta(t) \approx \sum_{k=1}^{K_\beta} b_k\,\theta_k(t)$$
In matrix form, $X = C\Psi(t)$ and $\beta = \theta^\top b$, so that
$$y = \langle X, \beta \rangle + \epsilon \approx C J_{\psi\theta}\, b + \epsilon = Zb + \epsilon,$$
$$\hat b = (Z^\top Z)^{-1} Z^\top y,$$
$$\hat y = C J_{\psi\theta}\,\hat b = Z\hat b = Z (Z^\top Z)^{-1} Z^\top y = Hy$$
with $J_{\psi\theta} = (\langle \psi_i, \theta_j \rangle)_{ij}$. The choice of an appropriate basis now becomes a crucial step.
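The least-squares estimate above can be sketched in base R. This is only an illustration under assumptions that are not in the slides: curves simulated on a regular grid, a cosine basis for both $X_i$ and $\beta$, and inner products approximated by Riemann sums. The helper `cos_basis` and all variable names are hypothetical.

```r
set.seed(1)
n  <- 100
tt <- seq(0, 1, length.out = 101)
dt <- tt[2] - tt[1]

# Cosine basis evaluated on the grid (one basis function per row); illustrative choice
cos_basis <- function(K, grid) {
  t(sapply(0:(K - 1), function(k) {
    if (k == 0) rep(1, length(grid)) else sqrt(2) * cos(k * pi * grid)
  }))
}

Psi   <- cos_basis(7, tt)            # basis for the curves, K_x = 7
Theta <- cos_basis(3, tt)            # basis for beta, K_beta = 3

C <- matrix(rnorm(n * 7), n, 7)      # coefficients c_ik
X <- C %*% Psi                       # curves evaluated on the grid
b_true <- c(1, -2, 0.5)
beta   <- drop(t(Theta) %*% b_true)  # beta(t) on the grid

# y_i = <X_i, beta> + error, with the integral replaced by a Riemann sum
y <- drop(X %*% beta) * dt + rnorm(n, sd = 0.1)

J <- Psi %*% t(Theta) * dt           # J_{psi theta} = (<psi_i, theta_j>)_ij
Z <- C %*% J
b_hat <- drop(solve(crossprod(Z), crossprod(Z, y)))
H     <- Z %*% solve(crossprod(Z), t(Z))   # hat matrix
y_hat <- drop(H %*% y)
```

Because the simulated response uses the same quadrature as `J`, `b_hat` should recover `b_true` up to noise; with real data the quality of the fit depends on the basis sizes chosen.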
Estimation of β
Fixed basis: B-spline, Wavelets, Fourier.
Ramsay and Silverman (2005), Ramsay and Silverman (2002),
Cardot (2000), Cardot et al. (2003), Antoniadis and Sapatinas
(2003) . . .
Functional Principal Components (FPC).
Silverman (1996), Cardot et al. (1999), Cardot and Sarda (2005),
Hall et al. (2006), Cardot et al. (2007), Yao and Lee (2005),. . .
Partial Least Squares (FPLS).
Preda and Saporta (2005), Krämer et al. (2008), . . .
Principal components (PC)
The principal components of $X$ are linear combinations given by the eigenfunctions $\{v_k\}_{k\ge 1}$ of the covariance operator of $X$:
$$X(t) = \sum_k c_k v_k(t), \qquad c_k = \langle X, v_k \rangle$$
where the $v_k$ are the solutions of the eigenvalue equation
$$\int_T \Sigma(t, s)\, v_k(s)\,ds = \lambda_k v_k(t), \qquad \langle v_k, v_l \rangle = 1_{\{k=l\}},$$
and $\Sigma(t, s) = Cov(X(s), X(t))$ for all $t, s \in [0, T]$.
As in the classical multivariate setting, the process $X$ and the set of its principal eigenfunctions $\{v_k\}_{k\ge 1}$ span the same linear space.
So the PCs constitute an orthonormal basis of $L_2$.
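Fitted, Residuals, Leverage
Once a functional linear model has been estimated,
$$\hat y_i = \langle X_i, \hat\beta_{(k_n)} \rangle = \sum_{k=1}^{k_n} v_{ik}\hat\beta_k = \sum_{k=1}^{k_n} v_{ik}\,\frac{v_{\cdot k}^\top Y}{n\lambda_k} \;\longrightarrow\; \hat Y = H_{(k_n)} Y$$
where $H_{(k_n)}$ is the $n \times n$ hat matrix, given by:
$$H_{(k_n)} = \frac{1}{n}\left(\frac{v_{\cdot 1} v_{\cdot 1}^\top}{\lambda_1} + \cdots + \frac{v_{\cdot k_n} v_{\cdot k_n}^\top}{\lambda_{k_n}}\right).$$
Here $v_{\cdot k} = (v_{1k}, \dots, v_{nk})^\top$ is the vector of scores of the sample on the $k$-th principal component.
So $Cov(\hat Y \mid X_1, \dots, X_n) = \sigma^2 H_{(k_n)}$. The leverage ($0 \le H_{(k_n),ii} \le 1$) is a measure of the a priori influence of a given observation on prediction.
As $Tr(H_{(k_n)}) = k_n$, we can flag those observations $(X_i, y_i)$ with leverage much larger than the average ($k_n/n$).
The residuals can now be written in matrix form:
$$e = Y - \hat Y = (I_n - H_{(k_n)})\,Y = v_{(k_n+1:n)}\beta_{(k_n+1:n)} + (I_n - H_{(k_n)})\,\epsilon$$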
Residual Variance
Using Cardot et al. (2003) and Hall et al. (2006), the term $v_{(k_n+1:n)}\beta_{(k_n+1:n)}$ can be neglected if $n$ is large enough and $k_n$ has been chosen suitably. Moreover, as $Tr(I_n - H_{(k_n)}) = n - k_n$, it is not difficult to see that:
$$E[e^\top e \mid X_1, \dots, X_n] = n\left(\beta_{k_n+1}^2 \lambda_{k_n+1} + \cdots + \beta_n^2 \lambda_n\right) + (n - k_n)\,\sigma^2,$$
which suggests that the error variance $\sigma^2$ may be estimated by the functional residual variance estimate, $s_R^2$, given by:
$$s_R^2 = \frac{e^\top e}{n - k_n}.$$
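As a rough illustration of the hat matrix, leverage and residual variance above, the following base R sketch works on simulated curves. It assumes discretized curves, uses `prcomp` for the principal component scores, and centers the response as the slides do; the leverage threshold of 2.5 times the average is an arbitrary choice made here, not a recommendation from the talk.

```r
set.seed(10)
n  <- 100
tt <- seq(0, 1, length.out = 51)
dt <- tt[2] - tt[1]
Sigma <- outer(tt, tt, function(s, u) 3 * exp(-abs(s - u) / 0.5))
X <- matrix(rnorm(n * length(tt)), n) %*% chol(Sigma)   # rows are Gaussian curves
beta <- tt + log(tt + 0.1)
y <- drop(X %*% beta) * dt + rnorm(n, sd = 0.5)

kn <- 4
pc <- prcomp(X, center = TRUE)
S  <- pc$x[, 1:kn, drop = FALSE]   # score vectors, mutually orthogonal
yc <- y - mean(y)                  # centered response, E[y] = 0 as in the slides

# H = sum_k s_k s_k' / (s_k' s_k); this matches (1/n) sum_k v_k v_k' / lambda_k
# when lambda_k is taken as s_k' s_k / n
H   <- S %*% diag(1 / colSums(S^2), kn) %*% t(S)
lev <- diag(H)                     # leverages, summing to kn
e   <- drop(yc - H %*% yc)         # residuals (I - H) Y
s2R <- sum(e^2) / (n - kn)         # functional residual variance estimate

flagged <- which(lev > 2.5 * kn / n)   # curves with leverage well above kn / n
```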
Measures of influence
The functional Cook's measure for prediction:
$$CP_i = \frac{\left(\hat y - \hat y_{(-i,k_n)}\right)^\top \left(\hat y - \hat y_{(-i,k_n)}\right)}{s_R^2},$$
The functional Cook's measure for estimation:
$$CE_i = \frac{\left\| \hat\beta_{(k_n)} - \hat\beta_{(-i,k_n)} \right\|^2}{\dfrac{s_R^2}{n} \displaystyle\sum_{k=1}^{k_n} \frac{1}{\lambda_k}},$$
The functional Peña's measure for prediction:
$$P_i = \frac{s_i^\top s_i}{s_R^2\, H_{(k_n),ii}},$$
where $s_i = \left(\hat y_i - \hat y_{(-1,k_n),i}, \dots, \hat y_i - \hat y_{(-n,k_n),i}\right)^\top$.
Example with PC's
library(fda.usc)
nt <- 51
t = seq(0, 1, length = nt)  # discretization grid on [0, 1]
covexp = function(t1, t2) {
  3 * exp(-abs(t1 - t2)/0.5)  # exponential covariance function
}
Sigma = outer(t, t, covexp)
n <- 200
X = rproc2fdata(n, t, sigma = Sigma)  # n simulated Gaussian curves
plot(X)
[Figure: Gaussian process; x-axis: t, y-axis: X(t)]
Example with PC's cont'ed
res = eigen(Sigma)
pc5teo = fdata(t(res$vectors[, 1:5]), argvals = t)  # Theo. PC's
# Rescale each theoretical PC to unit norm
pc5teo[["data"]] = sweep(pc5teo[["data"]], 1, drop(norm.fdata(pc5teo)), "/")
res.est = fdata2pc(X, ncomp = 5)  # Estimated PC's
pc5est = res.est$rotation
[Figure: two panels, "Theo. PC's" and "Estimated PC's"; x-axis: t]
FLM with PC's I
betaf = t + log(t + 0.1)
betaf = fdata(betaf, argvals = t)  # Theo. Beta
vteo = inprod.fdata(pc5teo, betaf)  # Theo. Coefs
vest = inprod.fdata(pc5est, betaf)  # Estim. coefs
# Linear combination of the curves in X with the given coefficients
comb.func = function(X, coefs) {
  t = X$argvals
  Xnew = sweep(X$data, 1, coefs, "*")
  Xnew = fdata(apply(Xnew, 2, sum), argvals = t, rangeval = X$rangeval,
    names = X$names)
  return(Xnew)
}
betapc5t = comb.func(pc5teo, vteo)
betapc5e = comb.func(pc5est, vest)
y = 4 + drop(inprod.fdata(X, betaf)) + rnorm(n, sd = 0.5)  # Simulated response
res.pc = fregre.pc(X, y, l = 1:5)
FLM with PC's II
[Figure: β(t) and its approximations; x-axis: t. Legend: Theor., Oracle Theo. (5), Oracle Est. PC(5), Estim. from data]
FLM with PC's III
summary(res.pc)
*** Summary Functional Data Regression with Principal Components ***
Call:
fregre.pc(fdataobj = X, y = y, l = 1:5)
Residuals:
     Min       1Q   Median       3Q      Max
-1.46463 -0.34188 -0.00754  0.36205  1.48351
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.03876    0.03721 108.553  < 2e-16 ***
PC1         -0.12819    0.02836  -4.520 1.08e-05 ***
PC2         -0.84670    0.04904 -17.265  < 2e-16 ***
PC3          0.30974    0.08688   3.565 0.000458 ***
PC4         -0.35799    0.10170  -3.520 0.000538 ***
PC5         -0.11690    0.15306  -0.764 0.445917
---
Signif. codes:
....
FLM with PC's IV
[Figure: diagnostic panels for res.pc: y vs fitted values (R-squared = 0.63), Residuals vs fitted.values, Scale-Location, Leverage by curve index, normal Q-Q plot of residuals, boxplot of residuals]
FLM with PC's V
....
Residual standard error: 0.5262 on 194 degrees of freedom
Multiple R-squared: 0.6349, Adjusted R-squared: 0.6255
F-statistic: 67.46 on 5 and 194 DF, p-value: < 2.2e-16
-With 5 Principal Components is explained 91.31 %
of the variability of explicative variables.
-Variability for each principal components -PC- (%):
  PC1   PC2   PC3   PC4   PC5
58.79 19.68  6.26  4.57  2.02
-Names of possible atypical curves: No atypical curves
-Names of possible influence curves:
FLM with PC's VI
[Figure: the same diagnostic panels as on the slide "FLM with PC's IV"]
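Choice of kn I
To avoid a perfect fit, Cardot et al. (1999) proposed estimating $\beta$ by taking $\beta_k = 0$ for $k \ge k_n + 1$, with $0 < k_n < n$ and $\lambda_{k_n} > 0$, and minimizing the residual sum of squares given by:
$$RSS\left(\beta_{(1:k_n)}\right) = \sum_{i=1}^{n} \left( y_i - \sum_{k=1}^{k_n} c_{ik}\beta_k \right)^2 = \left\| Y - c_{(1:k_n)}\beta_{(1:k_n)} \right\|^2,$$
where $Y = (y_1, \dots, y_n)^\top$, $\beta_{(1:k_n)} = (\beta_1, \dots, \beta_{k_n})^\top$ and $c_{(1:k_n)}$ is the $n \times k_n$ matrix whose $k$-th column is the vector $c_{\cdot k} = (c_{1k}, \dots, c_{nk})^\top$, the $k$-th principal component score, which verifies $c_{\cdot k}^\top c_{\cdot k} = n\lambda_k$ and $c_{\cdot k}^\top c_{\cdot l} = 0$ for $k \ne l$. So,
$$\hat\beta_{(1:k_n)} = \left( \frac{c_{\cdot 1}^\top Y}{n\lambda_1}, \dots, \frac{c_{\cdot k_n}^\top Y}{n\lambda_{k_n}} \right)^\top, \qquad \hat\beta_{(k_n)} = \sum_{k=1}^{k_n} \hat\beta_k v_k = \sum_{k=1}^{k_n} \frac{c_{\cdot k}^\top Y}{n\lambda_k}\, v_k.$$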
Choice of kn II
The optimal $k_n$ should be chosen taking into account the result of Hall et al. (2006):
$$E\left[ \left\| \beta - \hat\beta_{(k_n)} \right\|^2 \,\middle|\, X \right] = \frac{\sigma^2}{n} \sum_{k=1}^{k_n} \frac{1}{\lambda_k} + \sum_{k=k_n+1}^{\infty} \langle \beta, v_k \rangle^2$$
Predictive Cross-Validation:
$$PCV(k) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \langle X_i, \hat\beta_{(-i,k)} \rangle \right)^2,$$
Model Selection Criteria:
$$MSC(k) = \log\left[ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \langle X_i, \hat\beta_{(k)} \rangle \right)^2 \right] + p_n \frac{k}{n},$$
$p_n = 2$ (AIC),
$p_n = 2n/(n - k - 2)$ (AICc),
$p_n = \log(n)/n$ (SIC)
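The two selection rules can be sketched in base R as follows. This is a simplified illustration on simulated curves, not the `fda.usc` implementation: the principal components are computed once on the full sample, so the leave-one-out error is obtained with the usual leverage shortcut $e_i/(1 - h_{ii})$ rather than by recomputing the PCs for each deleted curve, and only the AIC penalty ($p_n = 2$) is shown.

```r
set.seed(2)
n  <- 150
tt <- seq(0, 1, length.out = 51)
dt <- tt[2] - tt[1]
Sigma <- outer(tt, tt, function(s, u) 3 * exp(-abs(s - u) / 0.5))
X <- matrix(rnorm(n * length(tt)), n) %*% chol(Sigma)
y <- 4 + drop(X %*% (tt + log(tt + 0.1))) * dt + rnorm(n, sd = 0.5)

pc   <- prcomp(X, center = TRUE)
kmax <- 8

crit <- t(sapply(seq_len(kmax), function(k) {
  D   <- cbind(1, pc$x[, 1:k, drop = FALSE])   # intercept + first k scores
  fit <- lm.fit(D, y)
  h   <- rowSums(qr.Q(fit$qr)^2)               # leverages h_ii
  e   <- fit$residuals
  c(k   = k,
    PCV = mean((e / (1 - h))^2),               # leave-one-out shortcut (PCs kept fixed)
    AIC = log(mean(e^2)) + 2 * k / n)          # MSC(k) with p_n = 2
}))

k_pcv <- crit[which.min(crit[, "PCV"]), "k"]
k_aic <- crit[which.min(crit[, "AIC"]), "k"]
```

Other penalties from the list above can be plugged in by replacing the constant `2` in the `AIC` line with the corresponding $p_n$.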
Example
res.pc3 = fregre.pc(X, y, l = 1:3)
res.pc7 = fregre.pc(X, y, l = 1:7)
basis.x = create.bspline.basis(c(0, 1), nbasis = 21)
basis.b5 = create.bspline.basis(c(0, 1), nbasis = 5)
basis.b7 = create.bspline.basis(c(0, 1), nbasis = 11)  # 11 basis functions, shown as Spl(11)
res.basis5 = fregre.basis(X, y, basis.x = basis.x, basis.b = basis.b5)
res.basis7 = fregre.basis(X, y, basis.x = basis.x, basis.b = basis.b7)
[Figure: PC's-Basis Example; x-axis: t. Legend: Beta, PC(3), PC(7), Spl(5), Spl(11)]
PC Ridge Regression
Cardot et al. (2007) proposed modifying the estimation of $\beta$ to improve its stability when terms corresponding to small eigenvalues are added to the model:
$$\hat\beta^{RR}_{(k_n)} = \sum_{k=1}^{k_n} \frac{Cov(\hat c_{\cdot k}, y)}{\hat\lambda_k + r_n}\, \hat v_k,$$
where $r_n > 0$ is the ridge parameter.
$$E\left[ \left\| \beta - \hat\beta^{RR}_{(k_n)} \right\|^2 \,\middle|\, X \right] = \frac{\sigma^2}{n} \sum_{k=1}^{k_n} \frac{\hat\lambda_k}{(\hat\lambda_k + r_n)^2} + r_n^2 \sum_{k=1}^{k_n} \frac{\langle \beta, \hat v_k \rangle^2}{(\hat\lambda_k + r_n)^2} + \sum_{k=k_n+1}^{\infty} \langle \beta, \hat v_k \rangle^2$$
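A small base R sketch of the ridge-type PC estimator, written under assumptions of my own: simulated curves on a regular grid, eigenfunctions rescaled so that their Riemann-sum $L_2$ norm is one, and an arbitrary ridge parameter `rn = 0.05`. With `rn = 0` the coefficients reduce to ordinary least squares on the scores.

```r
set.seed(3)
n  <- 150
tt <- seq(0, 1, length.out = 51)
dt <- tt[2] - tt[1]
Sigma <- outer(tt, tt, function(s, u) 3 * exp(-abs(s - u) / 0.5))
X <- matrix(rnorm(n * length(tt)), n) %*% chol(Sigma)
y <- 4 + drop(X %*% (tt + log(tt + 0.1))) * dt + rnorm(n, sd = 0.5)

kn <- 5
pc <- prcomp(X, center = TRUE)
V  <- pc$rotation[, 1:kn] / sqrt(dt)   # eigenfunctions with unit L2 norm on the grid
C  <- pc$x[, 1:kn] * sqrt(dt)          # scores c_ik = <X_i - mean, v_k>

lambda <- apply(C, 2, var)             # estimated eigenvalues
cov_cy <- drop(cov(C, y))              # Cov(c_k, y)

rn <- 0.05                             # ridge parameter (arbitrary here)
coef_pc <- cov_cy / lambda             # plain PC estimator (rn = 0)
coef_rr <- cov_cy / (lambda + rn)      # ridge-type PC estimator

beta_pc <- drop(V %*% coef_pc)         # beta estimates evaluated on the grid
beta_rr <- drop(V %*% coef_rr)
```

The shrinkage is strongest on the components with small `lambda`, which are the ones that make the plain PC estimator unstable.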
Partial Least Squares (PLS) [Preda and Saporta (2005)]
The basic idea of the PLS approach is to construct a set of orthogonal random variables $\{\nu_i\}_{i\ge 1}$ in the linear space spanned by $X$, taking into account the covariance between $Y$ and $X$.
The PLS components are obtained in the following iterative way:
1 Define $y_0 = y - \bar y$ and $X_0 = X - \bar X$ and let $l = 0$.
2 Let $t_{l+1} = \langle X_l, w_{l+1} \rangle$, where $w_{l+1} \in L_2$ is such that $Cov(y_l, t_{l+1})^2$ is maximal. Then $w_{l+1} = Cov(y_l, X_l) / \|Cov(y_l, X_l)\|$.
3 Let $y_{l+1} = y_l - u_{l+1} t_{l+1}$, where $u_{l+1} = Cov(y_l, t_{l+1}) / Var[t_{l+1}]$, and $X_{l+1} = X_l - \nu_{l+1} t_{l+1}$, where $\nu_{l+1} = Cov(X_l, t_{l+1}) / Var[t_{l+1}]$.
4 Let $l = l + 1$ and go back to step 2.
Finally, $X = \bar X + \sum_l t_l \nu_l$ and $y = \bar y + \sum_l u_l t_l + e$.
MV PLS estimation I
Let $X = (X_i(\tau_j))$ be the $(n \times T)$ matrix with the evaluations of the functional data at the discretization points $\{\tau_j\}_{j=1}^{T}$, and $y$ the response $(n \times p)$.
1 Select a non-zero weight vector $w$ of length $T$ (for example a row of $X$ or the PC1) and normalize it.
2 Compute a score vector $t = Xw$; $t$ is $(n \times 1)$.
3 Compute a y-loading vector $q = y^\top t$; $q$ is $(p \times 1)$.
4 Compute a y-score vector $u = yq$; $u$ is $(n \times 1)$.
5 Compute a new weight vector $w_1 = X^\top u$ and normalize it.
6 If $\|w - w_1\| < \epsilon$, convergence is achieved; otherwise set $w = w_1$ and go to step 2.
The pair $(t, u)$ are the scores for $X$ and $y$, respectively.
These six steps amount to obtaining the first eigenvector of the matrices $X^\top Y Y^\top X$ and $X X^\top Y Y^\top$.
The components $(p, b)$ for $X$ and $y$ are computed in the following way:
MV PLS estimation II
7 Compute the loading vector $p = X^\top t/(t^\top t)$.
8 Deflate $X$ by computing $X_1 = X - t p^\top$.
9 Compute the regression of $Y$ onto $t$: $b = y^\top t/(t^\top t)$.
10 Adjust $y$ using $b$: $y_1 = y - t b^\top$.
11 If more components are needed, set $X = X_1$ and $y = y_1$ and go to step 1.
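The eleven steps above can be sketched as a small base R function. The name `nipals_pls`, its arguments and the simulated data are illustrative assumptions; the sketch centers both blocks first and starts each component from the first row of the current (deflated) $X$, one of the starting options mentioned in step 1.

```r
nipals_pls <- function(X, Y, ncomp, tol = 1e-10, maxit = 500) {
  X <- scale(X, scale = FALSE)                 # center columns
  Y <- scale(as.matrix(Y), scale = FALSE)
  scores <- matrix(0, nrow(X), ncomp)
  for (a in seq_len(ncomp)) {
    w <- X[1, ] / sqrt(sum(X[1, ]^2))          # step 1: normalized starting weight
    for (it in seq_len(maxit)) {
      tt <- X %*% w                            # step 2: X-score
      q  <- crossprod(Y, tt)                   # step 3: y-loading (p x 1)
      u  <- Y %*% q                            # step 4: y-score
      w1 <- drop(crossprod(X, u))              # step 5: new weight ...
      w1 <- w1 / sqrt(sum(w1^2))               #         ... normalized
      converged <- sqrt(sum((w - w1)^2)) < tol # step 6
      w <- w1
      if (converged) break
    }
    tt <- drop(X %*% w)
    p  <- drop(crossprod(X, tt)) / sum(tt^2)   # step 7: X-loading
    b  <- drop(crossprod(Y, tt)) / sum(tt^2)   # step 9: regression of Y on t
    X  <- X - outer(tt, p)                     # step 8: deflate X
    Y  <- Y - outer(tt, b)                     # step 10: adjust y
    scores[, a] <- tt                          # step 11: next component
  }
  list(scores = scores)
}

set.seed(11)
n <- 80
X <- matrix(rnorm(n * 30), n)
y <- X[, 1] + X[, 2] - X[, 3] + rnorm(n, sd = 0.3)
pls <- nipals_pls(X, y, ncomp = 3)
G <- crossprod(pls$scores)                     # Gram matrix of the scores
```

Because of the deflation in step 8, the score vectors come out mutually orthogonal, so `G` is diagonal up to rounding error.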
Best selection of components I
res.pc.cv = fregre.pc.cv(X, y, 5)
res.pc.cv2 = fregre.pc.cv(X, y, 5, rn = seq(0, 0.5, len = 11),
  criteria = "CV")
res.basis.cv = fregre.basis.cv(X, y, basis.x = 13:17, basis.b = 5:11)
res.pls.cv = fregre.pls.cv(X, y, 4, criteria = "CV")
Opt. PC: 2 1 4
PCRR: 2 1 4 3 -
Basis X 13 Basis B: 5
PLS 1
Best selection of components II
[Figure: Beta; x-axis: t. Legend: Beta, PC, PCRR, PLS, Spl]
Prediction
Method    r^2    s^2
PC        0.655  0.236
PLS       0.548  0.306
B-Spline  0.665  0.231
[Figure: one panel per method (PC, PLS, B-Spline)]
Remarks on FLM
Penalized versions of PC or PLS can also be applied by simply substituting $\{X_i\}_{i=1}^{n}$ by $\{\tilde X_i\}_{i=1}^{n}$, with $\tilde X_i = (I + \lambda P)^{-1} X_i$ and $P$ a penalization matrix.
Bootstrap methods can be adapted to test or study different aspects of the FLM:
res.boot = fregre.bootstrap(res.pc3, nb = 500, wild = FALSE)
lines(betaf, lwd = 2)
[Figure: beta.est bootstrap; y-axis: X(t)]
Bootstrap on Regression Models I
Fit the functional linear model to the dataset and obtain $\hat\beta$, $\hat y_i$, $\hat e_i$, ...
Consider the statistic $\hat\theta$ you want to replicate:
  If it depends on the model and the model is homoskedastic ($\beta$, $r^2$, $s_R^2$, ...) ⇒ Obtain $B$ standard bootstrap samples of size $n$ from the dataset of sample curves (denoted by $X_1^b, \dots, X_n^b$, where $X_i^b = X_{i^*}$). Optionally, smooth the bootstrap samples of both sets of curves and residuals: obtain $X_i^b = X_{i^*} + Z_i^b$, where $Z_i^b$ is a Gaussian process with zero mean and covariance operator $\gamma_X \Gamma_X$ ($0 \le \gamma_X \le 1$).
  If it depends on the model and on the $i$-th element, or the model is heteroskedastic ($\hat y_i$, $IF_i$, ...) ⇒ Fix $X_i^b = X_i$.
Obtain $B$ standard bootstrap samples of size $n$ from the residuals (denoted by $e^b = (e_1^b, \dots, e_n^b)$).
  Homoskedasticity. Naive bootstrap ($e_i^b = \hat e_{i^*}$) or smoothed bootstrap ($e_i^b = \hat e_{i^*} + z_i^b$, where $z_i^b$ is normally distributed with mean 0 and variance $\gamma_e s_R^2$, $0 \le \gamma_e \le 1$).
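Bootstrap on Regression Models II
  Heteroskedasticity. Wild bootstrap: $e_i^b = f(\hat e_i)\, v_i^*$ with
$$f(\hat e_i) = \begin{cases} \hat e_i \sqrt{\dfrac{n}{n - k_n}} & \text{Opt1} \\[1ex] \hat e_i / \sqrt{1 - h_{ii}} & \text{Opt2} \\[1ex] \hat e_i / (1 - h_{ii}) & \text{Opt3} \end{cases}$$
and
$$v_i^* = \begin{cases} -(\sqrt{5} - 1)/2 & \text{with prob. } (\sqrt{5} + 1)/(2\sqrt{5}) \\ \phantom{-}(\sqrt{5} + 1)/2 & \text{with prob. } (\sqrt{5} - 1)/(2\sqrt{5}) \end{cases}$$
(golden rule). With these values $E[v_i^*] = 0$ and $E[(v_i^*)^2] = 1$.
Let $\{\hat\theta^b\}_{b=1}^{B}$ be the statistic computed on each bootstrap dataset.
The final estimates are:
  Confidence interval: Consider the $(1 - \alpha)$-quantile ($c_{1-\alpha}$) of $\{\|\hat\theta^b - \hat\theta\|\}_{b=1}^{B}$ and define $IC(1 - \alpha) = \{\theta : \|\theta - \hat\theta\| \le c_{1-\alpha}\}$.
  Hypothesis testing: $p_{\hat\theta} = \sum_{b=1}^{B} 1\{\hat\theta^b \le \hat\theta\} / B$.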
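A minimal base R sketch of the wild bootstrap with the golden-rule weights, under simplifying assumptions of my own: the covariates are a fixed design matrix `S` (standing in for an intercept plus principal component scores), the errors are simulated with non-constant variance, residuals are rescaled with option Opt2, and the statistic $\hat\theta$ is the coefficient vector. It is not the `fregre.bootstrap` implementation.

```r
set.seed(5)
s5    <- sqrt(5)
vals  <- c(-(s5 - 1) / 2, (s5 + 1) / 2)                # two-point golden-rule values
probs <- c((s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5))   # their probabilities

n <- 200
S <- cbind(1, matrix(rnorm(n * 2), n))                 # intercept + 2 score-like covariates
y <- drop(S %*% c(4, -1, 0.5)) + rnorm(n, sd = 0.3 + 0.3 * abs(S[, 2]))

fit       <- lm.fit(S, y)
theta_hat <- fit$coefficients
h         <- rowSums(qr.Q(fit$qr)^2)                   # leverages h_ii
f_e       <- fit$residuals / sqrt(1 - h)               # Opt2 rescaling

B <- 500
theta_b <- t(replicate(B, {
  v  <- sample(vals, n, replace = TRUE, prob = probs)  # wild weights v_i^*
  yb <- fit$fitted.values + f_e * v                    # X_i kept fixed, new errors
  lm.fit(S, yb)$coefficients
}))

# Radius of the 95% confidence ball around theta_hat
dist_b <- sqrt(rowSums(sweep(theta_b, 2, theta_hat)^2))
c95    <- quantile(dist_b, 0.95)
```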
Tecator dataset
[Figure: three panels of spectrometric curves against wavelength (850 to 1050 nm): Absorbances, d(Absorbances,1), d(Absorbances,2)]
Figure: Tecator example. From left to right: absorbances, first and second derivative, coloured by fat content (blue = low, red = high).
Tecator example
data(tecator)
ab = tecator$absorp.fdata
ab2 = fdata.deriv(ab, 2)  # second derivative of the absorbances
dataf = as.data.frame(tecator$y)  # Fat, Protein, Water
tt = ab[["argvals"]]
b.pc0 = create.pc.basis(ab, 1:4)
b.pc2 = create.pc.basis(ab2, 1:4)
basis.x = list(ab = b.pc0, ab2 = b.pc2)
f = Fat ~ ab + ab2
ldata = list(df = dataf, ab = ab, ab2 = ab2)
res = fregre.lm(f, ldata, basis.x = basis.x)
Tecator results
summary(res)
Call:
lm(formula = pf, data = XX, x = TRUE)
Residuals:
     Min       1Q   Median       3Q      Max
-10.8067  -1.9219   0.2561   1.8306   9.0273
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   18.14233    0.20772  87.342  < 2e-16 ***
ab.PC1         0.15511    0.08402   1.846  0.06633 .
ab.PC2         4.70801    1.52557   3.086  0.00231 **
ab.PC3       -13.37410    4.58308  -2.918  0.00391 **
ab.PC4         0.26779    2.46191   0.109  0.91349
ab2.PC1     3437.06617  386.05052   8.903 2.85e-16 ***
ab2.PC2     2688.52106 1525.50024   1.762  0.07949 .
ab2.PC3      932.68030  432.69736   2.156  0.03228 *
ab2.PC4      628.03681  767.97070   0.818  0.41442
---
Signif. codes:
Tecator results II
summary(res)
....
ab2.PC4 628.03681 767.97070 0.818 0.41442
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.046 on 206 degrees of freedom
Multiple R-squared: 0.945, Adjusted R-squared: 0.9428
....
Tecator Diagnosis I
[Figure: three diagnostic panels: Residuals vs Fitted, Normal Q-Q, Scale-Location; observations 7, 43 and 44 are labelled in each panel]
Tecator Diagnosis II
[Figure: four panels against wavelength (850 to 1050 nm): spectrometric curves (Absorbances); spectrometric curves (d(Absorbances,2)); Beta ab, r^2: 0.218; Beta ab2, r^2: 0.707]
Non Linear Model [Ferraty and Vieu (2006)]
Suppose $(X, y)$ is a pair of r.v. with $y \in \mathbb{R}$ and $X \in E$, where $E$ is a semi-metric space. To predict the response $Y$ from $X$, the natural estimator is the conditional expectation:
$$m(x) = E(Y \mid X = x),$$
where the NW estimator is given by:
$$\hat m(x) = \frac{\sum_{i=1}^{n} Y_i\, K\!\left(h^{-1} d(x, X_i)\right)}{\sum_{i=1}^{n} K\!\left(h^{-1} d(x, X_i)\right)},$$
where $K$ is an asymmetric kernel function and $h$ is the bandwidth parameter.
Cross-validation: $h_{opt} = \arg\min_h CV(h)$, with
$$CV(h) = \sum_{i=1}^{n} \left( y_i - \hat m_{(-i)}(X_i) \right)^2$$
or any of the GCV methods (MSC).
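A small base R sketch of the functional NW estimator with leave-one-out cross-validation for $h$. The assumptions are mine: simulated curves on a regular grid, the $L_2$ distance approximated by a Riemann sum as the semi-metric $d$, and a half-Gaussian kernel on $[0, \infty)$ (chosen so that no denominator vanishes). The function names `K` and `nw` are illustrative.

```r
set.seed(4)
n  <- 120
tt <- seq(0, 1, length.out = 50)
dt <- tt[2] - tt[1]
a <- rnorm(n); b <- rnorm(n)
X <- outer(a, sin(2 * pi * tt)) + outer(b, cos(2 * pi * tt))   # curves by row
y <- a^2 + rnorm(n, sd = 0.2)                                   # nonlinear in X

D <- as.matrix(dist(X)) * sqrt(dt)   # approximate L2 distances d(X_i, X_j)
K <- function(u) exp(-u^2 / 2)       # half-Gaussian kernel, u >= 0

# NW estimate at the sample curves; loo = TRUE drops each curve's own weight
nw <- function(D, y, h, loo = FALSE) {
  Kmat <- K(D / h)
  if (loo) diag(Kmat) <- 0
  drop(Kmat %*% y) / rowSums(Kmat)
}

hs    <- seq(0.1, 1, by = 0.05)      # candidate bandwidths
cv    <- sapply(hs, function(h) sum((y - nw(D, y, h, loo = TRUE))^2))
h_opt <- hs[which.min(cv)]
fit   <- nw(D, y, h_opt)
```

With a compactly supported kernel, small bandwidths can leave a curve with no neighbours, so a practical implementation needs a lower bound on the candidate values of $h$.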
Conditional distribution
An alternative is to use the conditional cumulative distribution
$$F_{Y \mid X = x}(y) = F_Y^{x}(y) = P(Y \le y \mid X = x)$$
and to compute from it, for example, the median or the quantiles:
$$med(x) = \inf\{ y \in \mathbb{R} : F_{Y \mid X = x}(y) \ge 1/2 \}$$
$$t_\alpha(x) = \inf\{ y \in \mathbb{R} : F_{Y \mid X = x}(y) \ge \alpha \}$$
Asymptotics
Conditions for the regression function:
$$m : E \to \mathbb{R}, \qquad \lim_{d(x', x) \to 0} m(x') = m(x),$$
$$m : E \to \mathbb{R}, \qquad |m(x) - m(x')| < C\, d(x', x)^{\beta}$$
Conditions for conditional distributions:
$$F : E \times \mathbb{R} \to \mathbb{R}, \qquad \lim_{d(x', x) \to 0} F_Y^{x'}(y) = F_Y^{x}(y), \qquad \lim_{d(y', y) \to 0} F_Y^{x}(y') = F_Y^{x}(y)$$
$$F : E \times \mathbb{R} \to \mathbb{R}, \qquad |F_Y^{x'}(y') - F_Y^{x}(y)| < C \left( d(x', x)^{\beta} + d(y', y)^{\beta} \right)$$
In addition, the small ball probability condition $P(X \in B(x, \epsilon)) = \varphi_x(\epsilon) > 0$ is needed, together with the existence of conditional moments of order greater than 2.
Semi Linear Model [Aneiros-Pérez and Vieu (2006)]
Let $(X, Z, y)$ with $y \in \mathbb{R}$ (response), $X \in E$ (functional) and $Z \in \mathbb{R}^p$ (MV covariates).
$$y = Z\beta + m(X) + \epsilon$$
The parameters of the model are estimated by:
$$\hat\beta_h = \left( \tilde Z_h^\top \tilde Z_h \right)^{-1} \tilde Z_h^\top \tilde y_h,$$
$$\hat m_h(x) = \sum_{i=1}^{n} W_{nh}(x, X_i)\left( y_i - Z_i^\top \hat\beta_h \right)$$
where
$$\tilde Z_h = (I - W_h) Z, \qquad \tilde y_h = (I - W_h)\, y, \qquad W_h = \left( W_{nh}(X_i, X_j) \right)_{ij},$$
$$W_{nh}(x, X_i) = \frac{K\!\left(d(x, X_i)/h\right)}{\sum_{j=1}^{n} K\!\left(d(x, X_j)/h\right)}$$
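The estimators above translate almost line by line into base R. The sketch below makes several assumptions that are not in the slides: simulated curves, an approximate $L_2$ distance as semi-metric, a half-Gaussian kernel, and a fixed bandwidth `h = 0.3` instead of a data-driven choice.

```r
set.seed(8)
n  <- 150
tt <- seq(0, 1, length.out = 50)
dt <- tt[2] - tt[1]
a <- rnorm(n); b <- rnorm(n)
X <- outer(a, sin(2 * pi * tt)) + outer(b, cos(2 * pi * tt))   # functional covariate
Z <- matrix(rnorm(n * 2), n)                                    # multivariate covariates
beta_true <- c(1, -1)
y <- drop(Z %*% beta_true) + a^2 + rnorm(n, sd = 0.2)           # y = Z beta + m(X) + eps

D    <- as.matrix(dist(X)) * sqrt(dt)       # approximate L2 distances
h    <- 0.3                                 # bandwidth fixed by hand
Kmat <- exp(-(D / h)^2 / 2)                 # half-Gaussian kernel weights
W    <- Kmat / rowSums(Kmat)                # W_h = (W_nh(X_i, X_j))_ij

Zt <- Z - W %*% Z                           # (I - W_h) Z
yt <- y - drop(W %*% y)                     # (I - W_h) y
beta_h <- drop(solve(crossprod(Zt), crossprod(Zt, yt)))
m_h    <- drop(W %*% (y - Z %*% beta_h))    # nonparametric part at the sample curves
```

In practice $h$ would be selected by cross-validation, as for the purely nonparametric model.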
Diagnosis, Residuals, Influence
Fitted values: $\hat Y = H_X Y$, where $H_X$ is the projection or smoothing matrix ($n \times n$).
Residuals: $e = (I - H_X) Y$.
Eq. degrees of freedom: $df(H) = tr(H)$.
$Cov(\hat Y \mid X_1, \dots, X_n) = \sigma^2 H_X$.
Residual variance: $s_R^2 = \dfrac{e^\top e}{n - df(H_X)}$.
Influence: $0 \le H_{X,ii} \le 1$.
So we can label those observations $(X_i, y_i)$ whose influence exceeds a multiple of the average $df(H_X)/n$, namely $3\,df(H_X)/n$.
Tecator example I
fat = tecator$y$Fat
res.np = fregre.np(ab2, fat, h = 5e-04)
summary(res.np)
*** Summary Functional Non-linear Model ***
-Call: fregre.np(fdataobj = ab2, y = fat, h = 5e-04)
-Bandwidth (h): 5e-04
-R squared: 0.9928937
-Residual variance: 1.626762 on 151.737 degrees of freedom
-Names of possible atypical curves: No atypical curves
-Names of possible influence curves: 5 6 7 10 11 31 33 34 35 43
It prints only the 10 most influence curves
Tecator example II
[Figure: diagnostic panels for res.np: y vs fitted values (R-squared = 0.99), Residuals vs fitted.values, Scale-Location, Leverage by curve index (labelled curves: 5, 6, 7, 10, 11, 31, 33, 34, 35, 43, 99, 122, 131, 132, 140, 143, 171, 174, 175, 183), normal Q-Q plot of residuals, boxplot of residuals]
Generalized Linear Models
Let $y$ belong to an exponential family with density:
$$f(y; \theta, \tau) = h(y; \tau) \exp\left( \frac{b(\theta) T(y) - A(\theta)}{d(\tau)} \right)$$
where $h(y; \tau)$, $b(\theta)$, $T(y)$, $A(\theta)$ and $d(\tau)$ are known. In this case, $E(Y) = \mu = A'(\theta)$ and $Var(Y) = A''(\theta)\, d(\tau)$.
$y$ is related to a covariate $X$ (functional: $\mathcal{X}$) through a linear predictor $\eta = X\beta$ (functional: $\langle \mathcal{X}, \beta \rangle$) and a link function $g$ such that $E(y) = \mu = g^{-1}(\eta)$.
Distribution  Link function               Mean                  Variance
Normal        Identity: η = µ             µ = η                 1
Binomial      Logit: η = ln(µ/(1 − µ))    µ = 1/(1 + exp(−η))   µ(1 − µ)
Poisson       Log: η = ln(µ)              µ = exp(η)            µ
Gamma         Inverse: η = 1/µ            µ = 1/η               µ^2
Estimation of η
Typically, to estimate $\eta$, project $X$ and $\beta$ onto a finite number of elements of a functional basis:
$$\eta = \langle X, \beta \rangle \approx \sum_{i=1}^{p_X} \sum_{j=1}^{p_\beta} x_i \langle \phi_i, \psi_j \rangle \beta_j = x^\top J \beta$$
with $X(t) = \sum_{i=1}^{p_X} x_i \phi_i(t)$ and $\beta(t) = \sum_{j=1}^{p_\beta} \beta_j \psi_j(t)$.
Fixed basis: B-spline, Wavelets, Fourier.
James (2002), . . .
Functional Principal Components (FPC).
Cardot and Sarda (2005); Escabias et al. (2004, 2005); Müller and
Stadtmüller (2005),. . .
Partial Least Squares (FPLS).
Preda and Saporta (2005), Escabias et al. (2007). . .
Estimation of Generalized Linear Models
Iteratively Reweighted Least Squares (IRLS)
  Let $\hat\eta_0 = X\hat\beta_0$ (functional: $\langle \mathcal{X}, \hat\beta_0 \rangle$) be the initial or current estimate of the linear predictor, with fitted value $\hat\mu_0 = g^{-1}(\hat\eta_0)$.
  Form the adjusted dependent variate $z_0 = \hat\eta_0 + (y - \hat\mu_0)\, g'(\hat\mu_0)$.
  Define the weights $W_0 = 1 / \left( Var[\hat\mu_0]\, g'(\hat\mu_0)^2 \right)$.
  Regress $z_0$ on the covariates $X$ with weights $W_0$ to obtain the updated estimates $\hat\beta_0$ (and hence $\hat\eta_0$, $\hat\mu_0$).
  Repeat until changes in parameters and/or deviance are small.
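The IRLS steps above can be sketched for the binomial family with logit link, where $g'(\mu) = 1/(\mu(1 - \mu))$ and $Var[\mu] = \mu(1 - \mu)$. The design matrix here is a plain multivariate one with simulated data; in the functional case it would hold the basis coefficients of the curves (an assumption of this sketch, not something shown in the slides). The result is compared with R's own `glm`.

```r
set.seed(6)
n <- 300
X <- cbind(1, matrix(rnorm(n * 2), n))          # intercept + 2 covariates
beta_true <- c(-0.5, 1, -1)
y <- rbinom(n, 1, plogis(drop(X %*% beta_true)))

beta <- rep(0, ncol(X))                         # initial estimate
for (it in 1:25) {
  eta    <- drop(X %*% beta)                    # current linear predictor
  mu     <- plogis(eta)                         # mu = g^{-1}(eta)
  gprime <- 1 / (mu * (1 - mu))                 # g'(mu) for the logit link
  z      <- eta + (y - mu) * gprime             # adjusted dependent variate
  w      <- 1 / (mu * (1 - mu) * gprime^2)      # weights 1 / (Var[mu] g'(mu)^2)
  beta_new <- lm.wfit(X, z, w)$coefficients     # weighted least squares step
  done <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (done) break
}

fit_glm <- glm(y ~ X - 1, family = binomial())  # reference fit
```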
Generalized Additive Models (MV)
As in GLM, the response variable $y$ is modelled through a sum of smooth functions of the covariates $X$ and a link function $g$:
$$E(y) = \mu = g^{-1}\left( \beta_0 + \sum_{j=1}^{K} f_j(X_j) \right)$$
with $X_j$ the columns of $X$ and $E(f_j(X_j)) = 0$.
ESTIMATION: IRLS mixed with BACKFITTING steps
  Let $\hat\eta_0 = \hat\beta_0 + \sum_{j=1}^{K} \hat f_j(X_j)$ be the initial or current estimate of the linear predictor, with fitted value $\hat\mu_0 = g^{-1}(\hat\eta_0)$.
  Form the adjusted dependent variate $z_0 = \hat\eta_0 + (y - \hat\mu_0)\, g'(\hat\mu_0)$.
  Define the weights $W_0 = 1 / \left( V(\hat\mu_0)\, g'(\hat\mu_0)^2 \right)$.
  Regress $z_0$ on the covariates $X$ with weights $W_0$, using backfitting steps.
  Repeat until changes in functions and/or deviance are small.
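The backfitting step can be illustrated in the simplest setting, the Gaussian family with identity link, where $z_0 = y$ and the weights are constant, so no outer IRLS loop is needed. The sketch below uses two simulated covariates and `smooth.spline` as the univariate smoother; both choices are assumptions made for illustration only.

```r
set.seed(7)
n  <- 300
x1 <- runif(n, -2, 2)
x2 <- runif(n, -2, 2)
y  <- 1 + sin(2 * x1) + x2^2 + rnorm(n, sd = 0.3)

b0 <- mean(y)                      # intercept estimate
f1 <- rep(0, n)                    # current estimate of f_1 at the data
f2 <- rep(0, n)                    # current estimate of f_2 at the data

for (it in 1:20) {
  f1_old <- f1; f2_old <- f2
  # Smooth the partial residuals of each component in turn
  f1 <- predict(smooth.spline(x1, y - b0 - f2), x1)$y
  f1 <- f1 - mean(f1)              # enforce E(f_1) = 0
  f2 <- predict(smooth.spline(x2, y - b0 - f1), x2)$y
  f2 <- f2 - mean(f2)              # enforce E(f_2) = 0
  if (max(abs(f1 - f1_old), abs(f2 - f2_old)) < 1e-6) break
}

fitted_y <- b0 + f1 + f2
r2 <- 1 - sum((y - fitted_y)^2) / sum((y - mean(y))^2)
```

For a non-Gaussian response, this loop would sit inside the IRLS iteration, smoothing the adjusted variate $z_0$ with weights $W_0$ instead of $y$.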
Functional Spectral Additive Models [Müller and Yao (2008)]
Consider the PC representation of $X$:
$$X(t) = \mu(t) + \sum_k x_k v_k(t)$$
where $v_k(t)$ is the $k$-th eigenfunction and $x_k$ the scores. Then, the Functional Spectral Additive Model is defined as:
$$Y = \beta_0 + \sum_{k=1}^{K} f_k(x_k) + \epsilon$$
with $E(\epsilon) = 0$, $Var[\epsilon] = \sigma^2$ and $E(f_k(x_k)) = 0$, $\forall k = 1, 2, \dots, K$.
Functional Generalized Spectral Additive Models
Consider (again) the PC representation of $X$ (or another representation):
$$X(t) = \mu(t) + \sum_k x_k v_k(t)$$
where $v_k(t)$ is the $k$-th eigenfunction and $x_k$ the scores.
Then, the Functional Generalized Spectral Additive Model is defined to verify:
$$E(y) = g^{-1}\left( \beta_0 + \sum_{k=1}^{K} f_k(x_k) \right)$$
with $E(f_k(x_k)) = 0$, $\forall k = 1, 2, \dots, K$.
Functional Generalized Kernel Additive Models
Febrero-Bande and González-Manteiga (2013)
Given several functional variables $(X^1, d_1), \dots, (X^p, d_p)$ ($d_j$ is a semi-metric), the Functional Generalized Kernel Additive Model is defined to verify:
$$E(y) = \mu = g^{-1}\left( \beta_0 + \sum_{k=1}^{p} f_k(X^k) \right)$$
with $E(f_k(X^k)) = 0$, $\forall k = 1, 2, \dots, p$.
In the backfitting step, the functional nonparametric method is used:
$$\hat f_k(X_0^k) = \frac{\sum_{i=1}^{N} \left( y_i - \hat\beta_0 - \sum_{j \ne k} \hat f_j(X_i^j) \right) K\!\left( d_k(X_0^k, X_i^k)/h_k \right)}{\sum_{j=1}^{N} K\!\left( d_k(X_0^k, X_j^k)/h_k \right)}$$
Practical considerations
Our model only uses distances between data → spaces other than $L_2$ can be used.
How to avoid concurvity in FDA? The distance correlation proposed by Székely et al. (2007) works, although this has not yet been proved for FDA.
Avoiding overfitting. Control the global amount of smoothing at each step. GCV.
Convergence. Using Buja et al. (1989), global convergence is ensured, as is the oracle property.
Boundary effect in FDA is closely related to small ball probabilities. Are your data closely surrounded under your chosen semi-metrics?
fda.usc [Febrero-Bande and Oviedo de la Fuente (2012)]
Let fat be the response and ab, ab1 and ab2 the covariates (the absorbances and their first and second derivatives).
ldata = list(df = data.frame(fat = fat),
  ab = ab, ab1 = ab1, ab2 = ab2)
b.pc0 = create.pc.basis(ab, 1:4)
b.pc1 = create.pc.basis(ab1, 1:4)
b.pc2 = create.pc.basis(ab2, 1:4)
basis.x = list(ab = b.pc0, ab1 = b.pc1, ab2 = b.pc2)
Distance correlation [Székely et al. (2007)]
R         d2(fat)  d2(X)  d2(X')  d2(X'')
d2(fat)   1.000    0.454  0.886   0.956
d2(X)     0.454    1.000  0.669   0.497
d2(X')    0.886    0.669  1.000   0.930
d2(X'')   0.956    0.497  0.930   1.000
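For reference, the sample distance correlation of Székely et al. (2007) only needs the two matrices of pairwise distances, which is why it extends naturally to functional covariates with a semi-metric. The base R sketch below is an illustration on simulated data; the function name `dcor_dist` and the approximate $L_2$ distance between curves are assumptions, and it is not the routine used to produce the table above.

```r
# Distance correlation from two n x n distance matrices (or dist objects)
dcor_dist <- function(Dx, Dy) {
  dcenter <- function(D) {                   # double centering
    D <- as.matrix(D)
    sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
  }
  A <- dcenter(Dx)
  B <- dcenter(Dy)
  dcov2 <- max(mean(A * B), 0)               # squared sample distance covariance
  sqrt(dcov2 / sqrt(mean(A * A) * mean(B * B)))
}

set.seed(9)
n  <- 100
tt <- seq(0, 1, length.out = 50)
dt <- tt[2] - tt[1]
a <- rnorm(n); b <- rnorm(n)
X <- outer(a, sin(2 * pi * tt)) + outer(b, cos(2 * pi * tt))   # curves by row
y <- a^2 + rnorm(n, sd = 0.2)                                   # scalar response

Dx <- dist(X) * sqrt(dt)     # approximate L2 distances between curves
Dy <- dist(y)                # absolute differences between responses
r_xy <- dcor_dist(Dx, Dy)    # dependence between the curves and y
r_xx <- dcor_dist(Dx, Dx)    # equals 1 by construction
```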
FGLM
res.glm = fregre.glm(fat ~ ab + ab2, data = ldata, basis.x = basis.x)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   18.14233    0.20772  87.342  < 2e-16 ***
ab.PC1         0.15511    0.08402   1.846  0.06633 .
ab.PC2         4.70801    1.52557   3.086  0.00231 **
ab.PC3       -13.37410    4.58308  -2.918  0.00391 **
ab.PC4         0.26779    2.46191   0.109  0.91349
ab2.PC1     3437.06617  386.05052   8.903 2.85e-16 ***
ab2.PC2     2688.52106 1525.50024   1.762  0.07949 .
ab2.PC3      932.68030  432.69736   2.156  0.03228 *
ab2.PC4      628.03681  767.97070   0.818  0.41442
Residual standard error: 3.046 on 206 d.f.
Multiple R-squared: 0.945, Adjusted R-squared: 0.9428
F-statistic: 442.3 on 8 and 206 DF, p-value: < 2.2e-16
$cor(fat, \langle \hat\beta_1, ab \rangle)^2 = 21.8\%$, $cor(fat, \langle \hat\beta_2, ab2 \rangle)^2 = 70.7\%$
[Figure: four panels against wavelength (850 to 1050 nm): spectrometric curves (Absorbances); spectrometric curves (d(Absorbances,2)); beta.est for ab; beta.est for ab2]
Figure: Tecator example. Estimation of beta parameters.
GSAM
res.gsam = fregre.gsam(fat ~ s(ab) + s(ab2), data = ldata,
  basis.x = basis.x)
Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  18.14233    0.05041   359.9   <2e-16 ***
Approximate significance of smooth terms:
              edf Ref.df       F  p-value
s(ab.PC1)   5.548  6.654   4.696 0.000111 ***
s(ab.PC2)   1.000  1.000  27.491 4.40e-07 ***
s(ab.PC3)   1.980  2.536  17.891 8.23e-09 ***
s(ab.PC4)   7.127  8.126   4.471 5.38e-05 ***
s(ab2.PC1)  7.115  8.110 242.865  < 2e-16 ***
s(ab2.PC2)  7.381  8.305   5.004 1.03e-05 ***
s(ab2.PC3)  8.276  8.797   5.052 5.61e-06 ***
s(ab2.PC4)  5.986  7.130   7.532 4.52e-08 ***
R-sq.(adj) = 0.997 Deviance explained = 99.7%
GCV score = 0.6927 Scale est. = 0.54638 n = 215
$cor(fat, \sum_{k=1}^{K} \hat f_k(x_k^{ab}))^2 = 35.2\%$, $cor(fat, \sum_{k=1}^{K} \hat f_k(x_k^{ab2}))^2 = 89.6\%$
GKAM
res.gkam = fregre.gkam(fat ~ s(ab) + s(ab2), data = ldata)
alpha= 18.2 n= 215 Converged? Yes Iterations:4
Smoothed terms
               h cor(f(X),eta)  edf
f(ab2)  0.000371         1.000 88.7
f(ab)   9.410000         0.409  1.6
Residual deviance= 116.361 Null deviance= 34735.44
AIC= 662.88 Deviance explained= 99.7 %
R-sq.= 0.997 R-sq.(adj)= 0.994
$cor(fat, \hat f_1(ab))^2 = 16.9\%$, $cor(fat, \hat f_2(ab2))^2 = 99.6\%$
Bernoulli response: I(Fat ≥ 15%).
165 random observations as training set (50 for testing)
Method  Sample  Min.    1st Qu.  Median  Mean   3rd Qu.  Max.
GLM     Train   100%    100%     100%    100%   100%     100%
        Test    88.0%   96.0%    98.0%   97.5%  98.0%    100%
GSAM    Train   100.0%  100.0%   100.0%  100%   100%     100%
        Test    54.0%   92.0%    94.0%   93.8%  98.0%    100%
GKAM    Train   97.58%  98.18%   98.8%   98.7%  98.8%    100%
        Test    90.0%   96.0%    98.0%   97.9%  100.0%   100%
Table: Statistics for the percentage of correct classification in 500 replications.
References
Aneiros-Pérez, G. and Vieu, P. (2006). Semi-functional partial linear regression. Statistics & Probability Letters, 76(11):1102-1110.
Antoniadis, A. and Sapatinas, T. (2003). Wavelet methods for continuous-time prediction using Hilbert-valued autoregressive processes. Journal of Multivariate Analysis, 87(1):133-158.
Buja, A., Hastie, T., and Tibshirani, R. (1989). Linear smoothers and additive models. The Annals of Statistics, 17(2):453-510.
Cardot, H. (2000). Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics, 12(4):503-538.
Cardot, H., Ferraty, F., and Sarda, P. (1999). Functional linear model. Statistics & Probability Letters, 45(1):11-22.
Cardot, H., Ferraty, F., and Sarda, P. (2003). Spline estimators for the functional linear model. Statistica Sinica, 13(3):571-592.
Cardot, H., Mas, A., and Sarda, P. (2007). CLT in functional linear regression models. Probability Theory and Related Fields, 138(3):325-361.
Cardot, H. and Sarda, P. (2005). Estimation in generalized linear models for functional data via penalized likelihood. Journal of Multivariate Analysis, 92(1):24-41.
Escabias, M., Aguilera, A., and Valderrama, M. (2004). Principal component estimation of functional logistic regression: discussion of two different approaches. Journal of Nonparametric Statistics, 16(3-4):365-384.
Escabias, M., Aguilera, A., and Valderrama, M. (2005). Modeling environmental data by functional principal component logistic regression. Environmetrics, 16(1):95-107.
Escabias, M., Aguilera, A., and Valderrama, M. (2007). Functional PLS logit regression model. Computational Statistics & Data Analysis, 51(10):4891-4902.
Febrero-Bande, M. and González-Manteiga, W. (2013). Generalized additive models for functional data. TEST, pages 1-15. 10.1007/s11749-012-0308-0.
Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). fda.usc: Functional Data Analysis. Utilities for Statistical Computing. R package version 1.0.0.
Ferraty, F. and Vieu, P. (2006). Nonparametric functional data analysis: theory and practice. Springer.
Hall, P., Müller, H., and Wang, J. (2006). Properties of principal component methods for functional and longitudinal data analysis. The Annals of Statistics, 34(3):1493-1517.
James, G. (2002). Generalized linear models with functional predictors. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):411-432.
Krämer, N., Boulesteix, A., and Tutz, G. (2008). Penalized partial least squares with applications to B-spline transformations and functional data. Chemometrics and Intelligent Laboratory Systems, 94(1):60-69.
Müller, H. and Stadtmüller, U. (2005). Generalized functional linear models. The Annals of Statistics, 33(2):774-805.
Müller, H. and Yao, F. (2008). Functional additive models. Journal of the American Statistical Association, 103(484):1534-1544.
Preda, C. and Saporta, G. (2005). PLS regression on a stochastic process. Computational Statistics & Data Analysis, 48(1):149-158.
Ramsay, J. and Silverman, B. (2002). Applied functional data analysis: methods and case studies, volume 77. Springer, New York.
Ramsay, J. and Silverman, B. (2005). Functional Data Analysis. Springer.
Silverman, B. (1996). Smoothed functional principal components analysis by choice of norm. The Annals of Statistics, 24(1):1-24.
Székely, G., Rizzo, M., and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769-2794.
Yao, F. and Lee, T. (2005). Penalized spline models for functional principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):3-25.