Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013

Perspectives of Predictive Modeling

Arthur Charpentier
charpentier.arthur@uqam.ca
http://freakonometrics.hypotheses.org/

(SoA Webcast, November 2013)
Agenda

• Introduction to Predictive Modeling
  ◦ Prediction, best estimate, expected value and confidence interval
  ◦ Parametric versus nonparametric models
• Linear Models and (Ordinary) Least Squares
  ◦ From least squares to the Gaussian model
  ◦ Smoothing continuous covariates
• From Linear Models to G.L.M.
• Modeling a TRUE-FALSE variable
  ◦ The logistic regression
  ◦ R.O.C. curve
  ◦ Classification tree (and random forests)
• From individual to functional data
Prediction? Best estimate?

E.g. predicting someone's weight (Y).

Consider a sample {y1, · · · , yn}.

Model: Yi = β0 + εi, with E(ε) = 0 and Var(ε) = σ², where ε is some unpredictable noise.

Ŷ = ȳ is our 'best guess'...

Predicting means estimating E(Y). Recall that

E(Y) = argmin_{y ∈ R} ‖Y − y‖_{L²} = argmin_{y ∈ R} E([Y − y]²),

i.e. least squares.

[Figure: histogram and fitted density of weight (lbs.), 100 to 250.]
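The 'best guess' property above can be checked numerically. The following sketch (Python with numpy, on a simulated weight-like sample, not the data from the talk) scans constant predictions c and verifies that the empirical mean squared error is minimized near the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(180.0, 30.0, size=200)  # hypothetical weight sample (lbs.)

# Empirical mean squared error of a constant guess c
def mse(c):
    return np.mean((y - c) ** 2)

# The sample mean should (approximately) minimize the empirical MSE
grid = np.linspace(y.min(), y.max(), 1001)
best = grid[np.argmin([mse(c) for c in grid])]
```

Any other constant gives a strictly larger empirical squared error, which is exactly the argmin characterization of the expected value.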
Best estimate with some confidence

E.g. predicting someone's weight (Y).

Give an interval [y−, y+] such that P(Y ∈ [y−, y+]) = 1 − α.

Confidence intervals can be derived if we can estimate the distribution of Y,

F(y) = P(Y ≤ y)  or  f(y) = dF(x)/dx |_{x=y}

(related to the idea of "quantifying uncertainty" in our prediction...).

[Figure: density of weight (lbs.) with a confidence band.]
Parametric inference

E.g. predicting someone's weight (Y).

Assume that F ∈ F = {Fθ, θ ∈ Θ}.

1. Provide an estimate θ̂.
2. Compute bound estimates y− = F_θ̂^{−1}(α/2) and y+ = F_θ̂^{−1}(1 − α/2).

Standard estimation technique: maximum likelihood,

θ̂ = argmax_{θ ∈ Θ} Σ_{i=1}^n log fθ(yi)   (the log-likelihood),

with either an explicit (analytical) expression for θ̂, or numerical optimization (Newton-Raphson).

[Figure: fitted density of weight (lbs.) with a 90% interval.]
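As a sketch of the two-step parametric recipe (assuming a Gaussian family; the sample is simulated, not the talk's data), maximum likelihood reduces to the sample mean and standard deviation, and the bounds are quantiles of the fitted distribution:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
y = rng.normal(180.0, 30.0, size=500)  # hypothetical weight sample

# 1. Maximum-likelihood estimates for a Gaussian: sample mean and
#    (uncorrected) standard deviation
mu_hat = float(np.mean(y))
sigma_hat = float(np.std(y))

# 2. Bound estimates from the fitted distribution, here alpha = 10%
alpha = 0.10
fitted = NormalDist(mu_hat, sigma_hat)
y_lo = fitted.inv_cdf(alpha / 2)
y_hi = fitted.inv_cdf(1 - alpha / 2)

# The interval should cover roughly 90% of the sample
coverage = np.mean((y >= y_lo) & (y <= y_hi))
```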
Non-parametric inference

E.g. predicting someone's weight (Y).

1. The empirical distribution function is a natural estimator for P(Y ≤ y),

F̂(y) = (1/n) Σ_{i=1}^n 1(yi ≤ y) = #{i such that yi ≤ y} / n.

2. Compute bound estimates y− = F̂^{−1}(α/2) and y+ = F̂^{−1}(1 − α/2).

[Figure: empirical density of weight (lbs.) with a 90% interval.]
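The nonparametric version can be sketched the same way (simulated sample; numpy's quantile function stands in for inverting F̂):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(180.0, 30.0, size=500)  # hypothetical weight sample

# Empirical distribution function: F_hat(t) = #{i : y_i <= t} / n
def ecdf(t):
    return np.mean(y <= t)

# Nonparametric bound estimates: empirical quantiles at alpha/2, 1 - alpha/2
alpha = 0.10
y_lo, y_hi = np.quantile(y, [alpha / 2, 1 - alpha / 2])
```

No distributional family is assumed here; the price is that the quantile estimates are noisier in the tails than their parametric counterparts.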
Prediction using some covariates

E.g. predicting someone's weight (Y) based on his/her sex (X1).

Model:
Yi = βF + εi if X1,i = F
Yi = βM + εi if X1,i = M

or equivalently Yi = β0 + β1 1(X1,i = F) + εi (with β0 = βM and β1 = βF − βM), with E(ε) = 0 and Var(ε) = σ².

[Figure: weight densities for the Female and Male subsamples.]
Prediction using some (categorical) covariates

E.g. predicting someone's weight (Y) based on his/her sex (X1).

Conditional parametric model: assume that Y | X1 = x1 ∼ F_θ(x1), i.e.

Yi ∼ F_θF if X1,i = F
Yi ∼ F_θM if X1,i = M

→ our prediction will be conditional on the covariate.

[Figure: weight densities for the Female and Male subsamples.]
Prediction using some (categorical) covariates

[Figure: two panels, prediction of Y when X1 = F and when X1 = M, each a weight density with its 90% interval.]
Linear Models, and Ordinary Least Squares

E.g. predicting someone's weight (Y) based on his/her height (X2).

Linear Model: Yi = β0 + β2 X2,i + εi, with E(ε) = 0 and Var(ε) = σ².

Conditional parametric model: assume that Y | X2 = x2 ∼ F_θ(x2).

E.g. Gaussian Linear Model: Y | X2 = x2 ∼ N(µ(x2), σ²(x2)), with µ(x2) = β0 + β2 x2 and σ²(x2) = σ².

→ ordinary least squares, β̂ = argmin Σ_{i=1}^n [Yi − Xiᵀβ]².

β̂ is also the M.L. estimator of β.

[Figure: scatter plot of weight (lbs.) against height (in.) with the fitted regression line.]
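A minimal OLS sketch on simulated (height, weight)-style data, solving the normal equations XᵀXβ = Xᵀy directly and cross-checking against numpy's least-squares routine (all values are hypothetical, not the talk's dataset):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
height = rng.uniform(5.0, 6.5, size=n)                 # hypothetical heights
weight = 40.0 + 25.0 * height + rng.normal(0, 10, n)   # hypothetical linear relation + noise

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), height])

# Ordinary least squares: beta_hat = argmin ||y - X beta||^2,
# obtained from the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ weight)

# Same answer from the dedicated least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, weight, rcond=None)
```

Under the Gaussian model this β̂ is also the maximum-likelihood estimator, which is the equivalence the slide points out.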
Prediction using no covariates

[Figure: scatter plot of weight (lbs.) against height (in.), with a horizontal 90% prediction band that ignores height.]
Prediction using a categorical covariate

E.g. predicting someone's weight (Y) based on his/her sex (X1).

E.g. Gaussian linear model, Y | X1 = M ∼ N(µM, σ²),

Ê(Y | X1 = M) = (1/nM) Σ_{i : X1,i = M} Yi = Ȳ(M)

Y ∈ Ȳ(M) ± u_{1−α/2} · σ̂   (u_{1−α/2} ≈ 1.96 for α = 5%)

Remark: In the linear model, Var(ε) = σ² does not depend on X1.

[Figure: scatter plot of weight against height, Male subsample highlighted with its prediction band.]
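With a single categorical covariate, the fitted conditional mean is just the group sample mean, and the ±1.96 · σ̂ band can be checked for coverage. A sketch on simulated two-group data with a common σ (all values hypothetical, not the talk's dataset):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical weights by sex, with a common noise level sigma
sigma = 25.0
w_m = rng.normal(190.0, sigma, size=250)
w_f = rng.normal(150.0, sigma, size=250)

# Conditional prediction: the group sample mean
ybar_m = w_m.mean()

# 95% band Y in ybar +/- 1.96 * sigma_hat, with sigma_hat pooled over
# both groups, as in the homoskedastic linear model of the slide
sigma_hat = np.sqrt((np.var(w_m) * len(w_m) + np.var(w_f) * len(w_f))
                    / (len(w_m) + len(w_f)))
lo, hi = ybar_m - 1.96 * sigma_hat, ybar_m + 1.96 * sigma_hat
coverage = np.mean((w_m >= lo) & (w_m <= hi))
```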
Prediction using a categorical covariate

E.g. predicting someone's weight (Y) based on his/her sex (X1).

E.g. Gaussian linear model, Y | X1 = F ∼ N(µF, σ²),

Ê(Y | X1 = F) = (1/nF) Σ_{i : X1,i = F} Yi = Ȳ(F)

Y ∈ Ȳ(F) ± u_{1−α/2} · σ̂   (u_{1−α/2} ≈ 1.96 for α = 5%)

Remark: In the linear model, Var(ε) = σ² does not depend on X1.

[Figure: scatter plot of weight against height, Female subsample highlighted with its prediction band.]
Prediction using a continuous covariate

E.g. predicting someone's weight (Y) based on his/her height (X2).

E.g. Gaussian linear model, Y | X2 = x2 ∼ N(β0 + β1 x2, σ²),

Ê(Y | X2 = x2) = β̂0 + β̂1 x2 = Ŷ(x2)

Y ∈ Ŷ(x2) ± u_{1−α/2} · σ̂   (u_{1−α/2} ≈ 1.96 for α = 5%)

[Figure: scatter plot of weight against height with the fitted line and its prediction band.]
Improving our prediction?

(Empirical) residuals, ε̂i = Yi − Xiᵀβ̂.

[Figure: distributions of the residuals (from −50 to 100) with no covariates, with sex as covariate, and with height as covariate.]

R² or log-likelihood? Parsimony principle: penalize the likelihood with the number of covariates → Akaike (AIC) or Schwarz (BIC) criteria.
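A sketch of the comparison on simulated data where the covariate genuinely matters: Gaussian log-likelihood, AIC and BIC for the no-covariate model versus the one-covariate model. The formulas follow the usual −2 · loglik + penalty convention; the data and coefficients are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(5.0, 6.5, size=n)
y = 40.0 + 25.0 * x + rng.normal(0, 10, n)   # hypothetical data: y truly depends on x

def gaussian_ic(y, X):
    """Gaussian log-likelihood, AIC and BIC of the linear model y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = np.mean(resid ** 2)                      # ML estimate of sigma^2
    n, k = X.shape
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    aic = -2 * loglik + 2 * (k + 1)                   # +1 parameter for sigma^2
    bic = -2 * loglik + np.log(n) * (k + 1)
    return loglik, aic, bic

X0 = np.ones((n, 1))                     # no covariates
X1 = np.column_stack([np.ones(n), x])    # x as covariate
ll0, aic0, bic0 = gaussian_ic(y, X0)
ll1, aic1, bic1 = gaussian_ic(y, X1)
```

The larger model always has the larger log-likelihood; AIC and BIC only prefer it when the improvement beats the penalty.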
Relaxing the linear assumption in predictions

Use of a b-spline function basis to estimate µ(·), where µ(x) = E(Y | X = x).

[Figure: two scatter plots of weight (lbs.) against height (in.), with b-spline fits.]
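A spline-basis fit can be sketched with ordinary least squares on an expanded design matrix. For self-containment this uses a truncated-power cubic basis rather than a proper b-spline basis (the two span the same function space for given knots); the data, knots and coefficients are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x = rng.uniform(5.0, 6.5, size=n)
y = 100.0 + 30.0 * (x - 5.0) ** 2 + rng.normal(0, 8, n)  # hypothetical nonlinear mean

# Truncated-power cubic spline basis (a simple stand-in for b-splines):
# 1, x, x^2, x^3, plus (x - knot)_+^3 for each interior knot
def spline_design(x, knots):
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

knots = [5.5, 6.0]
X = spline_design(x, knots)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated regression function mu_hat(.)
mu_hat = lambda t: spline_design(np.asarray(t, dtype=float), knots) @ beta
```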
Relaxing the linear assumption in predictions

E.g. predicting someone's weight (Y) based on his/her height (X2).

E.g. Gaussian linear model, Y | X2 = x2 ∼ N(µ(x2), σ²),

Ê(Y | X2 = x2) = µ̂(x2) = Ŷ(x2)

Y ∈ Ŷ(x2) ± u_{1−α/2} · σ̂   (u_{1−α/2} ≈ 1.96 for α = 5%)

Gaussian model: E(Y | X = x) = µ(x) (e.g. xᵀβ) and Var(Y | X = x) = σ².

[Figure: scatter plot of weight against height with a smooth fitted curve and its prediction band.]
Nonlinearities and missing covariates

E.g. predicting someone's weight (Y) based on his/her height and sex.

→ nonlinearities can be related to model misspecification.

E.g. Gaussian linear model:
Yi = β0,F + β2,F X2,i + εi if X1,i = F
Yi = β0,M + β2,M X2,i + εi if X1,i = M

→ local linear regression, β̂x = argmin Σ_{i=1}^n ωi(x) · [Yi − Xiᵀβ]², and set Ŷ(x) = xᵀβ̂x.

[Figure: scatter plot of weight against height with separate fits by sex.]
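The local linear estimator above can be sketched directly: solve a weighted least-squares problem around each evaluation point x0, with kernel weights ωi(x0), and read off the fitted value at x0. Everything below (the Gaussian kernel, the bandwidth, the simulated data) is illustrative, not the talk's setup:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(5.0, 6.5, size=n)
y = 100.0 + 30.0 * (x - 5.0) ** 2 + rng.normal(0, 8, n)  # hypothetical nonlinear mean

def local_linear(x0, x, y, h=0.2):
    """Weighted least squares around x0 with Gaussian kernel weights
    w_i(x0); centering at x0 makes the fitted intercept the local
    prediction Y_hat(x0) = x0' beta_x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]

pred = local_linear(5.75, x, y)
```

Repeating this over a grid of x0 values traces out the smooth curve shown on the slides.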
Local regression and smoothing techniques

[Figure: two scatter plots of weight (lbs.) against height (in.), with local regression fits.]
k-nearest neighbours and smoothing techniques

[Figure: two scatter plots of weight (lbs.) against height (in.), with k-nearest-neighbour fits.]
Kernel regression and smoothing techniques

[Figure: two scatter plots of weight (lbs.) against height (in.), with kernel regression fits.]
Multiple linear regression

E.g. predicting someone's misperception of his/her weight (Y) based on his/her height and weight.

→ linear model,
E(Y | X2, X3) = β0 + β2 X2 + β3 X3
Var(Y | X2, X3) = σ²

[Figure: 3D regression plane of Y over weight (lbs.) and height (in.).]
Multiple non-linear regression

E.g. predicting someone's misperception of his/her weight (Y) based on his/her height and weight.

→ non-linear model,
E(Y | X2, X3) = h(X2, X3)
Var(Y | X2, X3) = σ²

[Figure: 3D regression surface of Y over weight (lbs.) and height (in.).]
Away from the Gaussian model

Y is not necessarily Gaussian.

Y can be a counting variable, e.g. Poisson, Y ∼ P(λ(x)).
Y can be a FALSE-TRUE variable, e.g. Binomial, Y ∼ B(p(x)).

→ Generalized Linear Model (see next section), e.g. Y | X2 = x2 ∼ P(e^{β0 + β2 x2}).

Remark: With a Poisson model, E(Y | X2 = x2) = Var(Y | X2 = x2).

[Figure: scatter plot of weight against height with a Poisson regression fit.]
Logistic regression

E.g. predicting someone's misperception of his/her weight (Y),
Yi = 1 if prediction > observed weight
Yi = 0 if prediction ≤ observed weight

Y is then a Bernoulli variable, P(Y = y) = p^y (1 − p)^{1−y}, where y ∈ {0, 1}.

→ logistic regression, P(Y = y | X = x) = p(x)^y (1 − p(x))^{1−y}, where y ∈ {0, 1}.

[Figure: 0/1 observations of Y against height (in.) and against weight (lbs.), with fitted probabilities.]
Logistic regression

→ logistic regression, P(Y = y | X = x) = p(x)^y (1 − p(x))^{1−y}, where y ∈ {0, 1}.

Odds ratio:
P(Y = 1 | X = x) / P(Y = 0 | X = x) = exp(xᵀβ)

E(Y | X = x) = exp(xᵀβ) / (1 + exp(xᵀβ))

Estimation of β? → maximum likelihood β̂ (Newton-Raphson).

[Figure: 3D surface of the fitted probability over weight (lbs.) and height (in.).]
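The Newton-Raphson iteration for the logistic likelihood can be sketched in a few lines: each step solves the weighted system (XᵀWX)δ = Xᵀ(y − p̂) with W = diag(p̂(1 − p̂)). The data and the true coefficients below are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1000
x = rng.normal(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-0.5, 1.5])            # hypothetical coefficients
p = 1 / (1 + np.exp(-X @ beta_true))
y = (rng.uniform(size=n) < p).astype(float)

# Newton-Raphson for the logistic log-likelihood:
# beta <- beta + (X' W X)^{-1} X'(y - p_hat)
beta = np.zeros(2)
for _ in range(25):
    p_hat = 1 / (1 + np.exp(-X @ beta))
    W = p_hat * (1 - p_hat)
    step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p_hat))
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break
```

Each iteration is a weighted least-squares solve, which is why this scheme is also known as iteratively reweighted least squares.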
Smoothed logistic regression

GLMs are linear, since
P(Y = 1 | X = x) / P(Y = 0 | X = x) = exp(xᵀβ)

→ use a smooth nonlinear function instead,
P(Y = 1 | X = x) / P(Y = 0 | X = x) = exp(h(x))

E(Y | X = x) = exp(h(x)) / (1 + exp(h(x)))

[Figure: 0/1 observations against height (in.) and against weight (lbs.), with smoothed fitted probabilities.]
Smoothed logistic regression

→ non-linear logistic regression,
P(Y = 1 | X = x) / P(Y = 0 | X = x) = exp(h(x))

E(Y | X = x) = exp(h(x)) / (1 + exp(h(x)))

Remark: we do not predict Y here, but E(Y | X = x).

[Figure: 3D surface of the fitted probability over weight (lbs.) and height (in.).]
Predictive modeling for a {0, 1} variable

What is a good {0, 1}-model? → decision theory:
if P(Y | X = x) ≤ s, then Ŷ = 0
if P(Y | X = x) > s, then Ŷ = 1

Each choice of the threshold s yields a table of observed (Y = 0, Y = 1) against predicted (Ŷ = 0, Ŷ = 1) outcomes: the matching cells are fine, the mismatching cells are errors.

[Figure: observed versus predicted classifications at a given threshold, and the corresponding point in the (False Positive Rate, True Positive Rate) plane.]
R.O.C. curve

True positive rate:
TP(s) = P(Ŷ(s) = 1 | Y = 1) = n(Ŷ=1, Y=1) / n(Y=1)

False positive rate:
FP(s) = P(Ŷ(s) = 1 | Y = 0) = n(Ŷ=1, Y=0) / n(Y=0)

The R.O.C. curve is {(FP(s), TP(s)), s ∈ (0, 1)}

(see also model gain curve)

[Figure: observed versus predicted classifications, and the resulting R.O.C. curve.]
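A sketch of the R.O.C. construction on simulated scores (the score distributions and the class balance are made up): sweep the threshold s and collect the (FP(s), TP(s)) pairs:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2000
y = (rng.uniform(size=n) < 0.4).astype(int)
# Hypothetical scores: higher on average when y = 1
score = np.where(y == 1, rng.normal(0.7, 0.2, n), rng.normal(0.4, 0.2, n))

def roc_point(s):
    """One point of the R.O.C. curve at threshold s:
    TP(s) = P(Yhat = 1 | Y = 1), FP(s) = P(Yhat = 1 | Y = 0)."""
    yhat = (score > s).astype(int)
    tp = np.sum((yhat == 1) & (y == 1)) / np.sum(y == 1)
    fp = np.sum((yhat == 1) & (y == 0)) / np.sum(y == 0)
    return fp, tp

# The full curve: sweep the threshold from low to high
curve = [roc_point(s) for s in np.linspace(0.0, 1.0, 101)]
```

A useless score stays on the diagonal TP = FP; an informative one bows toward the top-left corner.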
Classification tree (CART)

If Y is a TRUE-FALSE variable, prediction is a classification problem,

E(Y | X = x) = pj if x ∈ Aj,

where A1, · · · , Ak are disjoint regions of the X-space.

[Figure: scatter plot of weight (lbs.) against height (in.), partitioned into rectangular regions.]
Classification tree (CART)

→ iterative process.

Step 1. Find two subsets of indices,
either A1 = {i, X1,i < s} and A2 = {i, X1,i > s},
or A1 = {i, X2,i < s} and A2 = {i, X2,i > s},
so as to maximize homogeneity within subsets and maximize heterogeneity between subsets.

[Figure: first split of the (height, weight) plane; overall P(Y=1) = 38.2% and P(Y=0) = 61.8%, while in one region P(Y=1) = 25% and P(Y=0) = 75%.]
Classification tree (CART)

Need an impurity criterion, e.g. the Gini index,

− Σ_{x ∈ {A1, A2}} (nx / n) Σ_{y ∈ {0,1}} (nx,y / nx) · (1 − nx,y / nx)

[Figure: candidate splits of the (height, weight) plane, with impurity values 0.317, 0.523 and 0.273.]
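The Gini criterion for a single candidate split can be sketched directly; a perfect split drives the weighted impurity to zero. The toy data below is purely illustrative:

```python
import numpy as np

def gini(y):
    """Gini impurity of a 0/1 vector: sum over classes of p_k (1 - p_k)."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return p * (1 - p) + (1 - p) * p

def split_impurity(x, y, s):
    """Weighted impurity of the split {x < s} vs {x >= s}: the quantity
    the tree minimizes (up to sign) at each step."""
    left, right = y[x < s], y[x >= s]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy example: x separates the two classes perfectly at s = 0.5
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
y = np.array([0, 0, 0, 1, 1, 1])
perfect = split_impurity(x, y, 0.5)
```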
Classification tree (CART)

Step k. Given the partition A1, · · · , Ak, find which subset Aj will be divided, either according to X1 or according to X2, so as to maximize homogeneity within subsets and maximize heterogeneity between subsets.

[Figure: second-level splits of the (height, weight) plane, with impurity values 0.472, 0.621, 0.444, 0.111 and 0.273.]
Visualizing classification trees (CART)

[Figure: two fitted trees, first split on Y < 124.561, then on X < 5.39698, X < 5.69226, X < 5.52822 and Y < 162.04, with leaf proportions ranging from 0.1111 to 0.6207.]
From trees to forests

Problem: CART trees are not robust → boosting and bagging.

Use the bootstrap: resample in the data and generate a classification tree; repeat this resampling strategy; then aggregate all the trees.

[Figure: aggregated predictions on the (height, weight) plane, with region probabilities 0.808, 0.576, 0.520, 0.467, 0.217 and 0.040.]
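A sketch of bagging with the simplest possible "tree", a one-split decision stump (the data, the stump and the number of resamples are all illustrative): refit on bootstrap resamples of the data, then aggregate the predictions by averaging:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 400
x = rng.uniform(0.0, 1.0, size=n)
p = np.where(x > 0.5, 0.8, 0.2)                 # hypothetical P(Y=1 | x)
y = (rng.uniform(size=n) < p).astype(int)

def stump_predict(x_tr, y_tr, x_new):
    """A one-split 'tree' (decision stump): pick the threshold that
    minimizes training misclassification, predict a class per side."""
    best = (np.inf, 0.5, 0, 1)
    for s in np.linspace(0.05, 0.95, 19):
        for lo, hi in [(0, 1), (1, 0)]:
            err = np.mean(np.where(x_tr < s, lo, hi) != y_tr)
            if err < best[0]:
                best = (err, s, lo, hi)
    _, s, lo, hi = best
    return np.where(x_new < s, lo, hi)

# Bagging: refit the stump on bootstrap resamples, aggregate by averaging
x_new = np.array([0.1, 0.9])
preds = []
for _ in range(50):
    idx = rng.integers(0, n, size=n)            # bootstrap resample
    preds.append(stump_predict(x[idx], y[idx], x_new))
bagged = np.mean(preds, axis=0)                 # aggregated vote, in [0, 1]
```

The averaged vote estimates P(Y = 1 | x) and is far more stable than any single resampled fit.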
A short word on functional data

Individual data, {Yi, (X1,i, · · · , Xk,i, · · · )}.
Functional data, {Y i = (Yi,1, · · · , Yi,t, · · · )}.

E.g. Winter temperature, in Montréal, QC.

[Figure: two panels of temperature (in °Celsius) over the winter (from Dec. 1st till March 31st), by week and by day.]
38. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
A short word on functional data
Daily dataset
[Figure: daily winter temperature curves, one per winter (x-axis: Day of winter, from Dec. 1st till March 31st; DECEMBER, JANUARY, FEBRUARY and MARCH marked); the first two principal components (PCA Comp. 1 and PCA Comp. 2) as functions of the day of winter; and the PCA score plot (Comp.1 vs Comp.2), with each winter from 1953 to 2011 labeled by year]
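The functional-data idea on this slide — treat each winter as a single 121-day temperature curve and summarize the sample of curves with a few principal components, so that each winter is reduced to a point in the (Comp.1, Comp.2) plane — can be sketched as follows. The talk's examples are in R; this is a minimal numpy sketch on simulated data (the curve shape, noise level and number of winters are illustrative assumptions, not the actual daily dataset).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the daily dataset: 59 winters (1953-2011),
# each a curve of 121 daily temperatures (Dec. 1st till March 31st).
n_winters, n_days = 59, 121
t = np.arange(n_days)
seasonal = -5 + 10 * np.cos(2 * np.pi * (t - 60) / 365)    # common winter shape
curves = seasonal + rng.normal(0, 3, (n_winters, n_days))  # year-to-year noise

# PCA on the curves: center across winters, then take the SVD.
centered = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

scores = U * s                    # one (Comp.1, Comp.2, ...) point per winter
components = Vt                   # each row is a "mode of variation" over the 121 days
explained = s**2 / np.sum(s**2)   # share of variance per component

print(scores.shape)               # (59, 59): winters in component coordinates
print(explained[:2])              # variance explained by Comp.1 and Comp.2
```

Plotting `scores[:, 0]` against `scores[:, 1]`, with each point labeled by its year, reproduces the kind of score plot shown on the slide; the rows of `components` play the role of the PCA Comp. 1 and PCA Comp. 2 curves over the day of winter.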
38
39. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
To go further...
a forthcoming book entitled
Computational Actuarial Science with R
39