Introduction To Survival Analysis
1. Special Topics in Biostatistics
An Introduction to Survival Data Analysis
Federico Rotolo
federico.rotolo@stat.unipd.it — federico.rotolo@uclouvain.be
PhD student at Dipartimento di Scienze Statistiche, Università degli Studi di Padova
Visiting PhD student at Institut de Statistique, Biostatistique et Sciences Actuarielles, Université Catholique de Louvain
March 30, 2011
2. F. Rotolo
Survival Analysis
Outline
An example
Peculiarities of Survival Data
Notation and Basic Functions
Survival Likelihood
Parametric models
Non-Parametric models
Regression
Complications of Survival Models
Non-proportional hazards
Informative censoring
Dependent observations
Multi-state phenomena
Competing Risks
References
STiB: Survival Data Analysis 2/ 57
3. Survival Analysis F. Rotolo
Survival Analysis
What is Survival Analysis?
The field of statistics providing tools for handling duration data,
i.e. continuous and positive numerical variables measuring the
time from an origin event until the occurrence of an event of
interest.
Why “Survival” Analysis?
First works on this topic originated from the problem of studying
death times, that is times from birth to death.
Many ad-hoc statistical tools have been developed for survival
data (Cox model, Kaplan–Meier estimator, Mantel–Haenszel test, etc.) and
research interest in such problems has been increasing.
Why is Survival Data Analysis so peculiar?
An example
Consider a clinical trial on patients who have undergone surgical removal of a tumour.
One may be interested in
M: the level of a tumour marker after 6 months
T: the time until recurrence of the disease
In both cases the measured variable is numerical, continuous and positive,
so there is no apparent difference.
In practice, other events can interrupt the observation before the
variable of interest is measured: the patient dies, drops out of the
study, migrates, develops another disease, the study ends, etc.
In such cases
M is missing
T is not observed either, but we know that T > s, with s the time of the
“disturbing event”
Peculiarities of Survival Data
Censoring
The most distinctive feature of survival data is censoring.
Right censoring (T > t) is very frequent and often unavoidable; all
survival methods account for it.
Interval censoring (T ∈ (l, r]) is very frequent too, but it is much more
often ignored in practice.
Left censoring (T ≤ t) is very infrequent.
Left truncation is a different concept: it is the selection bias
introduced by including in the study only subjects whose survival time
exceeds a certain value, say t*. We then do not observe T but
T* = T | T > t*.
Conditioning
The second important feature of survival data is the concept of
conditioning, even more important than censoring according to
some authors (Hougaard, 2000).
As time passes, new information is available, not only for subjects
dying, but also for those surviving.
In this case it is useful to consider, rather than the density f(t) of
T, its hazard function
\[ h(t) = \frac{f(t)}{1 - F(t)}. \]
Notation and Basic Functions
Consider the event time variable T with distribution function F(t) and
density f(t) = dF(t)/dt.
The survival function is defined as
\[ S(t) = P(T > t) = 1 - F(t). \tag{1} \]
Then, the hazard function is
\[ h(t) = \lim_{\Delta t \downarrow 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}. \tag{2} \]
If the censoring time C is independent of the event time T, then h(t) coincides with
the Crude Hazard Function (Fleming & Harrington, 1991, Theorem 1.3.1)
\[ h^{\#}(t) = \lim_{\Delta t \downarrow 0} \frac{P(t \le T < t + \Delta t \mid T \ge t,\, C \ge t)}{\Delta t}. \]
The cumulative hazard function is defined as
\[ H(t) = \int_0^t h(u)\,du. \tag{3} \]
Since f(t) = −dS(t)/dt, then
\[ S(t) = e^{-H(t)} \tag{4} \]
or, equivalently,
\[ h(t) = -\frac{d}{dt} \log\{S(t)\}. \]
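The link between the hazard, the cumulative hazard and the survival function can be checked numerically. The sketch below is only an illustration: it assumes a Weibull hazard h(t) = λρt^(ρ−1) with arbitrary parameter values, integrates it with a simple midpoint rule, and compares exp{−H(t)} with the closed-form Weibull survival function exp(−λt^ρ). The function names are hypothetical helpers, not part of the original slides.

```python
import math

def weibull_hazard(t, lam, rho):
    """Hazard h(t) = lam * rho * t**(rho - 1) of a Weibull model."""
    return lam * rho * t ** (rho - 1)

def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate H(t) = integral from 0 to t of h(u) du (midpoint rule)."""
    width = t / steps
    return sum(hazard((k + 0.5) * width) for k in range(steps)) * width

lam, rho, t = 0.5, 1.5, 2.0  # illustrative values only
H_numeric = cumulative_hazard(lambda u: weibull_hazard(u, lam, rho), t)
S_from_hazard = math.exp(-H_numeric)        # survival obtained as exp(-H(t))
S_closed_form = math.exp(-lam * t ** rho)   # Weibull survival function
```

The two survival values agree up to the error of the numerical integration.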
Hazard and Conditioning
The hazard function already incorporates conditioning, which makes it
particularly convenient in a survival context, as shown by
Hougaard (1999) in the following table.

Quantity            In full distribution   In truncated distribution, given survival to time v
Survival function   S(t)                   S(t)/S(v)
Density             f(t)                   f(t)/S(v)
Hazard function     h(t)                   h(t)
Conditioning corresponds to considering only actually possible
events, accounting for the past being fixed and known.
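The last row of the table can be verified directly. Writing S_v, f_v and h_v for the survival function, density and hazard of the distribution truncated at v (notation introduced here only for this check), for t ≥ v:

```latex
\begin{align*}
S_v(t) &= P(T > t \mid T > v) = \frac{S(t)}{S(v)}, \\
f_v(t) &= -\frac{d}{dt} S_v(t) = \frac{f(t)}{S(v)}, \\
h_v(t) &= \frac{f_v(t)}{S_v(t)} = \frac{f(t)}{S(t)} = h(t).
\end{align*}
```

The factor 1/S(v) cancels in the ratio, which is why the hazard is unaffected by conditioning on survival to v.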
Survival Likelihood
Since right censoring is almost unavoidable, the observable
variable is not the time T, but the pair (Y, δ), with
\[ Y = \min(T, C), \qquad \delta = I_{(T \le C)}, \]
where C ∼ G(·) is the censoring time variable and I_A is the indicator variable of the set A.
What we are interested in is inference on the survival distribution
and its parameters, the vector ζ.
What is the survival likelihood L(ζ; y)?
The contribution of an event time y_i to the likelihood is
\[ L(\zeta; y_i) \overset{T \perp\!\!\!\perp C}{=} (1 - G(y_i))\, f(y_i) \propto f(y_i) = h(y_i)\, S(y_i). \]
The contribution of a right-censored time y_i is
\[ L(\zeta; y_i) \overset{T \perp\!\!\!\perp C}{=} g(y_i)\, (1 - F(y_i)) \propto 1 - F(y_i) = S(y_i). \]
Under i.i.d. sampling of size n with T ⊥⊥ C, the total likelihood is
\[ L(\zeta; y) = \prod_{i=1}^{n} \{h(y_i)\}^{\delta_i}\, S(y_i). \tag{5} \]
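As a minimal worked case of the likelihood (5), assume a constant hazard h(t) = λ, so that S(t) = exp(−λt). The log-likelihood is then Σ δ_i log λ − λ Σ y_i, which is maximised at λ̂ = (number of events) / (total follow-up time). The sketch below uses hypothetical toy data and hypothetical function names.

```python
import math

def exponential_loglik(lam, times, events):
    """Log of likelihood (5) with h(t) = lam and S(t) = exp(-lam * t)."""
    return sum(d * math.log(lam) - lam * y for y, d in zip(times, events))

def exponential_mle(times, events):
    """Closed-form maximiser: number of events / total follow-up time."""
    return sum(events) / sum(times)

# Hypothetical toy data: observed times y_i and indicators delta_i (1 = event).
times = [2.0, 3.5, 1.2, 4.0, 0.8]
events = [1, 0, 1, 0, 1]
lam_hat = exponential_mle(times, events)
```

Censored subjects contribute follow-up time to the denominator but no event to the numerator, which is exactly how (5) uses the partial information they carry.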
Parametric models
A parametric form can be assumed for the hazard function, and its
parameters can be estimated by maximising the likelihood (5).
The most common models are:
Exponential, with constant hazard h(t) = λ > 0
Weibull, with monotone hazard h(t) = λρt^(ρ−1) (λ > 0, ρ > 0)
Gompertz, with monotone hazard h(t) = λ exp(γt) (λ > 0, γ ∈ R)
and a fraction exp(λ/γ) of long-term survivors if γ < 0
Piecewise Constant over m intervals with fixed end points {x_q},
and hazard h(t) = Σ_{q=1}^{m} λ_q I(x_{q−1} < t ≤ x_q)
Comparison of parametric models (Hougaard, 2000, Table 2.6)

Property                     Exponential   Weibull   Gompertz   Piecewise constant
Increasing hazard possible   No            Yes       Yes        Yes
Continuous hazard            Yes           Yes       Yes        No
Estimate monotone            (Constant)    Yes       Yes        No
Non-zero initial hazard      Yes           No        Yes        Yes
Minimum stable               Yes           Yes       No         No
Explicit estimation          Yes           No        No         Yes
Needs choice of intervals    No            No        No         Yes
No. of parameters            1             2         2          m
Dim. of suff. stat.
  Complete data              1             n         n          2m − 1
  Censored data              2             2n        2n         2m

n = number of observations; m = number of intervals in the piecewise constant model
Non-Parametric models
Non-parametric methods require no assumption on the form of the
survival function.
In general, the most common non-parametric estimator is the empirical
distribution function F̂(t), but censoring prevents its use.
Two methods are very widely used:
the Kaplan–Meier estimator Ŝ_KM(t) of the Survival function
the Nelson–Aalen estimator Ĥ_NA(t) of the Cumulative Hazard
Note that Ŝ_KM(t) ≠ exp{−Ĥ_NA(t)}.
Kaplan–Meier estimator
The Kaplan–Meier Product Limit estimator (Kaplan & Meier, 1958)
of the Survival Function is
\[ \hat S_{KM}(t) = \prod_{i \mid t_i \le t} \left( 1 - \frac{N_i}{R_i} \right), \tag{6} \]
with {t_i}_i the observed event times, N_i the number of events at time t_i and R_i the
number of subjects still at risk (survivors) just before time t_i.
Its variance can be evaluated by Greenwood's formula
(Greenwood, 1926; Meier, 1975):
\[ \hat V\big[\hat S_{KM}(t)\big] = \big[\hat S_{KM}(t)\big]^2 \sum_{i \mid t_i \le t} \frac{N_i}{R_i (R_i - N_i)}. \]
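The product-limit estimator and Greenwood's formula can be computed in a few lines. The sketch below is a plain-Python illustration on hypothetical toy data, not a replacement for a tested library; the function name and the convention that R_i counts every subject with observed time ≥ t_i are assumptions of this example.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S_KM(t_i), Greenwood variance)] at each distinct event time.

    times  : observed times y_i = min(T_i, C_i)
    events : indicators delta_i (1 = event, 0 = right-censored)
    """
    event_times = sorted({y for y, d in zip(times, events) if d == 1})
    survival, greenwood_sum, curve = 1.0, 0.0, []
    for t in event_times:
        at_risk = sum(1 for y in times if y >= t)                       # R_i
        n_events = sum(1 for y, d in zip(times, events)
                       if y == t and d == 1)                            # N_i
        survival *= 1.0 - n_events / at_risk                            # one factor of (6)
        if at_risk > n_events:
            greenwood_sum += n_events / (at_risk * (at_risk - n_events))
            variance = survival ** 2 * greenwood_sum
        else:
            variance = float("nan")  # Greenwood's formula is undefined when S_KM = 0
        curve.append((t, survival, variance))
    return curve

# Hypothetical toy data (1 = event, 0 = censored).
times = [1, 2, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored subjects never create a step in the curve; they only leave the risk set, which lowers R_i at later event times.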
Nelson–Aalen estimator
The Nelson–Aalen estimator (Nelson, 1969; Aalen, 1976)
of the cumulative hazard function is
\[ \hat H_{NA}(t) = \sum_{i \mid t_i \le t} \frac{N_i}{R_i}, \tag{7} \]
with {t_i}_i the observed event times, N_i the number of events at time t_i and R_i the
number of subjects still at risk (survivors) just before time t_i.
Its variance can be estimated by
\[ \hat V\big[\hat H_{NA}(t)\big] = \sum_{i \mid t_i \le t} \frac{N_i}{R_i^2}. \]
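The Nelson–Aalen sum can be sketched in the same way as the Kaplan–Meier product. The example below, on hypothetical toy data and with hypothetical helper names, also compares exp{−Ĥ_NA(t)} with Ŝ_KM(t): the two are close but not identical, because exp(−x) > 1 − x for every x > 0.

```python
import math

def risk_table(times, events):
    """Yield (t_i, N_i, R_i) for each distinct event time, in increasing order."""
    for t in sorted({y for y, d in zip(times, events) if d == 1}):
        n_events = sum(1 for y, d in zip(times, events) if y == t and d == 1)
        at_risk = sum(1 for y in times if y >= t)
        yield t, n_events, at_risk

def nelson_aalen(times, events):
    """Return [(t_i, H_NA(t_i), variance estimate)] following (7)."""
    cum_hazard, variance, curve = 0.0, 0.0, []
    for t, n_events, at_risk in risk_table(times, events):
        cum_hazard += n_events / at_risk
        variance += n_events / at_risk ** 2
        curve.append((t, cum_hazard, variance))
    return curve

def kaplan_meier_survival(times, events):
    """Return [(t_i, S_KM(t_i))] for comparison with exp(-H_NA)."""
    survival, curve = 1.0, []
    for t, n_events, at_risk in risk_table(times, events):
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical toy data (1 = event, 0 = censored).
times = [1, 2, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 0, 1, 0]
na = nelson_aalen(times, events)
km = kaplan_meier_survival(times, events)
gaps = [math.exp(-h) - s for (_, h, _), (_, s) in zip(na, km)]
```

Every entry of `gaps` is positive: exp{−Ĥ_NA(t)} lies slightly above the Kaplan–Meier curve, and the difference grows when the risk set becomes small.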
Cox proportional hazards model
The most common and popular model in survival analysis is by far
the Cox Regression Model (Cox, 1972).
For a subject with covariate vector x, the hazard is expressed as
\[ h(t; x) = h_0(t)\, e^{x^T \beta}, \tag{8} \]
with β the vector of regression parameters and h_0(t) the
so-called baseline hazard function, corresponding to the hazard of
a (hypothetical) reference subject with x = (0, ..., 0).
For any two subjects i and j with covariates x_i and x_j, the hazard
ratio
\[ \frac{h(t; x_i)}{h(t; x_j)} = \frac{h_0(t) \exp(x_i^T \beta)}{h_0(t) \exp(x_j^T \beta)} = \exp\{(x_i - x_j)^T \beta\} \]
is constant over time, so the two hazard functions are proportional.
The hypothesis of Proportional Hazards (PH) is quite strong!
On the other hand, the regression parameters have a very
straightforward meaning. Indeed, if x_{i(k)} = x_{j(k)} + 1 and
x_{i(l)} = x_{j(l)} for all l ≠ k, then
\[ \beta_{(k)} = \log \frac{h(t; x_i)}{h(t; x_j)}. \]
Semiparametric approach
Under the PH assumption, the likelihood (5) is
\[ L(\beta, \xi; y) = \prod_{i=1}^{n} \{h_0(y_i) \exp(x_i^T \beta)\}^{\delta_i} \exp\{-H_0(y_i) \exp(x_i^T \beta)\}, \tag{9} \]
where ξ denotes the baseline parameters and (β, ξ) corresponds to ζ.
If the interest is in the covariate effects, the baseline hazard can
be left unspecified and the likelihood can be profiled (Duchateau &
Janssen, 2008, pp. 24–26), reducing to the Partial Likelihood
\[ L(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp(x_i^T \beta)}{\sum_{j \in R(y_i)} \exp(x_j^T \beta)} \right]^{\delta_i}, \tag{10} \]
where R(t) = {r | y_r ≥ t} is the risk set at t, so that only observed events (δ_i = 1) contribute a factor.
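The partial likelihood is simple to evaluate once the risk sets are built. The following sketch computes its logarithm for a given β on hypothetical toy data; it assumes that only subjects with an observed event contribute a factor and gives tied event times no special treatment. The function name is an assumption of this example.

```python
import math

def cox_partial_loglik(beta, times, events, covariates):
    """Log partial likelihood of a Cox model.

    beta       : list of regression coefficients
    covariates : list of covariate vectors x_i (same length as beta)
    Only subjects with events[i] == 1 contribute a term.
    """
    linear = [sum(b * x for b, x in zip(beta, xi)) for xi in covariates]
    loglik = 0.0
    for i, (y_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            # Risk set R(y_i): everyone whose observed time is at least y_i.
            risk_set = [j for j, y_j in enumerate(times) if y_j >= y_i]
            denom = sum(math.exp(linear[j]) for j in risk_set)
            loglik += linear[i] - math.log(denom)
    return loglik

# Hypothetical toy data: one binary covariate, 1 = event, 0 = censored.
times = [5.0, 8.0, 3.0, 10.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]
value_at_zero = cox_partial_loglik([0.0], times, events, covariates)
value_at_one = cox_partial_loglik([1.0], times, events, covariates)
```

At β = 0 every subject in a risk set is equally likely to fail, so the value is minus the sum of the logs of the risk-set sizes. In practice β̂ is obtained by maximising this function numerically.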
Accelerated failure times model
Much less used is the Accelerated Failure Time (AFT) model,
where the covariates act directly on time via a scale factor.
In this case the probability of surviving is
\[ S(t) = S_0(\exp(x^T \beta)\, t). \]
Consequently the density and the hazard functions are
\[ f(t) = \exp(x^T \beta)\, f_0(\exp(x^T \beta)\, t), \]
\[ h(t) = \exp(x^T \beta)\, h_0(\exp(x^T \beta)\, t). \]
The usual way of representing an AFT model is as a log-linear
model for the times
\[ \log T = x^T \alpha + \varepsilon. \]
Only in the case T ∼ Weibull does the model also correspond to a PH
regression.
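To see why the Weibull case is special, one can substitute a Weibull baseline hazard h_0(t) = λρt^(ρ−1) into the AFT hazard above. This is a short check added for illustration:

```latex
\begin{align*}
h(t) &= \exp(x^T\beta)\, h_0\big(\exp(x^T\beta)\, t\big) \\
     &= \exp(x^T\beta)\, \lambda\rho\, \big(\exp(x^T\beta)\, t\big)^{\rho-1} \\
     &= \lambda\rho\, t^{\rho-1} \exp(\rho\, x^T\beta)
      = h_0(t) \exp\{x^T(\rho\beta)\}.
\end{align*}
```

The result has the PH form (8) with regression vector ρβ: the time-scaling factor can be pulled out of the baseline hazard only because the Weibull hazard is a power of t.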
Outline
Survival Analysis
Complications of Survival Models
Non-proportional hazards
Informative censoring
Dependent observations
Multi-state phenomena
Competing Risks
Incidence
Covariates effect
References
Complications of Survival Models
Most of the methods for Survival Data Analysis rest on some
hypotheses, notably
proportional hazards
uninformative censoring
independent observations
one type of unavoidable event
How to test for these assumptions?
How to handle data not satisfying these assumptions?
Non-proportional hazards
Although most survival methods are based on the Cox model,
hazards may well not be proportional.
The simplest case to handle is when hazards are proportional within
subgroups, but not globally.
[Figure: Proportional hazards within subgroups (Collett, 2003, pg. 316)]
The effect of the treatment in the whole population is not
multiplicative, although it is so within each centre.
What can be done is to use a stratified PH model
\[ h_{ij}(t) = h_{0j}(t) \exp(x_{ij}^T \beta), \]
where, at each time point, the hazard of patient i from centre j is
exp(x_ij^T β) times the baseline h_0j(t) of the stratum (centre).
Since different baselines are taken into account, the covariate
effect is multiplicative within strata and can be estimated with the
usual methods for PH Cox models.
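One standard way to fit a stratified model with a common β is to build risk sets within each stratum only and to add up the stratum-specific partial log-likelihoods. The sketch below illustrates this on hypothetical toy data; function names and data are assumptions of this example, and ties are not treated specially.

```python
import math
from collections import defaultdict

def partial_loglik(beta, rows):
    """Cox partial log-likelihood for rows of (time, event, covariate vector)."""
    total = 0.0
    for y_i, d_i, x_i in rows:
        if d_i == 1:
            eta_i = sum(b * x for b, x in zip(beta, x_i))
            denom = sum(
                math.exp(sum(b * x for b, x in zip(beta, x_j)))
                for y_j, _, x_j in rows if y_j >= y_i
            )
            total += eta_i - math.log(denom)
    return total

def stratified_partial_loglik(beta, rows, strata):
    """Sum the partial log-likelihoods computed separately within each stratum."""
    groups = defaultdict(list)
    for row, stratum in zip(rows, strata):
        groups[stratum].append(row)
    return sum(partial_loglik(beta, group) for group in groups.values())

# Hypothetical toy data: (time, event, [treatment]) for two centres A and B.
rows = [(5.0, 1, [1.0]), (8.0, 0, [0.0]), (3.0, 1, [1.0]), (10.0, 1, [0.0])]
strata = ["A", "A", "B", "B"]
stratified_at_zero = stratified_partial_loglik([0.0], rows, strata)
pooled_at_zero = partial_loglik([0.0], rows)
```

Because a patient is only compared with patients of the same centre, the centre-specific baselines h_0j(t) never need to be specified.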
A more complex situation is when there are non-proportional
hazards between levels of a dichotomous variable.
[Figure: Non-proportional hazards (Collett, 2003, pg. 317)]
[Figure: Non-proportional hazards modelled as PH (Collett, 2003, pg. 317)]
Hazards can be modelled as proportional within each of a series of k
consecutive time intervals, obtaining the piecewise PH model
\[ h_i(t) = h_0(t) \exp\Big\{ x_i \Big( \beta_1 + \sum_{j=2}^{k} \beta_j z_j(t) \Big) \Big\}, \]
where x_i is 0 for standard treatment and 1 for new treatment, and the
z_j(t)'s are (time-varying) indicators for being in the j-th interval.
The log-hazard ratio for treatment is now different in each interval:
β_1 for interval 1
β_1 + β_j for interval j > 1.
Testing the PH assumption: if none of β_2, ..., β_k is significantly
different from 0, then there is no evidence of non-PH.
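The interval-specific log-hazard ratios can be read off mechanically once the interval containing t is known. The sketch below assumes intervals of the form (0, c_1], (c_1, c_2], ..., (c_{k−1}, ∞); the cut points and coefficient values are hypothetical and serve only to illustrate the bookkeeping.

```python
from bisect import bisect_left

def treatment_log_hazard_ratio(t, cut_points, betas):
    """Log-hazard ratio (new vs standard treatment) at time t.

    cut_points : increasing interior end points c_1 < ... < c_{k-1};
                 the intervals are (0, c_1], (c_1, c_2], ..., (c_{k-1}, inf)
    betas      : [beta_1, beta_2, ..., beta_k]
    """
    j = bisect_left(cut_points, t) + 1  # 1-based index of the interval containing t
    return betas[0] if j == 1 else betas[0] + betas[j - 1]

# Hypothetical example with k = 3 intervals: (0, 12], (12, 24], (24, inf).
cut_points = [12.0, 24.0]
betas = [-0.8, 0.3, 0.6]
ratios = [treatment_log_hazard_ratio(t, cut_points, betas) for t in (6.0, 18.0, 30.0)]
```

With these illustrative numbers the treatment effect weakens over time: the log-hazard ratio moves from β_1 in the first interval towards 0 in the later ones.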
Informative censoring
Most survival analysis methods are only valid under the
independent censoring hypothesis:
\[ C_i \perp\!\!\!\perp T_i. \]
For censoring due to the end of the study, independence is reasonable.
For censoring due to loss to follow-up or to a competing risk, it is much
more questionable.
Two typical situations (Putter et al., 2007):
Healthy participants feel less need for medical services offered
by the study, and therefore quit.
→ C is negatively correlated with T
→ Overestimation of event risk
Persons with advanced disease progression have become too
ill for further follow-up or they return to their country to
spend the last period with their family.
→ C is positively correlated with T
→ Underestimation of event risk
Empirical evaluation
An empirical way to check the uninformative censoring assumption
is to plot observed survival times against each regressor,
distinguishing censored and event times.
[Figure: two scatter plots, (a) and (b), of Time against Age at diagnosis; o = censored, + = event.
Example of data not suggesting (a) and suggesting (b) informative censoring]
Bounding unobserved event times
A more formal way to investigate how sensible the independent
censoring hypothesis is consists of a sort of robustness study, comparing
conclusions from two extreme situations, where censored times
are treated as event times
with the same time value as the censoring time
with the largest event time in the data set
[Figure: observed times of the subjects plotted on a Time axis; o = censored, + = event]
If essentially the same conclusions can be drawn from the
original model and from these two models, then the censoring times can be
safely treated as independent of the event times.
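The two extreme data sets are easy to construct from the observed pairs (y_i, δ_i). The sketch below follows the description literally: every censored observation becomes an event, either at its own censoring time or at the largest event time in the data. The function name and the toy data are hypothetical.

```python
def bounding_datasets(times, events):
    """Return the two extreme data sets used in the robustness study.

    Both treat every censored observation as an event:
    - 'early': the event is placed exactly at the censoring time;
    - 'late' : the event is placed at the largest event time in the data.
    """
    largest_event_time = max(y for y, d in zip(times, events) if d == 1)
    all_events = [1] * len(times)
    early = (list(times), list(all_events))
    late = ([y if d == 1 else largest_event_time for y, d in zip(times, events)],
            list(all_events))
    return early, late

# Hypothetical toy data (1 = event, 0 = censored).
times = [2, 5, 3, 9, 4]
events = [1, 0, 1, 1, 0]
early, late = bounding_datasets(times, events)
```

The same model is then fitted to the original data and to both modified data sets, and the conclusions are compared.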
Logistic regression
The most formal way of testing the independent censoring hypothesis
is to fit a linear logistic model with the censoring indicator as
response.
If any covariate turns out to be significant in predicting whether the
event time is observed or censored, then the independence
hypothesis is quite unlikely.
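A minimal sketch of such a check, for a single covariate, is given below. It fits the logistic model by Newton-Raphson in plain Python and returns a Wald statistic for the slope; the data (ages and censoring indicators) are hypothetical, the covariate is centred only for numerical convenience, and in practice a statistical package would be used instead.

```python
import math

def fit_logistic(x, z, iterations=25):
    """Fit P(z = 1 | x) = 1 / (1 + exp(-(a + b x))) by Newton-Raphson.

    x : single covariate; z : censoring indicator (1 = censored, 0 = event).
    Returns (a, b, standard error of b).
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (g0, g1) and 2 x 2 information matrix.
        g0 = sum(zi - pi for zi, pi in zip(z, p))
        g1 = sum((zi - pi) * xi for zi, pi, xi in zip(z, p, x))
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: parameters += inverse(information) * score.
        a += (i11 * g0 - i01 * g1) / det
        b += (i00 * g1 - i01 * g0) / det
    return a, b, math.sqrt(i00 / det)

# Hypothetical data: age at diagnosis and whether the time was censored.
ages = [45, 50, 52, 55, 58, 60, 63, 65, 68, 70, 72, 75]
censored = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
mean_age = sum(ages) / len(ages)
centred = [age - mean_age for age in ages]
a_hat, b_hat, se_b = fit_logistic(centred, censored)
wald = b_hat / se_b  # compare with a standard normal quantile
```

A large absolute Wald statistic would indicate that the covariate predicts censoring, which casts doubt on the independence hypothesis.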
What to do?
Solutions are quite limited and no satisfactory way to overcome
the problem exists.
Artificially censoring all observations at the time of the first censored
observation makes the censoring truly independent of the event times, but
it is of little use if this occurs early.
[Figure: observed times of the subjects plotted on a Time axis; o = censored, + = event]
Dependent observations
Cox models and most other survival models assume that,
conditionally on the covariates, event times are i.i.d.
This is an unreasonable assumption in many situations:
multi-centre studies
repeated measures on the same subject
inclusion of relatives in the same study
measures on similar organs from the same organism
paired samples
...
If the group effect itself is of interest, the grouping factor is included
in the model as an ordinary covariate. More often one only wants to control
for its effect parsimoniously, that is, with few additional parameters.
82. Complications of Survival Models F. Rotolo
Dependent observations
The most common way to account for clustering in hazard
regression models is a mixed-model formulation (McCullagh & Nelder, 1989)
with a random effect:
$$\log\{h_{ij}(t)\} = \log\{h_0(t)\} + w_j + x_{ij}^T \beta, \qquad w_j \sim \mathrm{IID}(0, \sigma_w^2).$$
The random effect wj is unobservable and common to all
elements of a cluster.
Its actual realizations are of little interest; what matters is its
distribution, which is used to account for the variability
the random effect introduces.
85. Complications of Survival Models F. Rotolo
Dependent observations
In survival analysis, the model is usually expressed in the form
$$h_{ij}(t) = h_0(t)\, z_j \exp\{x_{ij}^T \beta\}, \qquad z_j \sim \mathrm{IID}(1, \sigma_z^2), \qquad (11)$$
with $z_j = e^{w_j} > 0$, and is called the Frailty Model (Duchateau & Janssen, 2008;
Wienke, 2009).
The random variable $z_j$ was named frailty (term) by Vaupel et al.
(1979) because subjects with larger values have an increased
hazard and are therefore more likely to die sooner.
Note that the frailty is constant over time, so it scales the hazard up
or down by the same factor at every time point.
87. Complications of Survival Models F. Rotolo
Dependent observations
This approach has two main consequences:
Dependence between event times in the same cluster
This is what allows Frailty Models to account for dependency.
Non-proportionality of the marginal hazards in general
Hazards are still proportional conditionally on the frailty values.
Clusters can also have size 1, in which case all methods are
unchanged but their meaning and interpretation are quite
different (univariate frailty models for overdispersion: Wienke, 2009, Chp. 3).
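Both consequences can be checked numerically. The sketch below is only an illustration and assumes a Gamma frailty with mean 1 and variance `theta`, a constant baseline hazard and a single binary covariate; all function names are hypothetical. Under these assumptions the population-level hazard ratio is $e^{\beta}(1+\theta H_0(t))/(1+\theta H_0(t)e^{\beta})$, which shrinks towards 1 over time, and Kendall's tau between two event times of the same cluster equals $\theta/(\theta+2)$.

```python
import math
import random


def marginal_hazard_ratio(cum_base_hazard, beta, theta):
    """Population-level hazard ratio (x = 1 vs x = 0) under a Gamma frailty.

    cum_base_hazard : H0(t), cumulative baseline hazard at time t
    beta            : conditional log hazard ratio
    theta           : frailty variance (frailty mean fixed at 1)
    """
    hr = math.exp(beta)
    return hr * (1.0 + theta * cum_base_hazard) / (1.0 + theta * cum_base_hazard * hr)


def simulate_pairs(n_clusters, theta, rate=1.0, seed=1):
    """Simulate clusters of size 2 sharing one Gamma frailty each."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_clusters):
        # gammavariate(shape, scale): mean 1 and variance theta.
        z = rng.gammavariate(1.0 / theta, theta)
        # Conditionally on z, both times are exponential with hazard rate * z.
        pairs.append((rng.expovariate(rate * z), rng.expovariate(rate * z)))
    return pairs


def kendall_tau(pairs):
    """Plain O(n^2) Kendall's tau for paired continuous observations."""
    n = len(pairs)
    concordant = discordant = 0
    for a in range(n):
        for b in range(a + 1, n):
            s = (pairs[a][0] - pairs[b][0]) * (pairs[a][1] - pairs[b][1])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Non-proportionality: the marginal hazard ratio depends on time through H0(t).
for H0 in (0.0, 1.0, 5.0):
    print(H0, marginal_hazard_ratio(H0, beta=math.log(2), theta=1.0))

# Dependence: the empirical tau should be near theta / (theta + 2).
theta = 1.0
print(kendall_tau(simulate_pairs(800, theta)), theta / (theta + 2))
```

With `theta` close to 0 the frailty is nearly constant, so the marginal hazard ratio stays at $e^{\beta}$ and the within-cluster dependence vanishes.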
88. Complications of Survival Models F. Rotolo
Dependent observations
Many distributions can be used to model the frailty term; the most
common (Duchateau & Janssen, 2008, Chp. 4) are
Gamma, mathematically the most convenient: analytical
integration, closed under truncation
Log-Normal, the most consistent with the GLMM theory:
the random effects wj are Normal
Inverse-Gaussian, analytical integration
Positive-Stable, analytical integration
Power-Variance-Function, very flexible: extends Gamma,
Inverse-Gaussian, Positive-Stable and compound-Poisson.
Closed under truncation
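To make "analytical integration" concrete, here is a worked sketch for the Gamma case, assuming a frailty with mean 1 and variance $\sigma_z^2$ as in (11). The symbol $H_0(t)$ for the cumulative baseline hazard and $\mathcal{L}$ for the Laplace transform of the frailty density are notation introduced here for illustration. Integrating the frailty out of the conditional survival function gives a closed form:

```latex
\begin{aligned}
S(t \mid x_{ij})
  &= \int_0^\infty \exp\{-z\, H_0(t)\, e^{x_{ij}^T \beta}\}\, f(z)\, dz
   = \mathcal{L}\big(H_0(t)\, e^{x_{ij}^T \beta}\big), \\
\mathcal{L}(s)
  &= \big(1 + \sigma_z^2\, s\big)^{-1/\sigma_z^2}
  \quad \text{for a Gamma frailty with mean } 1 \text{ and variance } \sigma_z^2 .
\end{aligned}
```

For the Log-Normal frailty no such closed form is available, so the integral has to be approximated numerically.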
90. Complications of Survival Models F. Rotolo
Dependent observations
When a parametric model for the baseline hazard is assumed,
then the likelihood (9) can be used.
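Since the frailties are not observed, the marginal likelihood is
used instead:
$$L_{marg} = \prod_{j=1}^{s} \int_0^\infty \left[ \prod_{i=1}^{n_j} h_{ij}(t_{ij})^{\delta_{ij}}\, S(t_{ij}) \right] f(z_j)\, dz_j \qquad (12)$$
with $s$ the number of clusters, $n_j$ the number of subjects in cluster $j$, $h_{ij}(\cdot)$ defined as
in (11), $S(\cdot)$ the corresponding survival function and $f(\cdot)$ the density of $z_j$.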
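As an illustration of (12), the sketch below evaluates the contribution of one cluster in two ways: with the closed form available for a Gamma frailty, and by direct numerical integration over $z_j$. It is a minimal example under explicit assumptions that are not in the slides: a Gamma frailty with mean 1 and variance `theta`, a constant baseline hazard `lam`, and no covariates. All function names are hypothetical.

```python
import math


def cluster_lik_closed_form(times, events, lam, theta):
    """Closed-form marginal likelihood of one cluster (Gamma frailty)."""
    d = sum(events)                     # number of observed events in the cluster
    A = sum(lam * t for t in times)     # summed cumulative baseline hazards
    k = 1.0 / theta                     # Gamma shape (and rate), so the mean is 1
    log_lik = (d * math.log(lam)
               + math.lgamma(k + d) - math.lgamma(k)
               + d * math.log(theta)
               - (k + d) * math.log1p(theta * A))
    return math.exp(log_lik)


def cluster_lik_numeric(times, events, lam, theta, upper=40.0, steps=50_000):
    """Same quantity by trapezoidal integration of (12) over the frailty z.

    Assumes theta < 1, so that the integrand vanishes at z = 0.
    """
    k = 1.0 / theta
    log_norm = k * math.log(k) - math.lgamma(k)

    def integrand(z):
        if z <= 0.0:
            return 0.0
        # Conditional likelihood: prod_i (lam * z)^delta_i * exp(-lam * z * t_i).
        log_cond = sum(dlt * math.log(lam * z) - lam * z * t
                       for t, dlt in zip(times, events))
        # Gamma density of the frailty with shape k and rate k.
        log_dens = log_norm + (k - 1.0) * math.log(z) - k * z
        return math.exp(log_cond + log_dens)

    h = upper / steps
    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, steps))
    return total * h


def marginal_loglik(clusters, lam, theta):
    """Log of (12): sum of the log contributions of all clusters."""
    return sum(math.log(cluster_lik_closed_form(t, e, lam, theta))
               for t, e in clusters)


# Toy cluster (hypothetical): three subjects, two events and one censored time.
times, events = [2.0, 5.0, 1.5], [1, 0, 1]
print(cluster_lik_closed_form(times, events, lam=0.1, theta=0.5))
print(cluster_lik_numeric(times, events, lam=0.1, theta=0.5))
```

The two values should agree up to integration error. In practice `marginal_loglik` would be maximised over `lam` and `theta`; for frailty distributions without a closed form, only the numerical route remains.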