Confounding, politics, frustration and knavish tricks
1. Bradford Hill lecture 2008 1 of 48
Confounding, politics,
frustration and knavish tricks
Stephen Senn
2. Bradford Hill lecture 2008 2 of 48
"If when the tide is falling you take out
water with a twopenny pail, you and the
moon together can do a great deal"
Bradford Hill, A., and Hill, I. D. (1990), Principles of Medical Statistics (12th edition), p. 247.
3. Bradford Hill lecture 2008 3 of 48
The Central Problem of
Epidemiology
• This is generally recognised to be confounding
• Where experiments cannot be conducted we must make
do with observational studies
• There is also the risk that due to hidden confounders we
will conclude causation when all we have is association
• Hill was a (the) key figure in promoting randomised
controlled trials (RCTs)
• But he also recognised that RCTs were not enough and
was a pioneer of observational studies
– Case control as in Doll and Hill (1950)
– Cohort as in Doll and Hill (1954)
4. Bradford Hill lecture 2008 4 of 48
Outline
• Some statistics of the propensity score
• An explanation of the propensity score
• Comparison to ANCOVA
• Some criticisms
• Conclusions
Acknowledgement
This is based on joint work with Erika Graf and Angelika Caputo
Senn, S., Graf, E., and Caputo, A. (2007), "Stratification for the Propensity Score
Compared with Linear Regression Techniques to Assess the Effect of Treatment or
Exposure," Statistics in Medicine, 26, 5529-5544.
5. Bradford Hill lecture 2008 5 of 48
A Question for you to Consider
• Consider these two experiments
– A completely randomised trial
– Patients allocated with 50% probability to A or B
– Randomised matched pairs
– Member of any pair randomised with 50% probability to A
or B
• In analysing, would you ignore the
matching in the second case?
6. Bradford Hill lecture 2008 6 of 48
Propensity score: background
• Due to Rosenbaum and Rubin, Biometrika
1983
• Has been cited over 1000 times since first
published
• Citation rate has grown rapidly since 1995
and is now more than 200 per year
7. Bradford Hill lecture 2008 7 of 48
[Figure: Annual citations of Rosenbaum and Rubin (1983) by year, 1985 to 2005, showing the data and a fitted curve. The fitted model predicts more than 300 citations in 2008.]
8. Bradford Hill lecture 2008 8 of 48
[Figure: Cumulative citations of Rosenbaum and Rubin (1983) by year, 1985 to 2005.]
9. Bradford Hill lecture 2008 9 of 48
MEDICINE, GENERAL & INTERNAL (5.67%)
ECONOMICS (19.45%)
MATHEMATICAL & COMPUTATIONAL BIOLOGY (6.63%)
CARDIAC & CARDIOVASCULAR SYSTEMS (10.69%)
SOCIAL SCIENCES, MATHEMATICAL METHODS (8.99%)
PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH (12.75%)
SURGERY (6.48%)
HEALTH CARE SCIENCES & SERVICES (6.34%)
RESPIRATORY SYSTEM (5.75%)
STATISTICS & PROBABILITY (17.24%)
10. Bradford Hill lecture 2008 10 of 48
Propensity Score Explanation
• We consider two ‘treatments’ or exposures
a subject might have received
• The assignment indicator is X
– X = 0, if subject receives exposure 0
– X = 1, if subject receives exposure 1
• There is a vector of covariates W
11. Bradford Hill lecture 2008 11 of 48
Counterfactual responses
• For every subject we have two responses
– r0
– r1
• One of these will be observed
• One of these is unobserved
– Counterfactual
12. Bradford Hill lecture 2008 12 of 48
Propensity score: definition
$$e(W) = P(X = 1 \mid W)$$
This is a form of balancing score b(W). A balancing score is
defined as follows. If r0 is the response given by a subject that is
unexposed (indexed by 0) and r1 is the response when the same
subject is exposed (indexed by 1), and
$$(r_0, r_1) \perp X \mid W \quad \text{and} \quad (r_0, r_1) \perp X \mid b(W)$$
then b(W) is a balancing score. R & R show that the finest such
score is W itself and the coarsest is the propensity score
13. Bradford Hill lecture 2008 13 of 48
Propensity score uses
• Calculate the propensity score for each
subject
• Stratify by the propensity score
– In practice fifths are used
• The resulting estimator is unbiased
– The possible confounding influence of W has
been eliminated
14. Bradford Hill lecture 2008 14 of 48
Propensity Score: An Example
Disposition of subjects in a study

                    Exposure
Sex      Age        A      B      Total
Male     Young      240    80     320
Male     Old        60     20     80
Female   Young      80     240    320
Female   Old        20     60     80
Total               400    400    800
15. Bradford Hill lecture 2008 15 of 48
           Exposure
Sex        A      B      Total
Male       300    100    400
Female     100    300    400
Total      400    400    800

           Exposure
Age        A      B      Total
Young      320    320    640
Old        80     80     160
Total      400    400    800

Sex is predictive of exposure but age is not
16. Bradford Hill lecture 2008 16 of 48
Class            Relative frequency (A)    'Probability' of disposition (to A)
Young males      240/320                   3/4
Old males        60/80                     3/4
Young females    80/320                    1/4
Old females      20/80                     1/4
The philosophy of the propensity score is to stratify by probability
of allocation. In this case this is equivalent to stratifying by sex.
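The calculation behind this table can be sketched in a few lines. This is a minimal illustration using the counts from the disposition table; the names `counts`, `propensity_to_a` and `strata` are invented here and, as in the slides, relative frequencies stand in for probabilities.

```python
from fractions import Fraction

# Counts (exposure A, exposure B) per class, copied from the disposition table.
counts = {
    ("Male", "Young"): (240, 80),
    ("Male", "Old"): (60, 20),
    ("Female", "Young"): (80, 240),
    ("Female", "Old"): (20, 60),
}


def propensity_to_a(a, b):
    """Relative frequency of exposure A within a class."""
    return Fraction(a, a + b)


scores = {cls: propensity_to_a(a, b) for cls, (a, b) in counts.items()}

# Classes that share a score fall into the same propensity stratum.
strata = {}
for cls, score in scores.items():
    strata.setdefault(score, []).append(cls)

for score, classes in strata.items():
    print(score, classes)
```

Only two distinct scores appear, and each stratum contains a single sex, which is why stratifying by the score is the same as stratifying by sex in this example.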
17. Bradford Hill lecture 2008 17 of 48
Response

           Treatment
Sex        A      B      Difference
Male       96     136    40
Female     96     136    40

           Treatment
Age        A      B      Difference
Young      100    140    40
Old        80     120    40

Age is predictive of outcome but sex is not
18. Bradford Hill lecture 2008 18 of 48
The Difference to Conventional
Approaches
• Conventional approaches correct for covariates
if they are predictive of outcome
– Analysis of covariance
– Stratification
• The propensity score corrects if covariates are
predictive of assignment (allocation)
• In this example correcting either for sex
(propensity score) or age (ANCOVA) will
produce an “unbiased” estimate
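A small check of this claim, as a sketch rather than the author's own calculation. It assumes the cell mean responses depend on age only (Young: 100 under A, 140 under B; Old: 80 under A, 120 under B), which is what the response tables imply; the function names are invented for illustration.

```python
# Cell counts (A, B) from the disposition table.
counts = {
    ("Male", "Young"): (240, 80),
    ("Male", "Old"): (60, 20),
    ("Female", "Young"): (80, 240),
    ("Female", "Old"): (20, 60),
}
# Assumed cell means (A, B): they vary with age but not with sex.
means = {"Young": (100, 140), "Old": (80, 120)}


def mean_difference(cells):
    """Difference in mean response, B minus A, pooled over the given cells."""
    n_a = sum(counts[c][0] for c in cells)
    n_b = sum(counts[c][1] for c in cells)
    total_a = sum(counts[c][0] * means[c[1]][0] for c in cells)
    total_b = sum(counts[c][1] * means[c[1]][1] for c in cells)
    return total_b / n_b - total_a / n_a


def stratified_difference(index):
    """Size-weighted mean of within-stratum differences (index 0 = sex, 1 = age)."""
    total = sum(a + b for a, b in counts.values())
    estimate = 0.0
    for level in sorted({c[index] for c in counts}):
        cells = [c for c in counts if c[index] == level]
        size = sum(sum(counts[c]) for c in cells)
        estimate += size / total * mean_difference(cells)
    return estimate


by_sex = stratified_difference(0)  # propensity-score style adjustment
by_age = stratified_difference(1)  # ANCOVA style adjustment
print(by_sex, by_age)
```

Both stratifications recover the same difference of 40, matching the tables: sex is unrelated to outcome and age is balanced across exposures.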
19. Bradford Hill lecture 2008 19 of 48
In terms of linear regression
To define some general notation:
Let $\beta_{UV}$ be the marginal regression of U on V.
Let $\beta_{UV.T}$ be the conditional regression of U on V given T.

Now consider a specific implementation where Y is outcome, X is treatment and W is covariate.

$$\beta_{YX} = \beta_{YX.W} + \beta_{YW.X}\,\beta_{WX}$$
$$\therefore\ \beta_{YX} = \beta_{YX.W} \quad \text{if} \quad \beta_{YW.X} = 0 \ \ (1) \quad \text{or} \quad \beta_{WX} = 0 \ \ (2)$$

(1) is the analysis of covariance condition for not including something in the model and (2) is the propensity score condition.
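The decomposition of the marginal regression of Y on X into a conditional part plus a product term holds exactly for least-squares fits, which can be checked numerically. The sketch below uses simulated data; the data-generating values, the seed and the helper `ols` are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
n = 500
w = rng.normal(size=n)
x = (rng.normal(size=n) + 0.8 * w > 0).astype(float)  # exposure depends on W
y = 1.0 * x + 2.0 * w + rng.normal(size=n)            # outcome depends on X and W


def ols(response, *predictors):
    """Least-squares slopes; an intercept is fitted but not returned."""
    design = np.column_stack([np.ones(len(response)), *predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef[1:]


(b_yx,) = ols(y, x)            # marginal regression of Y on X
b_yx_w, b_yw_x = ols(y, x, w)  # conditional regressions of Y on X and on W
(b_wx,) = ols(w, x)            # marginal regression of W on X

decomposed = b_yx_w + b_yw_x * b_wx
print(b_yx, decomposed)
```

The two printed values agree to rounding error, so the marginal and conditional coefficients of X coincide only when one of the two factors in the product term is zero.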
20. Bradford Hill lecture 2008 20 of 48
Some myths of the propensity
score
• Colinearity of predictors makes traditional
regression adjustments unusable
• Quintile stratification on the propensity
score eliminates bias more effectively than
ANCOVA
• The propensity score can be more efficient
than ANCOVA
• The coarsening property of the propensity
score benefits efficiency
21. Bradford Hill lecture 2008 21 of 48
Colinearity of Predictors
Consider a simple example in which the following predictor pattern is
repeated a number of times
Covariate/Confounder Exposure
W1 W2 X
0 0 0
0 0 1
1 1 0
1 1 1
Clearly the effects of W1 and W2 are not identifiable but the effect of X is and
any decent statistical package should be able to estimate the effect even if
W1 and W2 are in the model. In the following example it is supposed that
$$Y = W_1 + W_2 + X + \varepsilon, \qquad \varepsilon \sim N(0, 1)$$
And that we have the same basic pattern
of predictors for 1000 observations
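A rough sketch of this set-up with NumPy rather than GenStat (the seed and variable names are assumptions). Because W1 and W2 are identical the design matrix is rank deficient, and `numpy.linalg.lstsq` returns a minimum-norm solution; only the sum of the two W coefficients is identified, while the coefficient of X is unique.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
pattern = np.array(
    [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float
)  # columns: W1, W2, X
rows = np.tile(pattern, (250, 1))  # basic pattern repeated: 1000 observations
w1, w2, x = rows.T
y = w1 + w2 + x + rng.normal(size=len(x))  # Y = W1 + W2 + X + eps

design = np.column_stack([np.ones(len(x)), w1, w2, x])
coef, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)

beta_x = coef[3]                # identifiable despite the aliasing
beta_w_sum = coef[1] + coef[2]  # only this sum is identifiable
print(rank, beta_x, beta_w_sum)
```

The reported rank is 3 rather than 4, yet the estimate for X should sit close to its true value of 1.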
22. Bradford Hill lecture 2008 22 of 48
Analysis with GenStat 1
Case where W1 and W2 are completely colinear
Message: term W2 cannot be included in the model because it is aliased
with terms already in the model.
(W2) = (W1)
Regression analysis
Estimates of parameters
Parameter estimate s.e. t(997) t pr.
Constant -0.0266 0.0542 -0.49 0.624
W1 2.0067 0.0626 32.05 <.001
X 1.0377 0.0626 16.57 <.001
23. Bradford Hill lecture 2008 23 of 48
Analysis with GenStat 2
Case where W1 and W2 are strongly colinear ( a small bit of noise added to W2)
Regression analysis
Estimates of parameters
Parameter estimate s.e. t(996) t pr.
Constant -0.0270 0.0542 -0.50 0.619
W1 -0.82 3.16 -0.26 0.795
W2e 2.83 3.16 0.89 0.372
X 1.0372 0.0626 16.56 <.001
Message: the variance of some parameter estimates is seriously inflated, due to near
collinearity or aliasing between the following parameters, listed with their variance
inflation factors.
W1 2553.00
W2e 2553.00
24. Bradford Hill lecture 2008 24 of 48
Better at eliminating bias?
• Some papers have purported to show this
• Claims have been demonstrated using
simulation
• But the simulations have been unfair
– For example using models of different implicit
complexity
• It is trivial to produce examples where quintile
stratification does not work
– Suppose a baseline covariate differs by one standard
deviation between exposures and outcome is a linear
function of this
• ANCOVA works perfectly, propensity score is biased
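The counter-example can be sketched by simulation. Everything specific here is an assumption for illustration: a normally distributed covariate, equal group sizes of 50,000, unit slope, unit residual variance and a true exposure effect of zero. With two equal-variance normal groups the true propensity score is a logistic function of W, so it is used directly rather than estimated.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n = 50_000                      # subjects per exposure group (assumed)
x = np.repeat([0.0, 1.0], n)
w = rng.normal(loc=x, scale=1.0)  # covariate differs by one SD between exposures
y = w + rng.normal(size=2 * n)    # outcome linear in W; true effect of X is 0

# ANCOVA: regress Y on an intercept, X and W, and read off the X coefficient.
design = np.column_stack([np.ones(2 * n), x, w])
ancova = np.linalg.lstsq(design, y, rcond=None)[0][1]

# True propensity score P(X = 1 | W) for this set-up.
e = 1.0 / (1.0 + np.exp(-(w - 0.5)))
edges = np.quantile(e, [0.2, 0.4, 0.6, 0.8])
stratum = np.digitize(e, edges)  # stratum labels 0 to 4 (fifths)

# Size-weighted average of within-stratum differences in mean outcome.
stratified = 0.0
for s in range(5):
    in_s = stratum == s
    diff = y[in_s & (x == 1)].mean() - y[in_s & (x == 0)].mean()
    stratified += in_s.mean() * diff

print(ancova, stratified)
```

Under these assumptions the ANCOVA estimate should be close to zero, while the quintile-stratified estimate retains part of the original one-SD imbalance, because W still differs between exposures within each fifth.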
25. Bradford Hill lecture 2008 25 of 48
More efficient than ANCOVA ?
• Stratification by probability of assignment
• But ANCOVA stratifies by predictors of
outcome; not assignment.
• By definition residual variance less for
ANCOVA
• By definition, loss of orthogonality greater for
propensity.
• Consequence: variance of estimators higher
for propensity score
• Propensity score incoherent?
26. Bradford Hill lecture 2008 26 of 48
Furthermore
• The coarseness property of the propensity
score is completely irrelevant
• There is no gain in efficiency through this
property
• The loss in orthogonality is equivalent to
fitting all covariates and their interactions
with each other.
• You might as well just use (multivariate)
W.
27. Bradford Hill lecture 2008 27 of 48
A Regression Reminder
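$$\text{Let } P = [\,X \;\; W\,]$$
$$\operatorname{var}(\hat{\beta}) = (P'P)^{-1}\sigma^2 = \begin{pmatrix} a_{00} & \cdots & \cdots \\ \vdots & a_{XX} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}\sigma^2$$
The propensity score philosophy chooses the members of W in such a way that $a_{XX}$ is maximised. Analysis of covariance chooses the members so that $\sigma^2$ is minimised.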
28. Bradford Hill lecture 2008 28 of 48
Another Example
              Young            Old              All ages
              X = 0   X = 1    X = 0   X = 1    X = 0   X = 1    Total
Male          3       7        80      30       83      37       120
Female        8       42       9       21       17      63       80
Total both    11      49       89      51       100     100      200
Total             60               140              200
29. Bradford Hill lecture 2008 29 of 48
Another Example
              Young            Old              All ages
              X = 0   X = 1    X = 0   X = 1    X = 0   X = 1    Total
Male          3       7        80      30       83      37       120
Female        8       42       9       21       17      63       80
Total both    11      49       89      51       100     100      200
Total             60               140              200

Young males and old females: e(W) = 0.7
30. Bradford Hill lecture 2008 30 of 48
Propensity score stratification
                                               Exposure Assignment
Stratum or strata            Propensity score  X = 0    X = 1    Total
Old males                    e(W) = 0.27       80       30       110
Young males + Old females    e(W) = 0.70       12       28       40
Young females                e(W) = 0.84       8        42       50
Total                                          100      100      200
31. Bradford Hill lecture 2008 31 of 48
For our Second Example

Factors in Model in Addition to Exposure    Variance Multiplier, a_XX
None                                        0.0200
Age                                         0.0242
Sex                                         0.0257
sex + age                                   0.0267
sex + age + sex × age                       0.0271

The last of these is the same as for the propensity score.
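These variance multipliers can be reproduced from the cell counts of the second example. The sketch below expands the table to one row per subject and reads the exposure element off the inverse of P'P; it assumes an intercept in every model, and the helper `a_xx` and the indicator names are invented for illustration.

```python
import numpy as np

# Counts (X = 0, X = 1) for each sex-by-age cell of the second example.
cells = {
    ("Male", "Young"): (3, 7),
    ("Male", "Old"): (80, 30),
    ("Female", "Young"): (8, 42),
    ("Female", "Old"): (9, 21),
}

# Expand to one row per subject: exposure, male indicator, young indicator.
rows = []
for (sex, age), (n0, n1) in cells.items():
    male, young = float(sex == "Male"), float(age == "Young")
    rows += [(0.0, male, young)] * n0 + [(1.0, male, young)] * n1
x, male, young = np.array(rows).T


def a_xx(*covariates):
    """Element of (P'P)^-1 for exposure, with P = [1, X, covariates]."""
    p = np.column_stack([np.ones(len(x)), x, *covariates])
    return np.linalg.inv(p.T @ p)[1, 1]


# Indicators for two of the three propensity strata (young females are baseline).
ps_low = male * (1 - young)                       # old males, e(W) = 0.27
ps_mid = male * young + (1 - male) * (1 - young)  # e(W) = 0.70

multipliers = {
    "none": a_xx(),
    "age": a_xx(young),
    "sex": a_xx(male),
    "sex + age": a_xx(male, young),
    "sex + age + sex x age": a_xx(male, young, male * young),
    "propensity strata": a_xx(ps_low, ps_mid),
}
for name, value in multipliers.items():
    print(f"{name:24s}{value:.4f}")
```

Stratifying on the three propensity strata gives the same multiplier as fitting sex, age and their interaction, so the coarsening buys nothing in terms of variance.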
32. Bradford Hill lecture 2008 32 of 48
Conditional Distributions and
The Propensity Score
• The appropriateness of the propensity
score is always illustrated in terms of the
expectation of the treatment estimate
– Unbiasedness in linear framework
• Its suitability when looked at in terms of
the full conditional distribution less obvious
as will now be demonstrated
33. Bradford Hill lecture 2008 33 of 48
Suppose that we are interested in the conditional distribution of an
outcome variable Y given a putative causal variable X and a further
covariate W. We wish to investigate the circumstances under
which W can be ignored. That is to say we wish to know the
conditions that
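$$f(Y \mid W \cap X) = f(Y \mid X).$$
Now,
$$f(W \cap X \cap Y) = f(X)\, f(W \mid X)\, f(Y \mid W \cap X) \qquad (1)$$
and
$$f(W \cap X \cap Y) = f(X)\, f(Y \mid X)\, f(W \mid X \cap Y). \qquad (2)$$
Equating the RHS of (1) and (2),
$$f(X)\, f(W \mid X)\, f(Y \mid W \cap X) = f(X)\, f(Y \mid X)\, f(W \mid X \cap Y)$$
$$\therefore\ f(Y \mid W \cap X) = f(Y \mid X)\, \frac{f(W \mid X \cap Y)}{f(W \mid X)}. \qquad (3)$$
Now, (3) is not in general equivalent to $f(Y \mid W \cap X) = f(Y \mid X)$ unless
$$\frac{f(W \mid X \cap Y)}{f(W \mid X)} = 1 \quad (4) \qquad \therefore\ f(W \mid X \cap Y) = f(W \mid X),$$
which implies $W \perp Y \mid X$ and hence $Y \perp W \mid X$.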
34. Bradford Hill lecture 2008 34 of 48
Conclusion
• The claims that are made for the
propensity score are true in terms of
conditional expectation (at least for the
linear model)
• However, they are not true in terms of the
full conditional model
• For W to be ignorable in that sense requires $Y \perp W \mid X$
• This is the ANCOVA condition
35. Bradford Hill lecture 2008 35 of 48
Implications for Modelling
• It is not true that ignoring a covariate that
is predictive of outcome but not
assignment is acceptable
• In the linear case estimators are unbiased
but their variances are “incorrect”
• More generally, however, conditional and
unconditional estimators are different
– Logistic regression, survival analysis
36. Bradford Hill lecture 2008 36 of 48
[Diagram: an indicator of exposure Z, an outcome variable Y and potential confounders X1 to X6. What should join Z in the model?]
37. Bradford Hill lecture 2008 37 of 48
[Diagram: Z, Y and X1 to X6, with inappropriate terms removed.]
38. Bradford Hill lecture 2008 38 of 48
[Diagram: Z, Y and X1 to X6 under propensity score adjustment.]
40. Bradford Hill lecture 2008 40 of 48
Non-linear example
Simulation as before but binary response on Y > 1.5
With balanced covariates

Model including W1 and W2e
Parameter   estimate   s.e.     t(*)     t pr.    antilog of estimate
Constant    -2.442     0.185    -13.18   <.001    0.08696
W1          4.98       8.51     0.59     0.558    146.2
W2e         -1.73      8.51     -0.20    0.839    0.1768
X           1.689      0.192    8.78     <.001    5.413

Model omitting W1 and W2e
Parameter   estimate   s.e.     t(*)     t pr.    antilog of estimate
Constant    -0.4642    0.0918   -5.06    <.001    0.6287
X           0.962      0.130    7.40     <.001    2.617
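The gap between the two X estimates can be illustrated without simulation by working out the population log-odds ratios directly. This sketch is not a re-run of the analysis above: it assumes Y = W + X + ε with ε ~ N(0, 1), where W stands for W1 + W2 and takes the values 0 and 2 with equal frequency independently of X, and it ignores the small noise added to W2. The function names are invented for illustration.

```python
from math import erf, log, sqrt


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def logit(p):
    return log(p / (1.0 - p))


def p_event(w, x):
    """P(Y > 1.5) when Y = w + x + eps and eps ~ N(0, 1)."""
    return 1.0 - phi(1.5 - w - x)


# Conditional log-odds ratio for X at each level of W.
conditional = {w: logit(p_event(w, 1)) - logit(p_event(w, 0)) for w in (0, 2)}


def p_marginal(x):
    """Event probability after averaging over the balanced covariate."""
    return 0.5 * (p_event(0, x) + p_event(2, x))


# Marginal log-odds ratio for X when W is omitted.
marginal = logit(p_marginal(1)) - logit(p_marginal(0))
print(conditional, marginal)
```

Under these assumptions the conditional log-odds ratio is about 1.8 at both levels of W, whereas the marginal one is about 1.0, even though W is perfectly balanced across X. The two quantities differ in the population, so the discrepancy is not a sampling artefact.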
41. Bradford Hill lecture 2008 41 of 48
Not convinced?
An Example
• An open trial of the effect of alcohol
consumption on the ability to memorize
word lists
• Volunteers to be drawn at random and
divided into two groups
• One lot to be given a glass of wine, the
other a glass of water
42. Bradford Hill lecture 2008 42 of 48
Two Possible Approaches
Experiment 1
• A subject has name drawn at random
• If chosen for control group, given blue ball
• If chosen for treatment group, given red ball
• "All you who have a blue ball please come to receive your glass of water, red ball to receive your glass of wine"

Experiment 2
• A subject has name drawn at random
• If chosen for control group, given glass of beer to drink
• Otherwise given nothing
• "All you who have had a beer come to receive your glass of water, if you had nothing, to receive your glass of wine."
43. Bradford Hill lecture 2008 43 of 48
Experiment 1
• Probability of receiving wine if ball blue = 0
• Probability of receiving wine if ball red = 1
• The propensity score takes on the values
0 and 1
• Do you have to stratify by the propensity
score?
44. Bradford Hill lecture 2008 44 of 48
Experiment 2
• Probability of receiving wine if beer = 0
• Probability of receiving wine if no beer = 1
• The propensity score takes on the values
0 and 1
• Do you have to stratify by the propensity
score?
45. Bradford Hill lecture 2008 45 of 48
The Difference?
• The difference between these two experiments is not the propensity score
• This is 0 or 1 in both cases: every subject in both experiments has a score of either 0 or 1
• The difference is that in the second case the covariate used to construct the score (beer) is predictive of outcome and in the first (ball colour) it is not.
46. Bradford Hill lecture 2008 46 of 48
Consequence
• It is association with outcome that is
important
– ANCOVA tradition
• Not association with assignment
– Propensity point of view
47. Bradford Hill lecture 2008 47 of 48
And that Question
• Consider these two experiments
– A completely randomised trial
– Patients allocated with 50% probability to A or B
– Randomised matched pairs
– Member of any pair randomised with 50% probability to A
or B
• In analysing, would you ignore the
matching in the second case?
• The propensity score philosophy says you
can!
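What is at stake in ignoring the matching can be sketched with a small simulation. All numbers here are assumptions for illustration: 500 pairs, a shared pair effect with standard deviation 2, unit residual variance and a treatment effect of 1. Members of a pair are exchangeable in this set-up, so the random allocation within pairs is implicit.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n_pairs = 500
pair_effect = rng.normal(scale=2.0, size=n_pairs)   # shared within a pair (assumed)
y_a = pair_effect + rng.normal(size=n_pairs)        # member allocated to A
y_b = pair_effect + 1.0 + rng.normal(size=n_pairs)  # member allocated to B

d = y_b - y_a
estimate = d.mean()  # identical to y_b.mean() - y_a.mean()

# Standard error that respects the matching (paired analysis).
se_paired = d.std(ddof=1) / np.sqrt(n_pairs)
# Standard error that ignores the matching (two independent samples).
se_unpaired = np.sqrt(y_a.var(ddof=1) / n_pairs + y_b.var(ddof=1) / n_pairs)

print(estimate, se_paired, se_unpaired)
```

The point estimate is the same either way, since every subject has an allocation probability of one half, but the standard error that ignores the pairs is much larger. The matching matters because it is predictive of outcome, not because it is predictive of assignment.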
48. Bradford Hill lecture 2008 48 of 48
Finally
All scientific work is incomplete - whether it be observational or experimental.
All scientific work is liable to be upset or modified by advancing knowledge.
That does not confer upon us a freedom to ignore the knowledge we already
have, or to postpone the action that it appears to demand at a given time.
Sir Austin Bradford Hill, 1965
Editor's Notes
Lecture given at the London School of Hygiene and Tropical Medicine 3 June 2008
The dangers of concluding that subsequence is consequence
Doll R, Hill AB. (1950) Smoking and carcinoma of the lung. Preliminary report. British Medical Journal, 2: 739-748.
Doll R, Hill AB. (1954) The mortality of doctors in relation to their smoking habits. British Medical Journal, 228: 1451-5.
It was hearing Erika Graf lecture on this in the mid 1990s that first got me interested in this topic.
See Graf, E. (1997). "The propensity score in the analysis of therapeutic studies." Biometrical Journal 39: 297-307.
This is an elementary issue in applied statistics that we teach students to understand
ECONOMICS 264
STATISTICS & PROBABILITY 234
PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH 173
CARDIAC & CARDIOVASCULAR SYSTEMS 145
SOCIAL SCIENCES, MATHEMATICAL METHODS 122
MATHEMATICAL & COMPUTATIONAL BIOLOGY 90
SURGERY 88
HEALTH CARE SCIENCES & SERVICES 86
RESPIRATORY SYSTEM 78
MEDICINE, GENERAL & INTERNAL 77
A given subject receives at most one.
One of the responses is realised; the other is not.
The first of these conditions involving r0 and r1 is the assumption of no unmeasured confounders
“This means that the counterfactual responses and treatment assignment are conditionally independent given the vector of covariates.” (Senn, Graf and Caputo)
The second is a condition for some function of the covariates to be ‘enough’ to stratify on.
Stratification by fifths is often referred to as quintile stratification
Example to illustrate the propensity score.
We have two exposures (A and B) and two explanatory factors age (old and young) and sex (male and female).
The outcome is not related to sex but is definitely related to age.
The point about these marginal tables is that they show that the treatment groups are imbalanced by sex but are not imbalanced by age. The philosophy of the propensity score is to stratify subjects by probability of allocation to one of the treatment groups (say group A). This is, in fact, equivalent to stratifying by sex, since this is the factor that affects the probability of allocation.
In the example here, relative frequencies rather than probabilities are used. In fact the propensity score is defined in terms of the latter and this can be seen as a weakness. The distinction is ignored here.
We can define the strata by the probability of exposure (the propensity score). In this example, this is equivalent to stratifying by sex.
This, on the other hand, shows a more relevant stratification from the point of view of traditional ANCOVA.
When looked at in terms of variance the propensity score appears in a less satisfactory light.
These two groups have the same propensity score: 7/10 = 21/30 = 0.7 (equivalently, a probability of 3/10 = 9/30 = 0.3 of receiving X = 0).
In fact although we can classify subjects by four covariate combinations, there are only three strata in the propensity score. The score is coarsened.
In other words the propensity score has gained nothing in terms of efficiency compared to fitting the full model.
An indicator of exposure, Z, an outcome variable Y and some potential confounders, X1-X6
With inappropriate confounders removed from the model
W1 and W2e are almost orthogonal to X. However, their omission in a non-linear model leads to a huge bias in the estimate of the effect of X.
In this example the response is 1 if Y > 1.5 and 0 otherwise. Other details are as before.