Seventy years of RCTs

70 Years and Still Here
The Randomised Clinical Trial and its Critics
Stephen Senn
(c) Stephen Senn 2018
stephen@senns.demon.co.uk
@stephensenn

Acknowledgements and a Clarification
Acknowledgements
Many thanks to Ursula Garczarek, EUGM & Cytel for the invitation
The historical aspect of this work has benefitted greatly from reading articles by
Peter Armitage, John Gower and Nancy Hall
A Clarification
I am going to pick on a number of papers and take the authors to task regarding
some of the misunderstandings about randomisation that they promote
1. This does not mean that the papers as a whole are bad
2. Where they are wrong, it does not mean that they are uniquely wrong. I could have picked on many other examples

Outline
Topic (number of slides)
Randomisation: from psychology to clinical trials via agriculture (10)
A game of two dice (8)
The critics of randomisation (6)
The critics answered (19)
Conclusions (6)

Randomisation: from psychology to clinical trials via agriculture
Strength from uncertainty

A timeline of randomisation & related matters
When | Who | Where | What
1843-1900 | JB Lawes & JH Gilbert | Rothamsted | 57-year scientific partnership developing agricultural experiments
1883-1885 | CS Peirce & J Jastrow | Johns Hopkins | First(?) use of randomisation: experiment to determine ability to detect small weight differences
1910 | TB Wood & FJM Stratton | Cambridge | Applied to agriculture the approach astronomers used to determine the precision of means
1925-1935 | RA Fisher | Rothamsted | Proposed using randomisation in experiments; developed ANOVA
1933-1940 | F Yates | Rothamsted | Developed theory of factorials, confounding, recovering inter-block information
1948 | Bradford Hill | LSHTM, London | MRC streptomycin trial has random allocation
1965 | John Nelder | NVRS Wellesbourne | Theory of general balance based on block and treatment structure

Seventy years ago

Bradford Hill on randomisation
It ensures that neither our personal idiosyncrasies (our likes or
dislikes consciously or unwittingly applied) nor our lack of balanced
judgement has entered into the construction of the different
treatment groups—the allocation has been outside our control and
the groups are therefore unbiased; …
it removes the danger, inherent in an allocation based on personal
judgement, that believing we may be biased in our judgements we
endeavour to allow for that bias, to exclude it, and that in doing so
we may overcompensate and by thus ‘leaning over backward’
introduce a lack of balance from the other direction; …
and, having used a random allocation, the sternest critic is unable to
say when we eventually dash into print that quite probably the
groups were differentially biased through our predilections or
through our stupidity
(Selected by Armitage, 2003)

The Rothamsted School
• RA Fisher (1890-1962): variance, ANOVA, randomisation, design, significance tests
• Frank Yates (1902-1994): factorials, recovering inter-block information
• John Nelder (1924-2010): general balance, computing, Genstat®
• and Frank Anscombe, David Finney, Rosemary Bailey, Roger Payne etc.

General Balance
History
• An idea of John Nelder’s
• Two papers in the Proceedings of the Royal Society, 1965, concerning “The analysis of randomized experiments with orthogonal block structure”
– Block structure and the null analysis of variance
– Treatment structure and the general analysis of variance
Basic idea
• Splits an experiment into two radically different components
• The block structure, which describes the way that the experimental units are organised
– The way that variation amongst units can be described
– Null ANOVA – an idea of Anscombe’s
• The treatment structure, which reflects the way that treatments are combined for the scientific purpose of the experiment

Design Driven Modelling
• Together with a third piece of information, the design matrix, these determine the analysis of variance
• Note that because both block and treatment structure can be hierarchical, such a design matrix is not, on its own, sufficient to derive an ANOVA
• But together with John’s block and treatment structure it is
– For designs exhibiting general balance
• This approach is incorporated in Genstat®

Genstat® Help File Example
Block Plot S N Yield
1 1 0 0 0.750
1 4 0 180 1.204
1 3 0 230 0.799
1 12 10 0 0.925
1 5 10 180 1.648
1 8 10 230 1.463
1 7 20 0 0.654
1 2 20 180 1.596
1 10 20 230 1.594
1 11 40 0 0.526
1 9 40 180 1.672
1 6 40 230 1.804
2 8 0 0 0.503
2 10 0 180 0.489
etc.
"This is a field experiment to study the effects of nitrogen and sulphur on the yield of wheat with a randomized block design."
BLOCKSTRUCTURE Block / Plot
TREATMENTSTRUCTURE N * S
ANOVA [PRINT=aov; FPROBABILITY=yes] Yield
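For a complete, balanced layout like the help-file example, the two strata declared by BLOCKSTRUCTURE Block / Plot and the terms generated by TREATMENTSTRUCTURE N * S can be sketched by hand. The Python below is a minimal sketch, assuming one plot per N × S combination in every block. It is not Genstat's general-balance algorithm. The function name block_plot_anova is hypothetical and the yields are simulated rather than taken from the help file.

```python
import numpy as np


def block_plot_anova(y):
    """Sums of squares and degrees of freedom for a complete randomised
    block design with an N * S factorial treatment structure.

    y has shape (n_blocks, n_levels_N, n_levels_S): one plot per
    treatment combination in every block (an assumption of this sketch).
    """
    b, a, c = y.shape
    grand = y.mean()
    block_means = y.mean(axis=(1, 2))
    n_means = y.mean(axis=(0, 2))
    s_means = y.mean(axis=(0, 1))
    cell_means = y.mean(axis=0)  # N x S treatment means, averaged over blocks

    ss = {
        # Block stratum: variation between blocks only
        "Block": a * c * np.sum((block_means - grand) ** 2),
        # Block.Plot stratum: the treatment terms ...
        "N": b * c * np.sum((n_means - grand) ** 2),
        "S": b * a * np.sum((s_means - grand) ** 2),
        "N.S": b * np.sum((cell_means - n_means[:, None]
                           - s_means[None, :] + grand) ** 2),
    }
    # ... and whatever is left within blocks is the residual
    ss["Residual"] = np.sum((y - grand) ** 2) - sum(ss.values())
    df = {"Block": b - 1, "N": a - 1, "S": c - 1,
          "N.S": (a - 1) * (c - 1), "Residual": (b - 1) * (a * c - 1)}
    return ss, df


# Hypothetical yields: 3 blocks, 3 nitrogen levels, 4 sulphur levels
rng = np.random.default_rng(1)
yields = rng.normal(loc=1.2, scale=0.3, size=(3, 3, 4))
ss, df = block_plot_anova(yields)
for term in ss:
    print(f"{term:8s} df={df[term]:2d} SS={ss[term]:.4f}")
```

Adding a different constant to every plot in each block changes only the Block line. In this simple case, that is what it means for block variation to be confined to its own stratum.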

Morals
• Design matters!
• Experimental material may have some structure
• Block structure may be complex
• More than one variance may be relevant
• The way that the design maps treatments onto the block structure is important
• Determines the error term
• Randomisation of that which is declared irrelevant guarantees the marginal probability statements
• These, in turn, calibrate the conditional ones

A Game of Two Dice
The role of the roll

Game of Chance
• Two dice are rolled
– Red die
– Black die
• You have to call correctly the odds of a total score of 10
• Three variants
– Game 1: you call the odds and the dice are rolled together
– Game 2: the red die is rolled first, you are shown the score and then must call the odds
– Game 3: the red die is rolled first, you are not shown the score and then must call the odds

Total Score when Rolling Two Dice
Variant 1: three of the 36 equally likely results give a total of 10, so the probability is 3/36 = 1/12.

Variant 2: if the red die scores 1, 2 or 3, the probability of a total of 10 is 0; if it scores 4, 5 or 6, the probability is 1/6.
Variant 3: the probability is (½ × 0) + (½ × 1/6) = 1/12.
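The three variants can be checked by exact enumeration. This is a small illustrative sketch; the function names are hypothetical. Game 3 is computed by averaging game 2's conditional probabilities over the known distribution of the red die, which is why it agrees with game 1.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
TARGET = 10


def p_game_1():
    """Both dice unseen: count totals of 10 among the 36 equally likely rolls."""
    rolls = list(product(FACES, FACES))
    hits = sum(1 for red, black in rolls if red + black == TARGET)
    return Fraction(hits, len(rolls))


def p_game_2(red):
    """Red die seen: condition on its actual score."""
    hits = sum(1 for black in FACES if red + black == TARGET)
    return Fraction(hits, len(FACES))


def p_game_3():
    """Red die rolled but unseen: average the conditional probabilities
    over the known distribution of the red die."""
    return sum(Fraction(1, 6) * p_game_2(red) for red in FACES)


print(p_game_1())                          # 1/12
print([str(p_game_2(r)) for r in FACES])   # ['0', '0', '0', '1/6', '1/6', '1/6']
print(p_game_3())                          # 1/12
```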

The Morals
• You can’t treat game 2 like game 1.
– You must condition on the information you receive in order to act
wisely
– You must use the actual data from the red die
• You can treat game 3 like game 1.
– You can use the distribution in probability that the red die has
• You can’t ignore an observed prognostic covariate in analysing
a clinical trial just because you randomised
– That would be to treat game 2 like game 1
• You can ignore an unobserved covariate precisely because you
did randomise
– Because you are entitled to treat game 3 like game 1

The Critics of Randomisation
It ain’t what we don’t know that harms us but what we know that ain’t so

Some criticisms
Who | When | What
Urbach | 1985 | You might just as well let the patients choose
Worrall | 2007 | Infinitely many confounders: something must be imbalanced
Borgerson | 2009 | “As Worrall has shown”
Deaton and Cartwright | 2017 | Type I error rate not maintained if data skewed
Krauss | 2018 | You should balance the patients at the start of the trial

Some quotations

More quotations

Yet more quotations

To sum up the claims
• Trials aren’t perfectly balanced
• There are indefinitely many confounders
• We could do better by creating equal groups
• Blinding is important but randomisation isn’t
• We might as well let the patients choose which
(blinded) group they join

The Critics Answered
Being a statistician means never having to say you are certain

Points to understand
• Balance isn’t necessary
• Indefinitely many confounders argument is a red
herring
• It’s all about ratios
• Probability matters
• You can’t gather the patients together at the
beginning of a trial
• Effective blinding requires randomisation
• Allowing patients to choose is not a good idea

A Tale of Two Tables
Trial 1 Treatment
Sex Verum Placebo Total
Male 34 26 60
Female 15 25 40
Total 49 51 100
Trial 2 Treatment
Sex Verum Placebo Total
Male 26 26 52
Female 15 15 30
Total 41 41 82
• Trial two is balanced but trial one is not
• Surely trial two must be more reliable?
• Things are not so simple

A Tale of Two Tables
Trial 1 Treatment
Sex Verum Placebo Total
Male 26+8 26 60
Female 15 15+10 40
Total 49 51 100
Trial 2 Treatment
Sex Verum Placebo Total
Male 26 26 52
Female 15 15 30
Total 41 41 82
• Trial one contains trial two: it is trial two plus 8 more males on verum and 10 more females on placebo
• How can more information be worse than less?
• If statistical theory could not deal with trial 1 there would be something wrong with it
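One way to see that trial 1 is not worse is to compute the variance of the sex-adjusted treatment estimate in each trial. This is a sketch that adds assumptions not on the slide: an additive model (sex + treatment, no interaction) and a common error variance σ² set to 1. Under that model the estimate weights each sex by n_verum·n_placebo/(n_verum + n_placebo), and its variance is σ² divided by the sum of those weights. The function name stratified_variance is hypothetical.

```python
from fractions import Fraction


def stratified_variance(strata, sigma2=Fraction(1)):
    """Variance of the sex-adjusted treatment estimate.

    strata: list of (n_verum, n_placebo) pairs, one per sex.
    Assumes an additive model (sex + treatment, no interaction)
    with a common error variance sigma2.
    """
    information = sum(Fraction(n_v * n_p, n_v + n_p) for n_v, n_p in strata)
    return sigma2 / information


trial_1 = [(34, 26), (15, 25)]  # males, females
trial_2 = [(26, 26), (15, 15)]
print(float(stratified_variance(trial_1)))  # ≈ 0.0415
print(float(stratified_variance(trial_2)))  # ≈ 0.0488
```

Under these assumptions the unbalanced trial 1 gives the smaller variance once sex is fitted. Its extra patients add information even though they break the balance.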

A Red Herring
• One sometimes hears that, because there are indefinitely many covariates, randomisation is useless
• This is quite wrong
• It rests on the mistaken belief that variant 3 of our game should not be analysed like variant 1
• I showed you that it should
• A series with infinitely many terms can still have a bounded sum, for example
$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 2$

You are not free to imagine anything at all
• Imagine that you are in control of all the thousands and thousands of covariates that patients will have
• You are now going to allocate the covariates and their effects to patients
o As in a simulation
• If you respect the actual variation in human health that there can be, you will find that the net total effect of these covariates is bounded
$Y = \beta_0 + Z + \beta_1 X_1 + \cdots + \beta_k X_k + \cdots$
where Z is a treatment indicator and the Xs are covariates. You are not free to assume arbitrary values for the Xs and the βs, because the variance of Y must be respected.
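The thought experiment can be run as a simulation. In this sketch the covariates are standard normal and a fixed variance budget V = 1 is split equally among k covariates. Both are hypothetical choices, not taken from the slide. The patients are then randomised 50 per arm. Because the net covariate effect always has variance V, the between-arm imbalance it causes has variance V(1/50 + 1/50) = 0.04 whether k is 1 or 200.

```python
import numpy as np


def imbalance_variance(n_covariates, n_per_arm=50, total_var=1.0,
                       n_reps=2000, seed=1):
    """Variance, over random allocations, of the between-arm difference in
    the net covariate effect sum_k beta_k * X_k."""
    rng = np.random.default_rng(seed)
    # Hypothetical: the fixed variance budget is shared equally by the covariates
    beta = np.full(n_covariates, np.sqrt(total_var / n_covariates))
    n = 2 * n_per_arm
    diffs = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.standard_normal((n, n_covariates))   # covariates of n patients
        net = x @ beta                               # net covariate effect per patient
        treated = rng.permutation(n) < n_per_arm     # random allocation, n_per_arm per arm
        diffs[r] = net[treated].mean() - net[~treated].mean()
    return diffs.var()


theory = 1.0 * (1 / 50 + 1 / 50)  # total_var * (1/n_per_arm + 1/n_per_arm)
for k in (1, 10, 200):
    print(k, round(imbalance_variance(k), 4), theory)
```

The number of covariates changes how the budget is divided, not how large the total imbalance can be. This is the sense in which the sum is bounded.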

The importance of ratios
• In fact, from one point of view, there is only one covariate that matters
o The potential outcome
– If you know this, all other covariates are irrelevant
• And just as this can vary between groups, it can vary within them
• The t-statistic is based on the ratio of the difference between groups to the variation within groups
• Randomisation guarantees (to a good approximation) the unconditional behaviour of this ratio, and that is all that matters for what you can’t see (game 3)
• An example follows

Corollary – unobserved covariates can be ignored if you have randomised
• The error is to assume that, because you can’t use randomisation as a justification for ignoring information, it is useless
• It is useful for what you don’t see
• Knowing that the two-dice game is fairly run is important even though the average probability is not relevant to game 2
• Average probabilities are important for calibrating your inferences
o Your conditional probabilities must be coherent with your marginal ones
– See the relationship between the games

Hills and Armitage Enuresis Data
[Figure: scatter plot of dry nights per patient; x-axis: dry nights, placebo; points marked by sequence (drug then placebo; placebo then drug); line of equality shown]
Cross-over trial in enuresis(1): two treatment periods of 14 days each
1. Hills M, Armitage P. The two-period cross-over clinical trial. British Journal of Clinical Pharmacology 1979; 8: 7-20.

[Figure: permutation distributions of the treatment effect, conditioning on patient (red) and not (black); x-axis: permuted treatment effect]
The blue diamond shows the treatment effect, which is the same whether or not we condition on patient as a factor. It is identical because the trial is balanced by patient. However, the permutation distribution is quite different, and so are our inferences, depending on whether we condition (red) or not (black). Clearly, balancing the randomisation by patient but not conditioning the analysis on patient is wrong.

The two permutation* distributions summarised
Summary statistic | Permuted difference, no blocking | Permuted difference, blocking
Number of observations | 10000 | 10000
Mean | 0.00561 | 0.00330
Median | 0.0345 | 0.0345
Minimum | -3.828 | -2.379
Maximum | 3.621 | 2.517
Lower quartile | -0.655 | -0.517
Upper quartile | 0.655 | 0.517
P-value for observed difference | 0.0340 | 0.0014
*Strictly speaking, randomisation distributions
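Two kinds of relabelling produce the two distributions summarised above. Without blocking, any half of the 2n values could have been called "drug". With blocking by patient, only the two values within each patient are swapped. The sketch below shows both. The Hills and Armitage data are not reproduced here, so it runs on hypothetical data for 29 patients with a large between-patient spread, and it makes no attempt to match the slide's figures. The function name randomisation_distributions is hypothetical.

```python
import numpy as np


def randomisation_distributions(drug, placebo, n_perm=10000, seed=2018):
    """Randomisation distributions of the mean treatment difference in a
    cross-over, ignoring or respecting the pairing by patient."""
    rng = np.random.default_rng(seed)
    drug = np.asarray(drug, dtype=float)
    placebo = np.asarray(placebo, dtype=float)
    n = drug.size
    observed = drug.mean() - placebo.mean()
    pooled = np.concatenate([drug, placebo])
    unblocked = np.empty(n_perm)
    blocked = np.empty(n_perm)
    for i in range(n_perm):
        # No blocking: any n of the 2n values could have been labelled 'drug'
        idx = rng.permutation(2 * n)
        unblocked[i] = pooled[idx[:n]].mean() - pooled[idx[n:]].mean()
        # Blocking by patient: only the labels within each patient are swapped
        signs = rng.choice([-1.0, 1.0], size=n)
        blocked[i] = np.mean(signs * (drug - placebo))
    # Two-sided P-values: proportion of relabellings at least as extreme
    p_unblocked = np.mean(np.abs(unblocked) >= abs(observed))
    p_blocked = np.mean(np.abs(blocked) >= abs(observed))
    return observed, unblocked, blocked, p_unblocked, p_blocked


# Hypothetical cross-over data for 29 patients (not the Hills and Armitage data)
rng = np.random.default_rng(0)
patient = rng.normal(8.0, 3.0, size=29)          # large between-patient variation
placebo = patient + rng.normal(0.0, 1.5, size=29)
drug = patient + 2.0 + rng.normal(0.0, 1.5, size=29)
obs, unb, blk, p_u, p_b = randomisation_distributions(drug, placebo)
print(f"estimate {obs:.3f}; P unblocked {p_u:.4f}; P blocked {p_b:.4f}")
```

The observed estimate is the same under both schemes. When patients differ a lot, the blocked distribution is much narrower, as on the slide.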

Two Parametric Approaches
Approach | Estimate | s.e. | t | df | t pr. | Permutation P-value
Not fitting patient effect | 2.172 | 0.964 | 2.25 | 56 | 0.0282 | 0.034
Fitting patient effect | 2.172 | 0.616 | 3.53 | 28 | 0.00147 | 0.0014
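The two parametric approaches differ only in their standard error. Ignoring patient gives a pooled two-sample t with 2n − 2 df, and fitting patient gives a paired t with n − 1 df. With 29 patients these are the 56 and 28 df shown above. The sketch below uses NumPy only (no P-values) on hypothetical data. The function name two_analyses is an assumption, and the output is not the slide's result.

```python
import numpy as np


def two_analyses(drug, placebo):
    """Estimate, standard error, t and degrees of freedom for a cross-over
    analysed (a) ignoring patient and (b) fitting patient."""
    drug = np.asarray(drug, dtype=float)
    placebo = np.asarray(placebo, dtype=float)
    n = drug.size
    estimate = drug.mean() - placebo.mean()   # identical in both analyses
    # (a) Patient ignored: pooled two-sample t with 2n - 2 df
    pooled_var = (drug.var(ddof=1) + placebo.var(ddof=1)) / 2
    se_a = np.sqrt(pooled_var * 2 / n)
    # (b) Patient fitted: paired t on within-patient differences, n - 1 df
    se_b = (drug - placebo).std(ddof=1) / np.sqrt(n)
    return {
        "not fitting patient": (estimate, se_a, estimate / se_a, 2 * n - 2),
        "fitting patient": (estimate, se_b, estimate / se_b, n - 1),
    }


# Hypothetical data for 29 patients
rng = np.random.default_rng(0)
patient = rng.normal(8.0, 3.0, size=29)
placebo = patient + rng.normal(0.0, 1.5, size=29)
drug = patient + 2.0 + rng.normal(0.0, 1.5, size=29)
for name, (est, se, t, df) in two_analyses(drug, placebo).items():
    print(f"{name}: estimate {est:.3f}, s.e. {se:.3f}, t({df}) = {t:.2f}")
```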

What happens if you balance but don’t condition?
Approach | Variance of estimated treatment effect over all randomisations* | Mean, over all randomisations*, of the estimated variance of the treatment effect
Completely randomised, analysed as such | 0.987 | 0.996
Randomised within patient, analysed as such | 0.534 | 0.529
Randomised within patient, analysed as completely randomised | 0.534 | 1.005
*Based on 10000 random permutations
The last row permutes values in a way that respects their origin in a cross-over but analyses them as if they came from a parallel-group trial.
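The comparison in the table can be mimicked as follows. Treatment order is randomised within each patient. The actual variance of the estimate over randomisations is then compared with the average variance claimed by a paired analysis and by a two-sample analysis. This is a sketch on hypothetical data, not the Hills and Armitage values, and the function name is hypothetical. It shows the pattern, not the slide's numbers.

```python
import numpy as np


def balance_without_conditioning(drug, placebo, n_perm=10000, seed=1):
    """Randomise treatment order within patient, then compare the true
    variance of the estimate with the average variance each analysis claims."""
    rng = np.random.default_rng(seed)
    drug = np.asarray(drug, dtype=float)
    placebo = np.asarray(placebo, dtype=float)
    n = drug.size
    estimates = np.empty(n_perm)
    claimed_paired = np.empty(n_perm)
    claimed_two_sample = np.empty(n_perm)
    for i in range(n_perm):
        swap = rng.random(n) < 0.5               # within-patient randomisation
        a = np.where(swap, placebo, drug)        # values labelled 'drug'
        b = np.where(swap, drug, placebo)        # values labelled 'placebo'
        d = a - b
        estimates[i] = d.mean()
        claimed_paired[i] = d.var(ddof=1) / n                        # conditions on patient
        claimed_two_sample[i] = (a.var(ddof=1) + b.var(ddof=1)) / n  # ignores patient
    return estimates.var(), claimed_paired.mean(), claimed_two_sample.mean()


# Hypothetical data for 29 patients
rng = np.random.default_rng(0)
patient = rng.normal(8.0, 3.0, size=29)
placebo = patient + rng.normal(0.0, 1.5, size=29)
drug = patient + 2.0 + rng.normal(0.0, 1.5, size=29)
actual, paired, two_sample = balance_without_conditioning(drug, placebo)
print(f"actual {actual:.3f}; paired claims {paired:.3f}; two-sample claims {two_sample:.3f}")
```

The paired analysis's claimed variance matches the actual variance on average. The two-sample analysis overstates it when patients differ substantially, which is the pattern in the last row of the table.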

In terms of t-statistics
Approach | Observed variance of t-statistic over all randomisations* | Predicted theoretical variance
Completely randomised, analysed as such | 1.027 | 1.037
Randomised within patient, analysed as such | 1.085 | 1.077
Randomised within patient, analysed as completely randomised | 0.534 | 1.037@
*Based on 10000 random permutations
@Using the commonly assumed but false theory
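The "Predicted theoretical variance" entries appear consistent with the variance of Student's t distribution, ν/(ν − 2), using the 56 and 28 df of the two parametric analyses. That correspondence is a reading of the table rather than something the slide states. A minimal check:

```python
from fractions import Fraction


def t_variance(df):
    """Variance of Student's t distribution with df > 2 degrees of freedom."""
    if df <= 2:
        raise ValueError("the variance is finite only for df > 2")
    return Fraction(df, df - 2)


print(round(float(t_variance(56)), 3))  # 1.037
print(round(float(t_variance(28)), 3))  # 1.077
```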

The Shocking Truth
• The validity of conventional analysis of randomised
trials does not depend on covariate balance
• It is valid because they are not perfectly balanced
• If they were balanced the standard analysis would
be wrong

Randomisation is Necessary for Blinding
Fisher, in a letter to Jeffreys, explained the dangers of using a
haphazard method thus:
… if I want to test the capacity of the human race for
telepathically perceiving a playing card, I might choose the
Queen of Diamonds, and get thousands of radio listeners to
send in guesses. I should then find that considerably more
than one in 52 guessed the card right... Experimentally this
sort of thing arises because we are in the habit of making
tacit hypotheses, e.g. ‘Good guesses are at random except for
a possible telepathic influence.’ But in reality it appears that
red cards are always guessed more frequently than
black. (Bennett, 1990, pp. 268-269)
…if the trial was, and remained, double-blind then
randomization could play no further role in this respect.
(Worrall, 2007, p. 454)

Avoiding Double Guessing
• If you don’t randomise you have to assume
that your strategy has not been guessed by
the investigator
• You are using ‘the argument from the
stupidity of others’
• Not publishing the block size in your protocol
is a classic example
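The following sketch shows why an unpublished but guessable block size matters. It assumes an investigator who knows the block size and all previous allocations. This investigator always guesses the arm allocated less often so far in the current block, a strategy sometimes called the convergence strategy, and tosses a coin when the arms are tied. The exact expected proportion of correct guesses is computed by enumerating every equally likely permuted block. Under simple randomisation it would be 1/2. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def expected_correct_guesses(block_size):
    """Expected proportion of allocations guessed correctly under permuted-block
    randomisation (two arms, equal allocation) by a guesser who knows the block
    size and picks the arm used less often so far in the current block."""
    half = block_size // 2
    sequences = set(permutations("A" * half + "B" * half))  # equally likely blocks
    total = Fraction(0)
    for seq in sequences:
        for i, arm in enumerate(seq):
            n_a = seq[:i].count("A")
            n_b = i - n_a
            if n_a == n_b:
                total += Fraction(1, 2)  # tie: a coin-toss guess is right half the time
            elif (n_a < n_b) == (arm == "A"):
                total += 1               # the under-represented arm was indeed allocated
    return total / (len(sequences) * block_size)


print(expected_correct_guesses(2))  # 3/4
print(expected_correct_guesses(4))  # 17/24
```

With blocks of 2 or 4, the guesser is right about 75% or 71% of the time, far above the 50% a coin toss would give. The last allocation in every block is always certain.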

Deaton and Cartwright
Their claim
• Generate from a highly skewed population
• Just have 50 patients per arm
• Mean difference between treatments in the population is zero
• Type I error rate is 11% for a nominal 5%
My simulation
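The simulation referred to here is not reproduced in the text. As a hedged sketch of how such a check can be run, assume a sharp null, so that every patient would have the same outcome on either treatment. Also assume a log-normal population with σ = 1.5 as the "highly skewed" case, 50 patients per arm, and a pooled two-sample t-test at the nominal 5% level. All of these are assumptions. The setup is not Deaton and Cartwright's exact set-up, and no particular result is claimed for it.

```python
import numpy as np


def type_one_error_rate(n_per_arm=50, sigma=1.5, n_sims=20000, seed=3,
                        t_crit=1.9845):
    """Monte Carlo type I error rate of the pooled two-sample t-test when
    outcomes come from a skewed (log-normal) population and the sharp null holds.

    t_crit is the two-sided 5% point of t with 98 df; change it if
    n_per_arm changes.
    """
    rng = np.random.default_rng(seed)
    y = rng.lognormal(mean=0.0, sigma=sigma, size=(n_sims, 2 * n_per_arm))
    # Outcomes are iid, so taking the first n_per_arm as 'treated' is
    # equivalent to a random allocation of n_per_arm patients per arm
    treated, control = y[:, :n_per_arm], y[:, n_per_arm:]
    diff = treated.mean(axis=1) - control.mean(axis=1)
    pooled_var = (treated.var(axis=1, ddof=1) + control.var(axis=1, ddof=1)) / 2
    t = diff / np.sqrt(pooled_var * 2 / n_per_arm)
    return np.mean(np.abs(t) > t_crit)


print(f"estimated type I error rate: {type_one_error_rate():.4f}")
```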

One reason why Urbach’s proposal is not a good idea
• Other things being equal, distributions centred on 0.5 are better
• The narrower the distribution, the better
• The narrowest distribution centred on 0.5 is randomisation

Conclusion
The beginning of the end

My Philosophy of Clinical Trials
• Your (reasonable) beliefs dictate the model
• You should try to measure what you think is important
• You should try to fit what you have measured
– Caveat: random regressors and the Gauss-Markov theorem
• If you can balance what is important, so much the better
– But fitting is more important than balancing
• Randomisation deals with unmeasured covariates
– You can use the distribution in probability of unmeasured covariates
– For measured covariates you must use the actual observed distribution
• Claiming to do ‘conservative inference’ is just a convenient way of hiding bad practice
– Who thinks that analysing a matched-pairs t as a two-sample t is acceptable?

What’s out and What’s in
Out | In
Log-rank test | Proportional hazards
T-test on change scores | Analysis of covariance fitting baseline
Chi-square tests on 2 x 2 tables | Logistic regression fitting covariates
Responder analysis and dichotomies | Analysis of original values
Balancing as an excuse for not conditioning | Modelling as a guide for designs
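One reason change scores give way to analysis of covariance can be shown with a standard large-sample formula. Assume baseline and outcome have equal variance σ² and correlation ρ, and ignore the loss of a degree of freedom for estimating the slope. Per unit of σ²(1/n₁ + 1/n₂), the variance of the treatment estimate is then 1 for original values, 2(1 − ρ) for change scores and 1 − ρ² for analysis of covariance. A small sketch under those assumptions (the function name is hypothetical):

```python
def variance_factors(rho):
    """Large-sample variance of the treatment estimate, per unit of
    sigma^2 * (1/n1 + 1/n2), for three analyses of a trial with a baseline.

    Assumes baseline and outcome share variance sigma^2 and have correlation rho.
    """
    return {
        "original values": 1.0,
        "change scores": 2.0 * (1.0 - rho),
        "analysis of covariance": 1.0 - rho ** 2,
    }


for rho in (0.0, 0.5, 0.8):
    print(rho, variance_factors(rho))
```

Since 2(1 − ρ) − (1 − ρ²) = (1 − ρ)² ≥ 0, analysis of covariance is never worse than change scores under these assumptions. Change scores beat the original values only when ρ > 0.5.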

Unresolved Issue
• In principle you should never be worse off by
having more information
• The ordinary least squares approach has two
potential losses in fitting covariates
– Loss of orthogonality
– Loss of degrees of freedom
• This means that eventually we lose by fitting
more covariates

Resolution?
• The Gauss-Markov theorem does not apply to
stochastic regressors
• In theory we can do better by using random-effects models
• However there are severe practical difficulties
• Possible Bayesian resolution in theory
• A pragmatic compromise of a limited number of
prognostic factors may be reasonable

To sum up
• There are a lot of people out there who fail to
understand what randomisation can and
cannot do for you
• We need to tell them firmly and clearly what
they need to understand
• The RCT may be 70 years old but it still looks
quite lively

Finally
I leave you with this thought:
Statisticians are always tossing coins but do not own many

References
Papers
1. Armitage P. Fisher, Bradford Hill, and randomization. International Journal of Epidemiology. 2003;32(6):925-928; discussion 945-948.
2. Gower J. Statistics and agriculture. Journal of the Royal Statistical Society, Series A. 1988;151(1):179-200.
3. Hall NS. RA Fisher and his advocacy of randomization. Journal of the History of Biology. 2007;40(2):295-325.
4. Senn SJ. Seven myths of randomisation in clinical trials. Statistics in Medicine. 2013;32(9):1439-1450.
Blogs
3 blogs on Deborah Mayo’s Error Statistics website. Email stephen@senns.demon.co.uk for details.
