And thereby hangs a tail
The strange history of P-values
Stephen Senn
36th Fisher Memorial Lecture
(c) Stephen Senn 1
Acknowledgements
My thanks to the Fisher Memorial Trust for inviting me to give the 36th Fisher Memorial Lecture and to
the Royal Statistical Society for kindly agreeing to host it.
Sandy Zabell’s 2008 paper on Student has been extremely useful, as have various papers and comments
by John Aldrich and Stephen Stigler, David Howie’s book Interpreting Probability, and Hald’s
and Stigler’s histories.
I thank John Aldrich and Andy Grieve for helpful comments on an earlier version.
This work is partly supported by the European Union’s 7th Framework Programme for research,
technological development and demonstration under grant agreement no. 602552. “IDeAl”
An apology to all of you
The abstract promised much more than I can deliver.
The complete history of P-values would start with Arbuthnot (1710) or
perhaps Daniel Bernoulli (1734) and continue to 2017
I shall limit myself (largely) to the first half of the 20th century (in fact,
really, to years 1908-1939): 31/307 ≈ 10%
I shall just occasionally pretentiously sprinkle a few other names
I have various excuses but the best is this:
This lecture stands between you and a drink
An apology to Bayesians
I am going to claim that part of the problem with the current debate about the suitability or otherwise of
P-values is to do with Bayesian statistics
This most emphatically does not mean that I think that the Bayesian form of inference is bad
My own thinking on statistical inference has been profoundly influenced (for the better!) by having had
interactions with prominent Bayesian statisticians
Nevertheless, I am going to claim that part of the perceived problem with P-values reflects an unresolved
struggle in Bayesian inference that has been going on for very nearly 100 years (since 1918)
I am now going to try and explain why
I shall start with the case against Fisher
Fisher the cause of inferential confusion?
David Colquhoun quoting Robert Matthews
Fisher the arch villain?
As was said above, reports of clinical experiments often culminate in a significance test (or a set
of tests, one for each variable observed) of the null hypothesis that the new treatment is
indistinguishable from the standard. To anyone whose sensibilities have not been blunted by
professional dedication to the science of statistics, such tests bring a pleasing touch of mystery and
ceremony to the proceedings. It is natural to suppose that well-established statistical theory
supports such tests. That is not so.
Significance tests of such null hypotheses at the end of an experiment can fairly be laid at R. A.
Fisher’s door, especially because of his insistence on them in The Design of Experiments.
Francis Anscombe, 1990
Fisher the false plagiarising prophet?
The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives
We want to persuade you of one claim: that William Sealy Gosset (1876-1937)—aka "Student" of the
Student's t-test—was right, and that his difficult friend, Ronald A. Fisher, though a genius, was wrong. Fit is
not the same thing as importance. Statistical significance is not the same thing as scientific importance. R²,
t-statistic, F-test, and all the more sophisticated versions of them in time series and the most advanced
statistics are misleading, at best.
Ziliak and McCloskey, 2006
The false history
• To the extent that scientists were using formal statistical methods
prior to the 1920s they were what we would now call Bayesian
methods
• RA Fisher invented P-values as part of his rival system of frequentist
statistics
• These seemed to give significance much more easily
• They became an instant hit and seduced scientists away from the
path of Bayesian rectitude
• This is (largely) responsible for the replication crisis we now face
However
• The history is not like that
• P-values may or may not be good statistics
• My own view is that they are one amongst many ways of looking at data
• Setting the record straight will help us see what the problem is
• We are not close to a resolution
• Understanding what is necessary will help us do better statistics
• I am well aware that the problems we face will have to be solved by better
brains than mine
William Sealy Gosset
1876-1937
• Born Canterbury 1876
• Educated Winchester and Oxford
• First in mathematical moderations 1897
and first in degree in Chemistry 1899
• Started with Guinness in 1899 in Dublin
• Autumn 1906-spring 1907 with Karl
Pearson at UCL
• 1908 published ‘The probable error of a
mean’ in Biometrika
Student, Biometrika, VI, March 1908, p. 2
See The James Lind Library and
Cushny AR, Peebles AR (1905). The action of optical isomers. II. Hyoscines. J Physiology 32:501-510.
Student 1908
Student, Biometrika, 1908, VI, 1, p. 21
What Student did and did not do
What he did
• Obtained distribution of the sample standard deviation
  • Calculated using divisor n
• Showed it was uncorrelated with the sample mean (and its square)
  • Assuming a symmetric data distribution
• Obtained the distribution of the ratio of the mean to sample standard deviation assuming independence
• Tabulated this distribution
• Carried out various empirical investigations
• Applied it
• Interpreted the probabilities in a ‘Bayesian’ way
What he did not do
• Show that the sample mean and variance were independent
  • He just showed they were uncorrelated
• Generalise the problem beyond one sample
• Define the t-statistic in its modern form
  • Ratio of mean to standard error with SD calculated using divisor n-1
• Use the modern significance test interpretation
• Explicitly use Bayes theorem or any derivation that we would now call Bayesian
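The difference between Student's original ratio and the modern t-statistic can be made concrete with a short sketch. It assumes the ten paired differences in hours of sleep from the Cushny and Peebles example as they are commonly reproduced (for instance in R's `sleep` dataset). The data values and the function names are not on the slides and should be checked against the 1908 paper.

```python
import math

# Paired differences in extra hours of sleep (assumed values, as commonly
# reproduced from the Cushny and Peebles example; not given on the slides).
DIFFS = [1.2, 2.4, 1.3, 1.3, 0.0, 1.0, 1.8, 0.8, 4.6, 1.4]


def student_z(x):
    """Student's 1908 ratio: mean divided by the SD computed with divisor n."""
    n = len(x)
    mean = sum(x) / n
    sd_n = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return mean / sd_n


def modern_t(x):
    """Modern t: mean divided by its standard error, SD with divisor n - 1."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return mean / (sd / math.sqrt(n))


z = student_z(DIFFS)
t = modern_t(DIFFS)
print(f"Student's z = {z:.3f}, modern t = {t:.3f}")
# The two statistics are linked by t = z * sqrt(n - 1).
print(math.isclose(t, z * math.sqrt(len(DIFFS) - 1)))
```

Because t is a fixed multiple of z for a given n, a table of one distribution can be converted into a table of the other.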
The extent to which Student uses a Bayesian
input?
Student, Biometrika, VI, March 1908, p. 1
However, more explicit reference to prior distributions is provided in his
correlation coefficient paper published at the same time
Now compare Student and Fisher
Fisher, Statistical Methods for Research Workers, 1925
An inverse (or what we would now call Bayesian) probability statement
A direct probability statement
Looking at all three of Student’s analyses
What Fisher did and did not do
What Fisher did
• Reformulated the statistic so that asymptotically it was Normal (0,1)
• Showed that it could be adapted for use in many other problems
  • Two sample t (with suitable assumptions)
  • Regression coefficients
• Generalised it to three or more means
• Stressed an alternative interpretation (the one we now use)
  • Note, however, that these could also be found in Karl Pearson’s work
• Suggested a doubling
  • This last (controversial) step gave significance less easily!
What Fisher did not do
• In this example he did not calculate the P-value
  • He merely noted ‘significance’ at a conventional level (1%)
  • This was computationally convenient
• He does calculate the P-value for the sign test: (½)⁹ = 1/512
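The sign-test figure can be checked with exact arithmetic. This is a minimal sketch, assuming that nine of the differences were non-zero and that all nine were positive; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def sign_test_one_sided(n_positive, n_nonzero):
    """P(at least n_positive positive signs out of n_nonzero) under a fair coin."""
    half = Fraction(1, 2)
    return sum(
        comb(n_nonzero, k) * half ** n_nonzero
        for k in range(n_positive, n_nonzero + 1)
    )


# Assumption: nine non-zero differences, all of them positive.
p = sign_test_one_sided(9, 9)
print(p)  # 1/512
```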
Numerically Student and Fisher (would) agree
0.0028 = 2 × 0.0014
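These two figures can be reproduced from the tail area of the t distribution. The sketch below uses the statistic 4.06 quoted in the lecture and assumes 9 degrees of freedom (ten paired differences). The helper names and the use of Simpson's rule are choices made here for a dependency-free illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] and symmetry."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


one_sided = upper_tail(4.06, 9)  # 9 degrees of freedom is an assumption
two_sided = 2 * one_sided        # Fisher's doubling
print(f"one-sided {one_sided:.4f}, two-sided {two_sided:.4f}")
```

The one-sided value is the tail area that Student would have read as a probability about the parameter; doubling it gives the two-sided P-value.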
Diversion
NHST: Null hypothesis significance testing
• This is a monstrous hybrid philosophy formed by mixing Fisherian
significance tests (using P-values) and Neyman-Pearson hypothesis
tests using rejection/acceptance and Type I and II error rates
• People talk NP but do Fisherian
• This leads them into inferential error
Goodman, 1992, Statistics in Medicine, p. 878
Not so
P70 IMO Fisherian and NP approaches do not differ (at least for common standard cases) as regards this
The way Student saw it
Following Laplace (via Airy?) you could just invert the probability statement
The distribution is centred on the statistic and is a statement about the probability of the parameter
This is how Fisher saw it
The two interpretations described
The distributions are different but one is a translation of the other
In fact you can reflect one about 2.03 (the average of the mean, 4.06, of one distribution and the mean, 0, of the other) to get the other distribution
Different tail areas are numerically identical but have different interpretations
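The numerical identity of the two tail areas follows from the symmetry of the t density. As a sketch, write f for that density on the standardised scale, t_obs = 4.06 for the observed statistic and g for the same density centred on the statistic. The symbols f, g, θ and t_obs are introduced here for illustration and are not on the slides.

```latex
\begin{align*}
\text{Fisher (direct):}\quad
  P &= \int_{t_{\mathrm{obs}}}^{\infty} f(t)\,dt, \\
\text{Student (inverse):}\quad
  \Pr(\theta < 0) &= \int_{-\infty}^{0} f(\theta - t_{\mathrm{obs}})\,d\theta
   = \int_{-\infty}^{-t_{\mathrm{obs}}} f(u)\,du
   = \int_{t_{\mathrm{obs}}}^{\infty} f(u)\,du = P, \\
\text{Reflection about } t_{\mathrm{obs}}/2:\quad
  g(t_{\mathrm{obs}} - x) &= f\bigl((t_{\mathrm{obs}} - x) - t_{\mathrm{obs}}\bigr) = f(-x) = f(x),
  \qquad g(x) := f(x - t_{\mathrm{obs}}).
\end{align*}
```

With t_obs = 4.06 the axis of reflection is 2.03, matching the value quoted on the slide.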
A magnification of the previous diagram
Bayesian: the probability that the treatment that appears to be better is worse after all
Frequentist: the probability of a result as extreme or more extreme if the null hypothesis is true
NB Andy Grieve has made the point to me that Bayesians would more naturally use 1-P
Two independent observations, X1 and X2, from a Normal distribution with mean 0
10% level of significance (two-sided).
Red circles significant
Black x non-significant
Contours give probability densities for circular Normal
100 points have been simulated
9 simulated values are ‘significant’

Two independent observations, X1 and X2, from a Normal distribution with mean 2
10% level of significance (two-sided).
Red circles significant
Black x non-significant
Contours give probability densities for circular Normal
100 points have been simulated
88 simulated values are ‘significant’

Two independent observations, X1 and X2, from a Normal distribution with mean 0
10% level of significance (two-sided).
Red circles significant
Black x non-significant
Contours give probability densities for circular Normal
100 points have been simulated
8 simulated values are ‘significant’

Two independent observations, X1 and X2, from a Normal distribution with mean 2
10% level of significance (two-sided).
Red circles significant
Black x non-significant
Contours give probability densities for circular Normal
100 points have been simulated
35 simulated values are ‘significant’
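The transcript does not say which tests produced these counts. One reading that is consistent with them, offered here only as an assumption, is that the first pair of plots uses a test that treats the standard deviation as known (equal to 1) and the second pair uses the one-sample t-test with 1 degree of freedom. The sketch below simulates both under that assumption; all names are illustrative.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.10
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)          # about 1.645
# With 1 degree of freedom the t distribution is Cauchy, so its quantile
# has a closed form: tan(pi * (p - 1/2)).
T_CRIT = math.tan(math.pi * ((1 - ALPHA / 2) - 0.5))  # about 6.314


def z_significant(x1, x2):
    """Two-sided test of mean 0 treating the SD as known and equal to 1."""
    return abs((x1 + x2) / math.sqrt(2)) > Z_CRIT


def t_significant(x1, x2):
    """Two-sided one-sample t-test on two observations (1 degree of freedom)."""
    spread = abs(x1 - x2)
    if spread == 0:
        return x1 + x2 != 0
    # For n = 2 the t statistic simplifies to (x1 + x2) / |x1 - x2|.
    return abs(x1 + x2) / spread > T_CRIT


def rejection_rate(test, mean, n_sim=20000, seed=1):
    """Proportion of simulated pairs declared significant by the given test."""
    rng = random.Random(seed)
    hits = sum(test(rng.gauss(mean, 1), rng.gauss(mean, 1)) for _ in range(n_sim))
    return hits / n_sim


for mean in (0, 2):
    print(mean, rejection_rate(z_significant, mean), rejection_rate(t_significant, mean))
```

Under this reading both tests reject about 10% of the time when the mean is 0, but with mean 2 the known-variance test rejects far more often than the t-test, which is the pattern in the counts above.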
To sum up
• Fisher and Student did not disagree as regards probabilities numerically
  • At least not in any way that casts Fisher as more liberal
  • Two-tailed controversy
• They differed as regards interpretation of the probability
  • Fisher saw that any Bayesian interpretation depended on prior assumptions
  • Student simply used a standard default argument
    • At least if the evidence of his 1908 paper is anything to go by
• Student’s paper was only eventually influential thanks to Fisher
• Speculation: Until Fisher’s work made an impact, estimation continued to be largely ‘Bayesian’ but ignoring nuisance parameter uncertainty
So who did produce a formal Bayesian
derivation of the t-distribution?
Take your pick from
Dedekind 1860 (nearly)
Luroth, 1874 (considered only the 50% probability but is otherwise more general)
Edgeworth, 1883
Burnside, 1923
Jeffreys, 1931 (and also in his book of 1939)
We shall now consider the story with Jeffreys, for although Jeffreys produced a result
for the t-distribution that is essentially the Luroth/Student/Fisher result, he also did
something radically different. However, to get to Jeffreys we need to consider Laplace
(briefly) and then Broad
Some pretentious sprinkling of names (as promised)
Laplace (1774)
De Morgan (1838)
Venn (1888), pp196-197
CD Broad 1887*-1971
• Graduated Cambridge 1910
• Fellow of Trinity 1911
• Lectured at St Andrews & Bristol
• Returned to Cambridge 1926
• Knightbridge Professor of Philosophy 1933-1953
• Interested in epistemology and psychic research
*NB Harold Jeffreys born 1891 & Fisher 1890
CD Broad, 1918
p. 393
p. 394
As m goes to infinity the first approaches 1
If n is much greater than m the latter is small
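The page images from Broad's paper do not survive in this transcript. A minimal sketch follows, assuming the two expressions referred to are the standard results under Laplace's uniform prior for a finite population of n items, of which m have been examined and all found to have the property. On that assumption the probability that the next item has the property is (m + 1)/(m + 2), and the probability that all n do is (m + 1)/(n + 1). The function names are illustrative.

```python
from fractions import Fraction


def prob_next(m):
    """Probability the next item has the property after m out of m did."""
    return Fraction(m + 1, m + 2)


def prob_all(m, n):
    """Probability that all n items have the property after m out of m did."""
    return Fraction(m + 1, n + 1)


for m in (1, 10, 1000):
    print(m, float(prob_next(m)), float(prob_all(m, n=1_000_000)))
```

The first quantity climbs quickly towards 1, while the second stays small until m is a sizeable fraction of n, which is the contrast the two remarks above describe.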
What Jeffreys Understood
Theory of Probability, 3rd edition, p. 128
The Economist gets it wrong
The canonical example is to imagine that a precocious newborn observes
his first sunset, and wonders whether the sun will rise again or not. He
assigns equal prior probabilities to both possible outcomes, and
represents this by placing one white and one black marble into a bag. The
following day, when the sun rises, the child places another white marble
in the bag. The probability that a marble plucked randomly from the bag
will be white (ie, the child’s degree of belief in future sunrises) has thus
gone from a half to two-thirds. After sunrise the next day, the child adds
another white marble, and the probability (and thus the degree of belief)
goes from two-thirds to three-quarters. And so on. Gradually, the initial
belief that the sun is just as likely as not to rise each morning is modified
to become a near-certainty that the sun will always rise.
What Jeffreys (and Wrinch) concluded
• If you have an uninformative prior distribution the probability of a
precise hypothesis is very low
• It will remain low even if you have lots of data consistent with it
• It will never become plausible
• You need to allocate a solid lump of probability that it is true
• Nature has decided, other things being equal, that simpler
hypotheses are more likely
Dorothy Wrinch
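The effect of a lump of prior probability can be sketched with a simple success/failure model. The set-up is an assumption made for illustration and is not taken from the slides: with prior probability pi0 the law holds (success probability exactly 1), and otherwise the success probability is uniform on [0, 1]. After m successes in m trials, Bayes' theorem gives the law a posterior probability of pi0 / (pi0 + (1 - pi0)/(m + 1)).

```python
from fractions import Fraction


def posterior_law(m, pi0):
    """Posterior probability that the success probability is exactly 1."""
    pi0 = Fraction(pi0)
    # Likelihood of m successes: 1 under the law, 1/(m + 1) under a uniform prior.
    return pi0 / (pi0 + (1 - pi0) * Fraction(1, m + 1))


for m in (0, 9, 99):
    print(m, posterior_law(m, 0), posterior_law(m, Fraction(1, 2)))
```

With no lump (pi0 = 0) the law stays at probability zero however much data agree with it; with a lump of one half it becomes steadily more probable.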
When you switch from testing
H0: θ ≤ 0 (dividing hypothesis)
to
H0: θ = 0 (plausible hypothesis)
it makes rather little difference to the performance of a frequentist test.
This may or may not be a good thing
In the Bayesian case it makes a world of difference
(the terminology is due to David Cox, 1977)
Why the difference?
• Imagine a point estimate of two standard errors (large sample)
• Now consider the likelihood ratio for a given value of the parameter, θ, under the alternative to one under the null
  • Dividing hypothesis (smooth prior): for any given value θ = θ₁ compare to θ = −θ₁
  • Plausible hypothesis (lump prior): for any given value θ = θ₁ compare to θ = 0
[Figure labels: H0, H1]
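A minimal sketch of the two comparisons, assuming a Normal likelihood with the estimate and the parameter values measured in standard errors. The names `theta1`, `lr_dividing` and `lr_plausible` are illustrative and do not come from the slides.

```python
import math

ESTIMATE = 2.0  # point estimate, in standard errors


def likelihood(theta, estimate=ESTIMATE):
    """Normal likelihood (up to a constant) of theta given the estimate."""
    return math.exp(-0.5 * (estimate - theta) ** 2)


def lr_dividing(theta1):
    """Likelihood of theta1 relative to its mirror image -theta1."""
    return likelihood(theta1) / likelihood(-theta1)


def lr_plausible(theta1):
    """Likelihood of theta1 relative to the point null theta = 0."""
    return likelihood(theta1) / likelihood(0.0)


for theta1 in (0.5, 1.0, 2.0, 4.0):
    print(theta1, round(lr_dividing(theta1), 2), round(lr_plausible(theta1), 2))
```

Against the mirror image the ratio is exp(4 × theta1) and grows without limit as theta1 increases. Against the point null it never exceeds exp(2), about 7.4, which is reached at theta1 = 2, and it falls back to 1 at theta1 = 4.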
The real history
• Scientists before Fisher were using tail area probabilities to calculate
posterior probabilities
• This was following Laplace’s use of uninformative prior distributions
• Fisher pointed out that this interpretation was unsafe and offered a more
conservative one
• Jeffreys, influenced by CD Broad’s criticism, was unsatisfied with the
Laplacian framework and used a lump prior probability on a point
hypothesis being true
• Etz and Wagenmakers have claimed that Haldane 1932 anticipated Jeffreys
• It is Bayesian Jeffreys versus Bayesian Laplace that makes the dramatic
difference, not frequentist Fisher versus Bayesian Laplace
In summary
• The major disagreement is not between P-values and Bayes using
informative prior distributions
• It’s between two Bayesian approaches
• Using uninformative prior distributions
• Using a highly informative one
• The conflict is not going to go away by banning P-values
• There is no automatic Bayesianism
• You have to do it for real
My (tentative) opinion
• The fundamental conflict will not disappear by banning P-values nor by
modifying them nor by re-calibrating them
• There may be a harmful culture of ‘significance’ however this is defined
• P-values have a (very) limited use as rough and ready tools using little
structure
• Where you have more structure you can often do better
• Likelihood (Fisher)
• Confidence distributions
• Severity (Deborah Mayo)
• Point estimates and standard errors
  • Extremely useful for future research synthesizers and should be provided regularly
And also, of course, Bayes!
Good
• For ‘personal’ decision-making
  • Ramsey, De Finetti, Savage, Lindley
  • Involves elicitation problems: O’Hagan
• In pragmatic compromises
  • Good
  • Box (1980)
  • Racine, Grieve, Fluehler, Smith (1986)
• As an aid to thinking
  • The reverse Bayes of Robert Matthews
  • The conditional Bayes approach of Spiegelhalter, Freedman & Parmar, JRSSA, 1994 (BART)
Not so good?
• Bayesian significance tests
• Bayes-factors
• P-values modified to behave like Bayesian tests
• Or Bayesian approaches modified just to make them behave like P-values
Speaking of BART
This lecture no longer stands between you and a drink

Mais conteúdo relacionado

Mais procurados

What should we expect from reproducibiliry
What should we expect from reproducibiliryWhat should we expect from reproducibiliry
What should we expect from reproducibiliry
Stephen Senn
 
De Finetti meets Popper
De Finetti meets PopperDe Finetti meets Popper
De Finetti meets Popper
Stephen Senn
 
The Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxThe Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradox
Stephen Senn
 

Mais procurados (20)

The revenge of RA Fisher
The revenge of RA FisherThe revenge of RA Fisher
The revenge of RA Fisher
 
What should we expect from reproducibiliry
What should we expect from reproducibiliryWhat should we expect from reproducibiliry
What should we expect from reproducibiliry
 
Minimally important differences v2
Minimally important differences v2Minimally important differences v2
Minimally important differences v2
 
Thinking statistically v3
Thinking statistically v3Thinking statistically v3
Thinking statistically v3
 
Why I hate minimisation
Why I hate minimisationWhy I hate minimisation
Why I hate minimisation
 
De Finetti meets Popper
De Finetti meets PopperDe Finetti meets Popper
De Finetti meets Popper
 
In search of the lost loss function
In search of the lost loss function In search of the lost loss function
In search of the lost loss function
 
Numbers needed to mislead
Numbers needed to misleadNumbers needed to mislead
Numbers needed to mislead
 
First in man tokyo
First in man tokyoFirst in man tokyo
First in man tokyo
 
Personalised medicine a sceptical view
Personalised medicine a sceptical viewPersonalised medicine a sceptical view
Personalised medicine a sceptical view
 
Is ignorance bliss
Is ignorance blissIs ignorance bliss
Is ignorance bliss
 
Seven myths of randomisation
Seven myths of randomisation Seven myths of randomisation
Seven myths of randomisation
 
Yates and cochran
Yates and cochranYates and cochran
Yates and cochran
 
What is your question
What is your questionWhat is your question
What is your question
 
NNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measuresNNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measures
 
Minimally important differences
Minimally important differencesMinimally important differences
Minimally important differences
 
Clinical trials: quo vadis in the age of covid?
Clinical trials: quo vadis in the age of covid?Clinical trials: quo vadis in the age of covid?
Clinical trials: quo vadis in the age of covid?
 
Seventy years of RCTs
Seventy years of RCTsSeventy years of RCTs
Seventy years of RCTs
 
The Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxThe Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradox
 
Trends towards significance
Trends towards significanceTrends towards significance
Trends towards significance
 

Semelhante a And thereby hangs a tail

D. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics WarsD. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics Wars
jemille6
 

Semelhante a And thereby hangs a tail (20)

Senn repligate
Senn repligateSenn repligate
Senn repligate
 
A century of t tests
A century of t testsA century of t tests
A century of t tests
 
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...
 
Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019
 
The Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and CasualtiesThe Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and Casualties
 
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in ScienceD. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
 
Logical issues in Social Scientific Approach of Communication Research
Logical issues in Social Scientific Approach of Communication ResearchLogical issues in Social Scientific Approach of Communication Research
Logical issues in Social Scientific Approach of Communication Research
 
D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1
 
D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500
 
Mayod@psa 21(na)
Mayod@psa 21(na)Mayod@psa 21(na)
Mayod@psa 21(na)
 
D. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics WarsD. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics Wars
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)
 
To infinity and beyond v2
To infinity and beyond v2To infinity and beyond v2
To infinity and beyond v2
 
Statistical theory.3.18.15
Statistical theory.3.18.15Statistical theory.3.18.15
Statistical theory.3.18.15
 
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
 
05 astrostat feigelson
05 astrostat feigelson05 astrostat feigelson
05 astrostat feigelson
 
Excursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper ConceptsExcursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper Concepts
 
tjosullivanThesis
tjosullivanThesistjosullivanThesis
tjosullivanThesis
 
Hypothesis in rm
Hypothesis in rmHypothesis in rm
Hypothesis in rm
 
Models of science
Models of scienceModels of science
Models of science
 

Mais de Stephen Senn

What is your question
What is your questionWhat is your question
What is your question
Stephen Senn
 
Approximate ANCOVA
Approximate ANCOVAApproximate ANCOVA
Approximate ANCOVA
Stephen Senn
 

Mais de Stephen Senn (10)

Has modelling killed randomisation inference frankfurt
Has modelling killed randomisation inference frankfurtHas modelling killed randomisation inference frankfurt
Has modelling killed randomisation inference frankfurt
 
What is your question
What is your questionWhat is your question
What is your question
 
Vaccine trials in the age of COVID-19
Vaccine trials in the age of COVID-19Vaccine trials in the age of COVID-19
Vaccine trials in the age of COVID-19
 
Approximate ANCOVA
Approximate ANCOVAApproximate ANCOVA
Approximate ANCOVA
 
To infinity and beyond
To infinity and beyond To infinity and beyond
To infinity and beyond
 
Understanding randomisation
Understanding randomisationUnderstanding randomisation
Understanding randomisation
 
The revenge of RA Fisher
The revenge of RA Fisher The revenge of RA Fisher
The revenge of RA Fisher
 
The story of MTA/02
The story of MTA/02The story of MTA/02
The story of MTA/02
 
Confounding, politics, frustration and knavish tricks
Confounding, politics, frustration and knavish tricksConfounding, politics, frustration and knavish tricks
Confounding, politics, frustration and knavish tricks
 
Real world modified
Real world modifiedReal world modified
Real world modified
 

Último

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

Último (20)

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 

And thereby hangs a tail

  • 1. And thereby hangs a tail The strange history of P-values Stephen Senn 36th Fisher Memorial Lecture (c) Stephen Senn 1
  • 2. Acknowledgements. My thanks to the Fisher Memorial Trust for inviting me to give the 36th Fisher Memorial Lecture and to the Royal Statistical Society for kindly agreeing to host it. Sandy Zabell’s 2008 paper on Student has been extremely useful, as have various papers and comments by John Aldrich and Stephen Stigler, David Howie’s book Interpreting Probability, and Hald’s and Stigler’s histories. I thank John Aldrich and Andy Grieve for helpful comments on an earlier version. This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552 (“IDeAl”).
  • 3. An apology to all of you. The abstract promised much more than I can deliver. The complete history of P-values would start with Arbuthnot (1710) or perhaps Daniel Bernoulli (1734) and continue to 2017. I shall limit myself (largely) to the first half of the 20th century (in fact, really, to the years 1908-1939): 31/307 ≈ 10%. I shall just occasionally pretentiously sprinkle a few other names. I have various excuses but the best is this: this lecture stands between you and a drink.
  • 4. An apology to Bayesians. I am going to claim that part of the problem with the current debate about the suitability or otherwise of P-values is to do with Bayesian statistics. This most emphatically does not mean that I think that the Bayesian form of inference is bad. My own thinking on statistical inference has been profoundly influenced (for the better!) by having had interactions with prominent Bayesian statisticians. Nevertheless, I am going to claim that part of the perceived problem with P-values reflects an unresolved struggle in Bayesian inference that has been going on for very nearly 100 years (since 1918). I am now going to try and explain why. I shall start with the case against Fisher.
  • 5. Fisher the cause of inferential confusion? David Colquhoun quoting Robert Matthews
  • 6. Fisher the arch villain? ‘As was said above, reports of clinical experiments often culminate in a significance test (or a set of tests, one for each variable observed) of the null hypothesis that the new treatment is indistinguishable from the standard. To anyone whose sensibilities have not been blunted by professional dedication to the science of statistics, such tests bring a pleasing touch of mystery and ceremony to the proceedings. It is natural to suppose that well-established statistical theory supports such tests. That is not so. Significance tests of such null hypotheses at the end of an experiment can fairly be laid at R. A. Fisher’s door, especially because of his insistence on them in The Design of Experiments.’ Francis Anscombe, 1990
  • 7. Fisher the false plagiarising prophet? The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. ‘We want to persuade you of one claim: that William Sealy Gosset (1876-1937), aka "Student" of the Student's t-test, was right, and that his difficult friend, Ronald A. Fisher, though a genius, was wrong. Fit is not the same thing as importance. Statistical significance is not the same thing as scientific importance. R², t-statistic, F-test, and all the more sophisticated versions of them in time series and the most advanced statistics are misleading, at best.’ Ziliak and McCloskey, 2006
  • 8. The false history
      • To the extent that scientists were using formal statistical methods prior to the 1920s, these were what we would now call Bayesian methods
      • RA Fisher invented P-values as part of his rival system of frequentist statistics
      • These seemed to give significance much more easily
      • They became an instant hit and seduced scientists away from the path of Bayesian rectitude
      • This is (largely) responsible for the replication crisis we now face
  • 9. However
      • The history is not like that
      • P-values may or may not be good statistics
      • My own view is that they are one amongst many ways of looking at data
      • Setting the record straight will help us see what the problem is
      • We are not close to a resolution
      • Understanding what is necessary will help us do better statistics
      • I am well aware that the problems we face will have to be solved by better brains than mine
  • 10. William Sealy Gosset 1876-1937
      • Born Canterbury 1876
      • Educated Winchester and Oxford
      • First in mathematical moderations 1897 and first in degree in Chemistry 1899
      • Started with Guinness in 1899 in Dublin
      • Autumn 1906-spring 1907 with Karl Pearson at UCL
      • 1908 published ‘The probable error of a mean’ in Biometrika
  • 11. Student, Biometrika VI, 1, March 1908, p. 2
  • 12. See The James Lind Library and Cushny AR, Peebles AR (1905). The action of optical isomers. II. Hyoscines. J Physiology 32:501-510.
  • 13. Student 1908
  • 14. Student, Biometrika, 1908, VI, 1, p. 21
  • 15. What Student did and did not do
      What he did:
      • Obtained the distribution of the sample standard deviation (calculated using divisor n)
      • Showed it was uncorrelated with the sample mean (and its square), assuming a symmetric data distribution
      • Obtained the distribution of the ratio of the mean to the sample standard deviation, assuming independence
      • Tabulated this distribution
      • Carried out various empirical investigations
      • Applied it
      • Interpreted the probabilities in a ‘Bayesian’ way
      What he did not do:
      • Show that the sample mean and variance were independent (he just showed they were uncorrelated)
      • Generalise the problem beyond one sample
      • Define the t-statistic in its modern form (ratio of mean to standard error, with SD calculated using divisor n-1)
      • Use the modern significance test interpretation
      • Explicitly use Bayes theorem or any derivation that we would now call Bayesian
  • 16. The extent to which Student uses a Bayesian input? Student, Biometrika VI, 1, March 1908, p. 1. However, more explicit reference to prior distributions is provided in his correlation coefficient paper published at the same time.
  • 17. Now compare Student and Fisher. Fisher, Statistical Methods for Research Workers, 1925. An inverse (or what we would now call Bayesian) probability statement. A direct probability statement.
  • 18. Looking at all three of Student’s analyses
  • 19. What Fisher did and did not do
      What Fisher did:
      • Reformulated the statistic so that asymptotically it was Normal (0,1)
      • Showed that it could be adapted to use for many other problems
      • Two-sample t (with suitable assumptions)
      • Regression coefficients
      • Generalised it to three or more means
      • Stressed an alternative interpretation (the one we now use)
      • Note, however, that these could also be found in Karl Pearson’s work
      • Suggested a doubling
      • This last (controversial) step gave significance less easily!
      What Fisher did not do:
      • In this example he did not calculate the P-value
      • He merely noted ‘significance’ at a conventional level (1%)
      • This was computationally convenient
      • He does calculate the P-value for the sign test: (½)⁹ = 1/512
  • 20. Numerically Student and Fisher (would) agree: 0.0028 = 2 × 0.0014
  • 21. Diversion. NHST: Null hypothesis significance testing
      • This is a monstrous hybrid philosophy formed by mixing Fisherian significance tests (using P-values) and Neyman-Pearson hypothesis tests using rejection/acceptance and Type I and II error rates
      • People talk NP but do Fisherian
      • This leads them into inferential error
      Goodman, 1992, Statistics in Medicine, p. 878
  • 22. Not so (p. 70). IMO Fisherian and NP approaches do not differ (at least for common standard cases) as regards this.
  • 23. The way Student saw it. Following Laplace (via Airy?) you could just invert the probability statement. The distribution is centred on the statistic and is a statement about the probability of the parameter.
  • 24. This is how Fisher saw it
  • 25. The two interpretations described. The distributions are different but one is a translation of the other. In fact you can reflect one about 2.03 (the average of the mean, 4.06, of one distribution and the mean, 0, of the other) to get the other distribution. Different tail areas are numerically identical but have different interpretations.
  • 26. A magnification of the previous diagram. Bayesian: the probability that the treatment that appears to be better is worse after all. Frequentist: the probability of a result as extreme or more extreme if the null hypothesis is true. NB Andy Grieve has made the point to me that Bayesians would more naturally use 1-P.
  • 27. Two independent observations, X1 and X2, from a Normal distribution with mean 0. 10% level of significance (two-sided). Red circles: significant; black x: non-significant. Contours give probability densities for the circular Normal. 100 points have been simulated; 9 simulated values are ‘significant’.
  • 28. Two independent observations, X1 and X2, from a Normal distribution with mean 2. 10% level of significance (two-sided). Red circles: significant; black x: non-significant. Contours give probability densities for the circular Normal. 100 points have been simulated; 88 simulated values are ‘significant’.
  • 29. Two independent observations, X1 and X2, from a Normal distribution with mean 0. 10% level of significance (two-sided). Red circles: significant; black x: non-significant. Contours give probability densities for the circular Normal. 100 points have been simulated; 8 simulated values are ‘significant’.
  • 30. Two independent observations, X1 and X2, from a Normal distribution with mean 2. 10% level of significance (two-sided). Red circles: significant; black x: non-significant. Contours give probability densities for the circular Normal. 100 points have been simulated; 35 simulated values are ‘significant’.
  • 31. To sum up
      • Fisher and Student did not disagree as regards probabilities numerically
      • At least not in any way that casts Fisher as more liberal
      • Two-tailed controversy
      • They differed as regards interpretation of the probability
      • Fisher saw that any Bayesian interpretation depended on prior assumptions
      • Student simply used a standard default argument
      • At least if the evidence of his 1908 paper is anything to go by
      • Student’s paper was only eventually influential thanks to Fisher
      • Speculation: until Fisher’s work made an impact, estimation continued to be largely ‘Bayesian’ but ignoring nuisance parameter uncertainty
  • 32. So who did produce a formal Bayesian derivation of the t-distribution? Take your pick from: Dedekind, 1860 (nearly); Lüroth, 1874 (only considered 50% probability but is otherwise more general); Edgeworth, 1883; Burnside, 1923; Jeffreys, 1931 (and also in his book of 1939). We shall now consider the story with Jeffreys, for although Jeffreys produced a result for the t-distribution that is essentially the Lüroth/Student/Fisher result, he also did something radically different. However, to get to Jeffreys we need to consider Laplace (briefly) and then Broad.
  • 33. Some pretentious sprinkling of names (as promised): Laplace (1774); De Morgan (1838); Venn (1888), pp. 196-197
  • 34. CD Broad 1887*-1971
      • Graduated Cambridge 1910
      • Fellow of Trinity 1911
      • Lectured at St Andrews & Bristol
      • Returned to Cambridge 1926
      • Knightbridge Professor of Philosophy 1933-1953
      • Interested in epistemology and psychic research
      *NB Harold Jeffreys born 1891 & Fisher 1890
  • 35. CD Broad, 1918, pp. 393-394. As m goes to infinity the first approaches 1. If n is much greater than m the latter is small.
  • 36. What Jeffreys understood. Theory of Probability, 3rd edition, p. 128
  • 37. The Economist gets it wrong. ‘The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child’s degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.’
  • 38. What Jeffreys (and Wrinch) concluded
      • If you have an uninformative prior distribution the probability of a precise hypothesis is very low
      • It will remain low even if you have lots of data consistent with it
      • It will never become plausible
      • You need to allocate a solid lump of probability that it is true
      • Nature has decided, other things being equal, that simpler hypotheses are more likely
      Dorothy Wrinch
  • 39. When you switch from testing H0: τ ≤ 0 (dividing hypothesis) to H0: τ = 0 (plausible hypothesis), it makes rather little difference to the performance of a frequentist test. This may or may not be a good thing. In the Bayesian case it makes a world of difference. (The terminology is due to David Cox, 1977.)
  • 40. Why the difference?
      • Imagine a point estimate of two standard errors (large sample)
      • Now consider the likelihood ratio for a given value of the parameter, τ′, under the alternative to one under the null
      • Dividing hypothesis (smooth prior): for any given value τ = τ′ compare to τ = −τ′
      • Plausible hypothesis (lump prior): for any given value τ = τ′ compare to τ = 0
      Figure labels: H0, H1
  • 41. Why the difference?
      • Imagine a point estimate of two standard errors (large sample)
      • Now consider the likelihood ratio for a given value of the parameter, τ′, under the alternative to one under the null
      • Dividing hypothesis (smooth prior): for any given value τ = τ′ compare to τ = −τ′
      • Plausible hypothesis (lump prior): for any given value τ = τ′ compare to τ = 0
      Figure labels: H1, H1, H0
  • 42. The real history
      • Scientists before Fisher were using tail area probabilities to calculate posterior probabilities
      • This was following Laplace’s use of uninformative prior distributions
      • Fisher pointed out that this interpretation was unsafe and offered a more conservative one
      • Jeffreys, influenced by CD Broad’s criticism, was unsatisfied with the Laplacian framework and used a lump prior probability on a point hypothesis being true
      • Etz and Wagenmakers have claimed that Haldane 1932 anticipated Jeffreys
      • It is Bayesian Jeffreys versus Bayesian Laplace that makes the dramatic difference, not frequentist Fisher versus Bayesian Laplace
  • 43. In summary
      • The major disagreement is not between P-values and Bayes using informative prior distributions
      • It’s between two Bayesian approaches
      • Using uninformative prior distributions
      • Using a highly informative one
      • The conflict is not going to go away by banning P-values
      • There is no automatic Bayesianism
      • You have to do it for real
  • 44. My (tentative) opinion
      • The fundamental conflict will not disappear by banning P-values nor by modifying them nor by re-calibrating them
      • There may be a harmful culture of ‘significance’, however this is defined
      • P-values have a (very) limited use as rough and ready tools using little structure
      • Where you have more structure you can often do better
      • Likelihood (Fisher)
      • Confidence distributions
      • Severity (Deborah Mayo)
      • Point estimates and standard errors: extremely useful for future research synthesizers and should be provided regularly
  • 45. And also, of course, Bayes!
      Good:
      • For ‘personal’ decision-making: Ramsey, De Finetti, Savage, Lindley; involves elicitation problems (O’Hagan)
      • In pragmatic compromises: Good; Box (1980); Racine, Grieve, Fluehler, Smith (1986)
      • As an aid to thinking: the reverse Bayes of Robert Matthews; the conditional Bayes approach of Spiegelhalter, Freedman & Parmar, JRSSA, 1994 (BART)
      Not so good?
      • Bayesian significance tests
      • Bayes-factors
      • P-values modified to behave like Bayesian tests
      • Or Bayesian approaches modified just to make them behave like P-values
  • 46. Speaking of BART. This lecture no longer stands between you and a drink.
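
The contrast drawn on slides 35, 37 and 38 can be checked with a little exact arithmetic. What follows is only a sketch under stated assumptions, not a reconstruction of Broad's own derivation: an urn of n balls, a uniform prior on the number of white balls, and m draws without replacement that all come up white. The function name `broad_urn` is hypothetical and introduced purely for illustration.

```python
from fractions import Fraction
from math import comb


def broad_urn(n, m):
    """Posterior summaries for an urn of n balls after m white draws.

    Assumptions (illustrative, not taken from the slides): a uniform
    prior on the number of white balls r in 0..n, and sampling without
    replacement in which all m balls drawn so far were white.
    Returns (P(next ball is white), P(every ball in the urn is white)).
    """
    if not 0 <= m < n:
        raise ValueError("need 0 <= m < n")
    # The likelihood of m white draws given r white balls is
    # C(r, m) / C(n, m); the common denominator cancels on normalising.
    weights = [comb(r, m) for r in range(n + 1)]
    total = sum(weights)
    posterior = [Fraction(w, total) for w in weights]
    # Given r, the chance that the next ball is white is (r - m) / (n - m).
    p_next = sum(p * Fraction(r - m, n - m) for r, p in enumerate(posterior))
    # The precise hypothesis "all balls are white" corresponds to r = n.
    p_all = posterior[n]
    return p_next, p_all


if __name__ == "__main__":
    for n, m in [(1000, 0), (1000, 1), (1000, 2), (1000, 100)]:
        p_next, p_all = broad_urn(n, m)
        print(f"n={n}, m={m}: P(next white)={p_next}, P(all white)={p_all}")
```

For m = 0, 1, 2 the probability that the next ball is white comes out as 1/2, 2/3, 3/4, the same sequence as in the marble example. With n = 1000 and m = 100 it is 101/102, while the probability that every ball is white is only 101/1001: the prediction for the next case becomes nearly certain long before the general law does.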