Conference presentation at ISCB 41 in the session "Biostatistical inference in practice: moving beyond false dichotomies"
A comment in Nature, signed by over 800 researchers, called for the scientific community to "retire statistical significance". The responses included a call to halt the use of the term "statistically significant" and changes to journals' author guidelines. The prevailing view among statisticians is that inadequate statistical training of clinical researchers and publishing practices are to blame for the misuse of statistical testing. In this presentation, we search our collective conscience: we review ethical guidelines for statisticians in light of the p-value crisis, examine what they imply for us when conducting analyses in collaborative work and when teaching, and ask whether the ATOM principles (accept uncertainty; be thoughtful, open and modest) can guide us.
Dichotomania and other challenges for the collaborating biostatistician
1. Dichotomania
and other challenges for the collaborating biostatistician
A perspective on principles, responsibilities and potential solutions
Laure Wynants PhD
laure.wynants@maastrichtuniversity.nl
@laure_wynants
5. Goodman S. A dirty dozen: twelve p-value misconceptions. Semin Hematol. 2008 Jul;45(3):135-40.
6.
Estimated probability of causal attribution according to the null P-value, modeled using fractional polynomials with a cutpoint at P = 0.05.
A Psychometric Experiment in Causal Inference to Estimate Evidential Weights Used by Epidemiologists
Holman CDJ, Arnold-Reed DE, de Klerk N, McComb C, English DR. Epidemiology. 2001;12(2):246-255.
https://xkcd.com/
9. A conversation between a researcher and a statistician
• R: “We need some statistical testing for these plots.”
• S: “Why? These are not your main research questions in this paper.”
• R: “I am not fishing for significant findings. I am aware of the dangers. These are hypotheses we investigated in earlier work. If the tests are significant, we know this is confirmed in our new data.”
• R: “If it is not significant, we will discuss further. We just didn’t have enough power then.”
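The researcher's rule (significant means confirmed, non-significant means underpowered) can never disconfirm a hypothesis. A minimal sketch, with invented sample sizes and a true effect of exactly zero, shows how often a "confirmation" appears anyway:

```python
# Sketch with hypothetical numbers: two-sample comparisons with NO true
# effect, applying the rule "significant = confirmed".
import math
import random
import statistics

random.seed(1)
n_sims, n_per_arm = 2000, 30
confirmed = 0
for _ in range(n_sims):
    a = [random.gauss(0, 1) for _ in range(n_per_arm)]
    b = [random.gauss(0, 1) for _ in range(n_per_arm)]
    # Welch-style t statistic; |t| > 2 approximates two-sided p < 0.05
    # at these sample sizes
    se = math.sqrt(statistics.variance(a) / n_per_arm
                   + statistics.variance(b) / n_per_arm)
    t = (statistics.mean(a) - statistics.mean(b)) / se
    if abs(t) > 2.0:
        confirmed += 1
print(f"'Confirmed' despite a true effect of zero: {confirmed / n_sims:.1%}")
```

Roughly one in twenty null comparisons comes out "significant", so the rule manufactures confirmations at the type I error rate.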
10. Reporting and publication bias
“Trim and fill” funnel plot of Ki-67 expression for overall survival in ovarian cancer patients (Qiu et al. Arch Gynecol Obstet 2019). [Figure legend distinguishes missing studies from published studies.]
11. Science as a disorderly mass of stray observations, inconclusive results and fledgling explanations. And yet, as soon as their hypotheses were turned into peer-reviewed papers, researchers claimed that such facts had always spoken for themselves.
13. Replication crisis
Ioannidis JAMA 2005: all original research published in 3 major journals in 1990-2003 and cited >1000 times: 49 studies, of which 45 claimed that the intervention was effective. Of these:
- 32% could not be replicated: 16% were contradicted (no effect found) and 16% had estimated effects that were too strong (to the point that subsequent studies cast doubt on the effect being clinically important)
- 44% were replicated
- 24% had no subsequent larger or better-designed replication studies
Problems were worst for small RCTs and non-randomized studies.
Begley & Ellis Nature 2012: replication of landmark preclinical cancer studies: only 11% could be reproduced.
Journal impact factor | Number of articles | Mean citations, non-reproduced articles | Mean citations, reproduced articles
>20                   | 21                 | 248 (range 3–800)                       | 231 (range 82–519)
5–19                  | 32                 | 169 (range 6–1,909)                     | 13 (range 3–24)
14. • SARS-CoV-2: “viral loads in the very young do not differ significantly from those of adults. Based on these results, we have to caution against an unlimited re-opening of schools and kindergartens in the present situation”
• Ill-defined research question; comparisons between all age groups (45 comparisons), tested as if the categories were unordered.
• Reanalysis with more appropriate techniques reaches the opposite conclusion.
• https://medium.com/@d_spiegel/is-sars-cov-2-viral-load-lower-in-young-children-than-adults-8b4116d28353
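The multiplicity problem can be made concrete. Assuming the 45 tests were independent and each run at α = 0.05 (a simplification of the actual analysis), the chance of at least one spuriously "significant" difference under the null is already about 90%:

```python
# Sketch: familywise error rate for 45 comparisons at alpha = 0.05,
# assuming independent tests under the null (a simplification).
alpha = 0.05
n_tests = 45  # e.g. all pairwise comparisons between 10 groups: 10*9/2
fwer = 1 - (1 - alpha) ** n_tests
print(f"P(at least one 'significant' result under the null): {fwer:.2f}")
```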
15. A mistake in the operating room can threaten the life of one patient; a mistake in the statistical analysis or interpretation can lead to hundreds of early deaths.
Andrew Vickers, Biostatistician, Memorial Sloan Kettering Cancer Center
16. Some reactions to previous presentations
MDs
- surprise that meta-analysis can be biased
- “Our statisticians did not tell us this”
17. Altman BMJ 1994, republished almost unchanged 20 years later…
“Put simply, much poor research arises because researchers feel compelled for career reasons to carry out research that they are ill equipped to perform, and nobody stops them.”
18. Statisticians
- “Oh no not this again”
- “We know this already”
- “P-values are not the problem”
- “It’s not us, it’s them”
23. An ethical statistician…
- identifies and mitigates any preferences on the part of the investigators or data providers that might predetermine or influence the analyses/results
- supports only studies that have pre-defined objectives and that are capable of producing useful results
- strives to explain any expected adverse consequences of failure to follow through on an agreed-upon sampling or analytic plan
- shall indicate the risks and possible consequences if their professional judgement is overruled
- clearly distinguishes views or opinions based on general knowledge or belief from views or opinions derived from the statistical analyses being reported
Taken from the RSS code and the ASA ethical guidelines
24. An ethical statistician…
- recognizes […] research practices and standards can differ across disciplines, and statisticians do not have obligations to standards of other professions that conflict with these guidelines
- shall take personal responsibility for work bearing their name
- avoids compromising scientific validity for expediency
- should always be aware of their overriding responsibility to the public good […] A Fellow’s obligations to employers, clients and the profession can never override this; and Fellows should seek to avoid situations and not enter into undertakings which compromise this responsibility
Taken from the RSS code and the ASA ethical guidelines
25. An ethical statistician…
- conveys the findings in ways that are both honest and meaningful to the user/reader
- shall seek to conform to recognised good practice, including quality standards which are in their judgement relevant, and shall encourage others to do likewise
- shall seek to advance knowledge and understanding of statistical science and advocate its use. This advocacy of statistical science should extend to employers, clients, colleagues and the general public
Taken from the RSS code and the ASA ethical guidelines
27. What can we do better?
ATOM
- Accept uncertainty (no more ***; interpret confidence intervals)
- Be Thoughtful (research question, design, clinically relevant effect size, registered reports)
- Be Open (conflicts of interest, registration; share data, code and analysis protocols; publish all results)
- Be Modest (label exploratory, retrospective and secondary analyses as such (no HARKing); interpret studies in their broader context)
Wasserstein RL, Schirm AL, Lazar NA. Moving to a World Beyond “p < 0.05”. Am Stat. 2019;73(sup1):1-19.
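"Accept uncertainty" is easiest to see with confidence intervals side by side. A sketch with invented numbers: two studies with nearly identical estimates fall on opposite sides of p = 0.05, yet their 95% CIs tell essentially the same story (normal approximation assumed):

```python
# Sketch: interpreting intervals instead of significance stars.
# All estimates and standard errors below are invented for illustration.
import math

def summary(est, se, z_crit=1.96):
    """95% CI and two-sided p-value under a normal approximation."""
    lo, hi = est - z_crit * se, est + z_crit * se
    p = 1 - math.erf(abs(est / se) / math.sqrt(2))  # two-sided p
    return lo, hi, p

for label, est, se in [("Study A", 0.40, 0.20), ("Study B", 0.38, 0.20)]:
    lo, hi, p = summary(est, se)
    print(f"{label}: estimate {est:.2f}, "
          f"95% CI [{lo:.2f}, {hi:.2f}], p = {p:.3f}")
```

Study A crosses the p < 0.05 line and Study B does not, yet the intervals are nearly identical; declaring one a "finding" and the other a "null result" is the dichotomania the slide warns against.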
28. What can we do better?
Distinguish between different applications, e.g. Cox 2020:
- Two-decision situation (health screening: return tomorrow vs next year; control the error rate)
- Subject-matter hypothesis (difference between treatments, H0: there is no difference; p-value as a measure of uncertainty)
- Dividing hypothesis (at which confidence level does the CI contain only positive/negative effects?)
- Tests of model adequacy (e.g. a normality assumption; informal role, judgement required)
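The dividing-hypothesis question can be phrased computationally: for a normally distributed estimate, the widest two-sided CI that still excludes zero has confidence level 1 - p. A sketch with an invented estimate and standard error:

```python
# Sketch of the "dividing hypothesis" reading: at which confidence level
# does the interval contain only positive effects? For a normal estimate
# this level is 1 - p (two-sided). Numbers below are invented.
import math

est, se = 0.30, 0.14                       # hypothetical estimate and SE
z = est / se                               # distance from zero in SE units
p = 1 - math.erf(abs(z) / math.sqrt(2))    # two-sided p under normality
level = 1 - p                              # the CI at this level touches zero
print(f"z = {z:.2f}; the {level:.1%} CI is the widest one excluding 0")
```

Any CI at a level below `level` contains only positive effects; any level above it lets the interval cross zero.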
29. What can we do better?
• “How to” teaching vs teaching for understanding; use simulated data (Bishop 2020)
• Be explicit about how principles extend to observational research
• Software
• Conceptual clarity in educational material (Greenland 2019):
“Significance level”: α or p-value?
“P-value”: the observed value p or the random variable P?
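Bishop's suggestion is to let students generate pure-noise data themselves and watch "significant" patterns appear. A sketch of such an exercise (assumed setup, not from the talk: 100 correlations of n = 20 noise points each; |r| > 0.444 corresponds to two-sided p < 0.05 at n = 20):

```python
# Sketch of a simulated-data teaching exercise: how often do pure-noise
# datasets produce a "significant" correlation? Setup is hypothetical.
import random

random.seed(42)

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

n, trials, crit = 20, 100, 0.444   # |r| > 0.444 ~ p < 0.05 for n = 20
hits = 0
for _ in range(trials):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    if abs(pearson_r(x, y)) > crit:
        hits += 1
print(f"{hits} of {trials} pure-noise datasets look 'significant'")
```

Students typically see around five "discoveries" per hundred noise datasets, which makes the arbitrariness of the 0.05 cutoff tangible.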
30. It won’t be easy
• Misconception fatigue in teaching
31. It won’t be easy
Wang et al Annals of Internal Medicine 2018
33. No statistician can do this alone
• A responsibility for each of us
• A role for professional organizations
• A necessity to put this on the agenda of ISCB
• Thanks to John Carlin and Jonathan Sterne, even if there are no free croissants