Presented at the European Health Psychology Conference, July 13, 2013. This slideshow shows the folly of accepting positive findings from underpowered studies. Much of the "evidence" in health psychology comes from such unreliable studies.
The folly of believing positive findings from underpowered intervention studies
1. Too Good to Be True: Health
Psychology’s Dependence on
Underpowered Positive Studies
James C. Coyne, Ph.D.
University of Groningen,
University Medical Center
Groningen, The Netherlands
Twitter @CoyneoftheRealm
2. Long a pervasive problem…
Lack of sufficient resources to conduct well-designed, amply powered studies.
Confusion about pilot studies: cannot be the basis for evaluating efficacy or estimating effect sizes!
3. “We are grateful to the Society of Behavioral
Medicine (SBM) for selecting the authorship
group. This article is one of three metaanalyses that have been undertaken under the
aegis of the SBM Evidence-Based Behavioral
Medicine Committee; the other two metaanalyses examine the effects of psychosocial
interventions on depression and fatigue among
patients with cancer.”
4. SBM Initiative
Meta-analyses generated by professional
organizations should receive special
critical scrutiny because of the tendency to
gloss over the limits of a literature in order to
promote the services of their membership.
5. Small Studies
Suffer strong publication bias.
Negative findings go unpublished because the studies are too small.
Positive findings are celebrated because they were obtained despite the smallness.
6. Small Studies
Require a larger effect size to reach statistical significance.
Published results tend to be exaggerated and not to be replicated in larger, better-quality later studies.
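The arithmetic behind this point can be sketched in a few lines of Python (a normal-approximation illustration added here, not part of the original slides): the smallest standardized effect a two-arm trial can declare significant grows sharply as the arms shrink.

```python
import math

# Hypothetical helper: smallest Cohen's d that can reach two-sided
# p < .05 with n patients per arm, using the normal approximation
# (z = 1.96) rather than the exact t distribution.
def min_detectable_d(n_per_group, z_crit=1.96):
    return z_crit * math.sqrt(2 / n_per_group)

for n in (10, 20, 35, 100):
    print(n, round(min_detectable_d(n), 2))
# 10 patients per arm can only "detect" effects near d = 0.88,
# while 100 per arm can detect effects near d = 0.28.
```

So any significant result from a 10-per-arm trial is necessarily a very large effect, far larger than most behavioral interventions plausibly produce.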
7. Small Trials Likely to Have Outliers,
and With Publication Bias, Yield
Results That Won’t Replicate
Hospital A has 10 births per month on average.
Hospital B has 100 births per month on
average.
In January, one of the hospitals reported that 70%
of its births were girls. Is this more likely to be Hospital A,
Hospital B, or equally likely in either?
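The answer is Hospital A: small samples fluctuate far more. An exact binomial calculation (an illustration added here, not from the slides) makes the point:

```python
from math import comb

# Exact probability of at least k successes in n fair Bernoulli trials.
def p_at_least(n, k, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

small = p_at_least(10, 7)    # >= 70% girls among 10 births: about 17%
large = p_at_least(100, 70)  # >= 70% girls among 100 births: vanishingly rare
print(small, large)
```

The small hospital hits the "extreme" month thousands of times more often, which is exactly why small trials, filtered through publication bias, produce striking results that will not replicate.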
8. Small Studies
Are particularly vulnerable to selective loss of
patients to follow-up and to investigators, outcome
raters knowing to which condition patients are
assigned.
Investigators can naïvely or deliberately monitor
incoming data and stop the trial when a positive
finding has been obtained, even when it is a chance
finding that would be undone with continued
accumulation of patients.
9. Sample Size
Sample size is the best proxy for other sources of
bias in trials.
Sample size negatively predicts overall effect size.
In the presence of small-study effects, restricting
analyses to large trials, or predicting the treatment
benefits observed in large trials, may provide more
valid estimates than overall analyses of trials
irrespective of sample size.
10. Gorin, et al. "Meta-analysis of psychosocial interventions to reduce pain in patients with cancer." Journal of Clinical Oncology 30:5 (2012): 539-547.
12. What the SBM Authors Claimed
about Psychosocial Interventions
for Cancer Pain
“Robust findings" of "substantial rigor" and “strong
evidence for psychosocial pain management
approaches."
Claimed findings supported the “systematic
implementation" of these techniques.
Estimated that it would take 812 unpublished studies
lurking in file drawers to change their assessment.
13. 19 of 38 studies had fewer than 35 patients in
the intervention or control group. Two of the
other largest trials should have been
excluded for other reasons.
Of 13 studies individually having significant
effects on pain severity, 8 would have been
excluded because they were too small, 1
because it should not have been included in
the first place.
14. Of the 4 studies with the largest effect sizes,
1 had only 20 patients receiving relaxation;
the next largest had 10 patients who were
hypnotized; the next compared 20 patients listening to
a relaxation tape with 20 patients
getting live instructions, but these numbers
were obtained by replacing patients who
dropped out.
The study with the fourth largest effect size had
15 patients receiving training in self-hypnosis.
15. Some of the studies were quite small:
7 patients receiving pain education
10 patients receiving hypnosis
16 patients getting pain education
16 patients getting self-hypnosis
8 patients getting relaxation plus 8 patients getting CBT plus relaxation
18. Hart, et al. "Meta-analysis of
efficacy of interventions for
elevated depressive
symptoms in adults
diagnosed with cancer."
Journal of the National
Cancer Institute 104:13
(2012): 990-1004.
20. 3 studies classified as “psychotherapeutic”
were complex collaborative care
interventions for depression emphasizing
medication management.
These studies provided the bulk [527] of the
patients in the authors' calculation of the
effect size for psychotherapeutic
intervention.
21. Of the 2 remaining studies, 1 randomly
assigned 45 patients to either problem-solving
or a waitlist control and retained only
37 patients for analyses.
The final study contributed 2 effect sizes based
on comparisons of 29 patients receiving
CBT and 23 receiving supportive therapy to
the same 26-patient no-treatment control
group, thus violating the assumption of
independence of effect sizes.
22. With Removal of Small and
Inappropriately Classified Studies
No Eligible Studies Were Left
23. Fail-safe N of 106 confirms the relative
stability of the observed effect size.
“Our findings advance this literature
by demonstrating that psychological
and pharmacologic approaches,
evaluated in RCTs, can be targeted
productively toward cancer patients in
need of intervention by virtue of
clinical depression or elevated
depressive symptoms.”
24. Fail Safe N is Pseudo-Precise
Nonsense
Don’t Be Intimidated by Exaggerated
Estimates of Number of Unpublished
Studies Needed to Unseat Conclusions
Based on Meta-Analysis of Underpowered
Studies.
25. Deficiencies of Failsafe N
Combining Z scores does not directly account for the sample sizes of the studies.
The choice of zero for the average effect of the unpublished studies is arbitrary, almost certainly biased.
Allowing for unpublished negative studies substantially reduces failsafe N.
26. Deficiencies of Failsafe N
Estimates of failsafe N are not influenced by evidence of bias in the data.
Guesswork is required to estimate the number of unpublished studies in the area.
Heterogeneity among the studies is ignored.
The method is not influenced by the shape of the funnel plot.
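The sensitivity to the assumed average effect of unpublished studies can be shown with a minimal Stouffer-style fail-safe N sketch (the z scores below are hypothetical, not data from either meta-analysis):

```python
import math

# Stouffer combined z after adding n_unpub studies with mean z = z_unpub.
def combined_z(z_scores, n_unpub, z_unpub):
    return (sum(z_scores) + n_unpub * z_unpub) / math.sqrt(len(z_scores) + n_unpub)

# Smallest number of unpublished studies that drops the combined z
# below the one-tailed .05 criterion (z_alpha = 1.645).
def fail_safe_n(z_scores, z_alpha=1.645, z_unpub=0.0):
    n = 0
    while combined_z(z_scores, n, z_unpub) >= z_alpha:
        n += 1
    return n

zs = [2.0] * 13  # hypothetical: 13 published trials, each z = 2.0
print(fail_safe_n(zs))                # classic fail-safe N: 237
print(fail_safe_n(zs, z_unpub=-0.5))  # mildly negative unpublished studies: 31
```

Merely allowing the file drawer to contain mildly negative studies, instead of assuming they average exactly zero, collapses the fail-safe N from 237 to 31. The impressive-sounding number is an artifact of an arbitrary assumption.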
27. Are Small, Unpowered Studies
Good for Anything?
Leon, Andrew C., Lori L. Davis, and Helena C. Kraemer. "The role and interpretation of pilot studies in clinical research." Journal of Psychiatric Research 45:5 (2011): 626-629.
28. A pilot study is not a
hypothesis testing study.
Efficacy and effectiveness are
not evaluated in a pilot.
29. A pilot study does not provide a
meaningful effect size estimate for
planning subsequent studies due to
the imprecision inherent in data from
small samples.
Feasibility results do not necessarily
generalize beyond the inclusion and
exclusion criteria of the pilot design.
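The imprecision is easy to quantify (illustrative numbers added here, not from Leon et al.): the approximate 95% confidence interval for Cohen's d from a 15-per-arm pilot runs from clearly negative to very large.

```python
import math

# Approximate 95% CI for Cohen's d using the common large-sample
# standard error for two independent groups of sizes n1 and n2.
def d_ci(d, n1, n2, z=1.96):
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Hypothetical pilot: observed d = 0.5 with 15 patients per arm.
lo, hi = d_ci(0.5, 15, 15)
print(round(lo, 2), round(hi, 2))
```

The interval is roughly (-0.23, 1.23): consistent with a harmful effect, no effect, or an enormous effect, which is why a pilot's observed d cannot anchor the power calculation for the definitive trial.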
Editor's Notes
Forest plot of effect sizes (g) for studies measuring pain severity (k = 38).
Forest plot of effect sizes (Hedges’ g, designated g in the figure) for trials included in the meta-analysis (58–62,72–75). The corresponding 95% CI (designated “Lower” and “Upper” and indicated graphically by whisker bars) are also given. Effect sizes for the trials containing two intervention groups are displayed separately (59,62). CBT = cognitive behavioral therapy; CI = confidence interval; D = desipramine; P = paroxetine; SS = social support.