1. Should we stop looking for
genetic biomarkers for
psychological traits and
disorders?
Dorothy V. M. Bishop
Professor of Developmental Neuropsychology
University of Oxford
@deevybee
2. Biomarker
• /ˈbʌɪəʊmɑːkə/
• noun
• noun: biomarker; plural noun:
biomarkers
• a naturally occurring molecule,
gene, or characteristic by which
a particular pathological or
physiological process, disease,
etc. can be identified.
3. 2009, the promise: ‘Advances in genomics are
revolutionizing medicine with discoveries that help
elucidate mechanisms and design novel treatments’
Between 2009 and 2017: over 3-fold increase in funding
(0.47% of NIH budget rising to 1.64% of NIH budget)
Cf. all NIH funding: $30,545m in 2009, $34,301m in 2017 (1.12-fold increase)
Data from the NIH Reporter website and an online Congressional Research Service report
4. ‘Early findings revealed a more complex genetic
architecture than was anticipated for most
common diseases — complexity that seemed to
limit the immediate utility of these findings. As a
result, the practice of utilizing the DNA of an
individual to predict disease has been judged to
provide little to no useful information.’
Torkamani, A., Wineinger, N. E., & Topol, E. J. (2018). The personal and clinical
utility of polygenic risk scores. Nature Reviews Genetics, 19, 581-590.
A recent expert review: but is this too gloomy?
5. Important distinction between
Rare diseases:
Point mutations and Copy Number Variants
Common conditions:
Allelic variants common in general population
6. Classic genetics taught using clearcut Mendelian examples
• Huntington’s disease – only occurs in those with rare mutation in gene HTT
• FOXP2 mutation: associated with severe speech-language disorder
• Good progress in finding rare genetic variants that cause severe, rare
disorders, using arrays (to detect deletion or duplication of segments of
DNA) and exome sequencing (to detect harmful changes to DNA sequence).
But each accounts for very small % of cases.
7. Example of progress: Deciphering Developmental Disorders study
https://www.ddduk.org/
8. Complex multifactorial disorders
Aggregate but do not segregate in families
– i.e. run in families but you can’t trace effect of
gene through the generations according to
simple Mendelian rules
Thought to be the most usual type of etiology
for common disorders
e.g. Heart disease, diabetes, asthma, allergies
Also dyslexia, ADHD, Developmental Language
Disorder
Common disease – common variant model
Idea that there is continuous distribution of
liability for disorder, represents summed effect
of numerous very small genetic and
environmental effects
9. Association analysis: Compare how different genotypes differ
in terms of a quantitative trait or a categorical diagnosis
• Early studies looked for impact of genes on phenotypes
(common disorders or normal variation)
• Very different from FOXP2: ‘Risk’ alleles common in
general population and have small effect size
10. How association analysis works: Compare different genotypes
on a quantitative or categorical trait. Example
Genotype group sizes: N = 219, N = 175, N = 55
Effect size can be estimated using correlation between N risk alleles and phenotype.
Here r ≈ .15
2001
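A minimal sketch of the effect-size calculation described on this slide: correlate each person's number of risk alleles (0, 1 or 2) with their phenotype score. The data below are made up for illustration and are not the data from the study shown.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: number of risk alleles per person and a
# made-up quantitative phenotype score for the same ten people.
risk_alleles = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
phenotype = [52, 47, 55, 50, 49, 53, 46, 48, 51, 44]

r = pearson_r(risk_alleles, phenotype)
# r squared is the proportion of phenotype variance accounted for
print(f"r = {r:.2f}, variance explained = {r * r:.1%}")
```

Squaring r puts the slide's value in perspective: r ≈ .15 corresponds to roughly 2% of phenotypic variance.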
12. This field affected by 3 factors that generate false
positive errors
• Publication bias
• P-hacking
• HARKing
But also affected by 3 factors that mean we
may miss real effects
• Low power
• Unreliable measures
• Unrepresentative samples
13. Mainstream genetics is aware of many of these issues
and has taken steps to tackle them
But ease of obtaining and analysing DNA (spit pots,
commercial companies) has led to surge of
non-geneticists treating genetic variants as
independent variables, without understanding the
methodological problems
14. Publication bias
High profile study reports association; published in ‘high impact’ journal
Studies with null results not published
Publication: more dependent on results than quality of methods
Problem: File Drawer effect or ‘Prejudice against the null’
“As it is functioning in at least some areas
of behavioral science research, the
research-publication system may be
regarded as a device for systematically
generating and propagating anecdotal
information.”
Greenwald, 1975
Generally blamed on journal publication policies, though researchers also
don’t bother with negative findings.
Lure of exciting findings; failure to value negative evidence
15. P-hacking and HARKing
• Usually many possible ways of analyzing data
• Have to decide:
• Which SNPs (genetic variants) to look at
• Which measures of phenotype
• Which genetic model (dominant, additive, recessive)
• Whether to divide into subgroups
Typically researchers report as if decisions made a priori,
but no way to know if that is true
If many independent decisions made, the N of possible
analyses multiplies – i.e. not just additive
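To illustrate why the number of possible analyses multiplies rather than adds, here is a small sketch that enumerates every combination of analytic choices. The option names and counts are hypothetical, chosen only to mirror the four decisions listed above.

```python
from itertools import product

# Hypothetical analytic choices; the option names are illustrative only.
choices = {
    "snp": ["SNP_A", "SNP_B", "SNP_C"],
    "phenotype": ["measure_1", "measure_2"],
    "genetic_model": ["dominant", "additive", "recessive"],
    "subgroup": ["whole sample", "females only", "males only"],
}

# Every distinct analysis is one pick from each decision.
analyses = list(product(*choices.values()))
n_additive = sum(len(options) for options in choices.values())

print(f"Options if simply added up: {n_additive}")
print(f"Distinct analyses (product): {len(analyses)}")
```

With only two or three options per decision, the sum is 11 but the number of distinct analyses is 54.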
16. Study exploring link between COMT and frontal lobe function in schizophrenia
Diagram: Schizophrenia + control; Val/Val, Val/Met, Met/Met
1 contrast
Probability of a ‘significant’ p-value < .05 = .05
https://figshare.com/articles/The_Garden_of_Forking_Paths/2100379
17. Pick specific measure of phenotype:
• 2 contrasts at this level
• (NB in reality in this field, typically many more potential measures)
• Probability of a ‘significant’ p-value < .05 = .10
Diagram: Study exploring link between COMT and frontal lobe function;
Schizophrenia + control; Val/Val, Val/Met, Met/Met;
WCST categories, WCST persev. errors
18. Divide sample into those with and without paranoia
Probability of a ‘significant’ p-value < .05 = .19
Diagram: Study exploring link between COMT and frontal lobe function;
Schizophrenia + control; Val/Val, Val/Met, Met/Met;
WCST categories, WCST persev. errors
19. Focus just on females
Probability of a ‘significant’ p-value < .05 = .34
Diagram: Study exploring link between COMT and frontal lobe function;
Schizophrenia + control; Val/Val, Val/Met, Met/Met;
WCST categories, WCST persev. errors
20. Use model of recessive effect of SNP (R), rather than additive (U)
Probability of a ‘significant’ p-value < .05 = .56
Diagram: Study exploring link between COMT and frontal lobe function;
Schizophrenia; Val/Val, Val/Met, Met/Met;
WCST categories, WCST persev. errors
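The probabilities quoted across these forking-path slides can be checked with a short calculation. This sketch assumes that each fork doubles the number of contrasts (1, 2, 4, 8, 16), that the tests are independent, and that there is no true effect; correlated tests would inflate the error rate somewhat less.

```python
def prob_at_least_one_significant(n_tests, alpha=0.05):
    """P(at least one p < alpha) across n_tests independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

# Assumed number of contrasts after each successive fork
for n_tests in (1, 2, 4, 8, 16):
    prob = prob_at_least_one_significant(n_tests)
    print(f"{n_tests:2d} contrasts: P(any 'significant' result) = {prob:.2f}")
```

Under these assumptions the results round to .05, .10, .19, .34 and .56, matching the values on the slides.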
21. The biomarker discovery cycle
Grey denotes unpublished studies
Impression of large
body of
confirmatory work
But few studies
actually replicate
findings
All inconsistent
findings have been
disregarded
22. Example of changing the phenotype when the
original association doesn’t replicate
23. Extended sample: Schizophrenic probands (n = 325), their nonpsychotic siblings (n
= 359), and normal control subjects (n = 330). Includes many from earlier study.
“There was no significant main effect of COMT val108/158met genotype on
perseverative errors (t-score) or categories achieved on the WCST.”
Means (SDs) not reported for any tasks. Can’t compare with previous.
Focuses instead on results from N-back task
“In a series of mixed-model ANOVAs in which diagnostic group, COMT genotype,
and gender served as main effects, COMT val108/158met genotype had a
significant effect on 1-back accuracy, and a near significant effect on 0-back and
2-back accuracy. The val-val group performed more poorly than the val-met and
met-met groups, which did not differ from each other.”
24. So the next set of studies look at N-back task….
but find no effect
25. A large sample of 402 healthy adults were tested on four working memory tests:
Spatial Delayed Response (SDR), Word Serial Position Test (WSPT), N-back, and
Letter–Number Sequencing. A subsample (n = 246) was tested on the Wisconsin
Card Sorting Test (WCST).
• No COMT effect on N-back
• Trend is in opposite direction to prediction
• Focuses on another test, and on result with Wisconsin card-sorting, which
only replicates original Egan et al if Met/Val group excluded
26. High profile study reports association;
published in ‘high impact’ journal
More studies:
• Different phenotype
• Different risk allele
• Just in one subgroup
• In G x E interaction
• In G x G interaction
Regarded as replications, but shifting goalposts
Problems: Focus on confirming rather than disproving hypothesis
• P-hacking/data-dredging/incomplete reporting
• Hyping of marginal findings
• Retrofitting hypothesis to fit the data (HARKing)
• Without open data, can’t replicate prior work
Problem potentially solved by replication
27. Eventually, enough studies done to merit a meta-analysis (49 studies)
“Conclusions: Despite initially promising results, the
COMT Val158/108Met polymorphism appears to have
little if any association with cognitive function.
Publication bias may hamper attempts to understand
the genetic basis of psychological functions and
psychiatric disorders.”
Yet citations of the original work continue (now at over
2600), and new studies still attempt to build on this.
29. Mainstream genetics: candidate gene studies
largely abandoned in favour of GWAS
Genome-wide association study
“The scientific breakthrough of 2007”
• HapMap: collection of SNPs giving
dense map of genome
• Allows consideration of all genes, rather
than just candidates
• Can then see whether distribution of
probabilities of association is as expected
by chance
• Some robust associations in field of
psychiatry/psychology, e.g. APOE for
Alzheimer’s, and genes for
smoking/alcohol use
• But problem: for many phenotypes of
interest (cognitive tests/brain measures)
can’t get large samples
http://genomesunzipped.org/2010/07/how-to-read-a-genome-wide-association-study.php
Q-Q plot: -log(observed p-values) plotted against -log(expected p-values)
Above line: p-values more extreme than expected
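A rough sketch of how the points on such a Q-Q plot can be computed. It assumes base-10 logs (the slide does not state the base) and uses i / (n + 1) as the expected value of the i-th smallest of n p-values when no association exists. The p-values are invented for illustration.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs, least extreme first."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, the i-th smallest of n p-values is expected near i / (n + 1).
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))

# Hypothetical p-values from nine association tests
p_values = [0.91, 0.72, 0.55, 0.43, 0.31, 0.18, 0.09, 0.02, 0.0004]

for expected, observed in qq_points(p_values):
    flag = "above line" if observed > expected else ""
    print(f"expected {expected:.2f}  observed {observed:.2f}  {flag}")
```

Points flagged "above line" are those where the observed p-value is more extreme than chance alone would predict.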
31. Candidate gene studies continue with
hope we’ll see stronger effects with
better phenotypes
“weak and inconsistent effects of genetic variation at the level of human
cognition, emotion, and behaviour are much more strongly associated
with imaging phenotypes.”
Bigos, K.L., Hariri, A. R. & Weinberger, D. R. (2016).
Neuroimaging genetics: OUP.
• So, is it true?
• Are we now seeing methodologically strong candidate gene studies?
32. We took a look at recent studies published in neuroscience journals
33. Search criteria
Journals:
• Nature Neuroscience
• Neuron
• Annals of Neurology
• Brain
• Molecular Psychiatry
• Biological Psychiatry
• Journal of Neuroscience
• Neurology
• Journal of Cognitive Neuroscience
• Pain
• Cerebral Cortex
• NeuroImage
Search terms:
• genetic OR gene OR allele
• association
• cognition OR behaviour OR individual differences OR endophenotype
• human, not disorders
35. Did studies correct for multiple testing?
Total number tests conducted ranged from 2 to 368
Frequent failure to correct for:
N subgroups x N genetic models x N variants x N
phenotypes
in addition to corrections applied in imaging to adjust
for N voxels
• 8 studies fully corrected for all tests
• 9 partially corrected
• 13 did not correct
Seems to be a common failure to understand multiplicative nature of forking paths.
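As an illustration of the multiplicative count, the sketch below computes the total number of tests for a hypothetical design and the per-test threshold under a Bonferroni correction. Bonferroni is used here only as one simple, familiar correction; it is an assumption, not necessarily the method used in the studies reviewed, and it is conservative when tests are correlated.

```python
def total_tests(n_subgroups, n_models, n_variants, n_phenotypes):
    """Number of tests when every combination of choices is analysed."""
    return n_subgroups * n_models * n_variants * n_phenotypes

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall error rate at alpha."""
    return alpha / n_tests

# Hypothetical design: 2 subgroups, 3 genetic models, 4 variants, 5 phenotypes
n = total_tests(n_subgroups=2, n_models=3, n_variants=4, n_phenotypes=5)
print(f"{n} tests, per-test threshold p < {bonferroni_alpha(n):.5f}")

# The range of test counts reported on the slide
for n_tests in (2, 368):
    print(f"{n_tests} tests, per-test threshold p < {bonferroni_alpha(n_tests):.5f}")
```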
36. But studies ALSO ran risk of MISSING true effects
1. Low statistical power (small samples)
2. Samples with restricted range
3. Unreliable measures
37. Effect size for which there is 80% power, by N
80% power to detect
r = .2, N = 200
Sample size ranged from 24 to over 4600
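The relation between sample size and detectable effect size can be approximated as follows. This is a sketch using the Fisher z approximation for a two-sided test of a correlation at alpha = .05; the method behind the original figure is not stated, so treat this as one reasonable way to obtain similar numbers.

```python
import math
from statistics import NormalDist

def min_detectable_r(n, power=0.80, alpha=0.05):
    """Smallest correlation detectable with the given power (Fisher z approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    # The standard error of Fisher's z is 1 / sqrt(n - 3)
    return math.tanh((z_alpha + z_power) / math.sqrt(n - 3))

# Sample sizes spanning the range reported on the slide
for n in (24, 100, 200, 1000, 4600):
    print(f"N = {n:5d}: 80% power to detect r >= {min_detectable_r(n):.3f}")
```

For N = 200 this gives a value close to r = .2, in line with the slide.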
38. Largest reported effect size in 30 studies, by log N
13/30 studies underpowered to detect effect size of r = .2
39. Use of convenience samples
Regression analysis underestimates true effect if sample
has restricted range on phenotype under study.
Slope of line decreases, correlation declines.
Also: proportion with risk genotype decreases
Simulated data (phenotype plotted against N risk alleles):
True association between genotype and phenotype in population: r = .45
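A small simulation along the lines of this slide. Only the population correlation (r = .45) comes from the slide; the allele frequency, sample size, and the rule for the convenience sample (keeping only people who score below the population average on a difficulty score) are assumptions made for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

random.seed(1)
TRUE_R = 0.45            # population correlation, as on the slide
RISK_ALLELE_FREQ = 0.5   # assumption: not given in the source
N_PEOPLE = 20000         # assumption: large, so estimates are stable

genotype_mean = 2 * RISK_ALLELE_FREQ
genotype_sd = math.sqrt(2 * RISK_ALLELE_FREQ * (1 - RISK_ALLELE_FREQ))

population = []
for _ in range(N_PEOPLE):
    # Each person carries 0, 1 or 2 risk alleles
    n_risk = sum(random.random() < RISK_ALLELE_FREQ for _ in range(2))
    z_genotype = (n_risk - genotype_mean) / genotype_sd
    # Phenotype (higher = more difficulties) correlates TRUE_R with genotype
    score = TRUE_R * z_genotype + math.sqrt(1 - TRUE_R ** 2) * random.gauss(0, 1)
    population.append((n_risk, score))

# Hypothetical convenience sample: people with above-average difficulties are missing
convenience = [(g, s) for g, s in population if s < 0]

def summarise(sample):
    """Return the genotype-phenotype correlation and proportion with 2 risk alleles."""
    genotypes = [g for g, _ in sample]
    scores = [s for _, s in sample]
    prop_two_risk = sum(g == 2 for g in genotypes) / len(sample)
    return pearson_r(genotypes, scores), prop_two_risk

full_r, full_prop = summarise(population)
restricted_r, restricted_prop = summarise(convenience)
print(f"Full range:       r = {full_r:.2f}, two risk alleles = {full_prop:.1%}")
print(f"Restricted range: r = {restricted_r:.2f}, two risk alleles = {restricted_prop:.1%}")
```

Both the correlation and the share of people carrying two risk alleles come out lower in the restricted sample, which is the pattern the slide describes.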
40. Psychometric aspects of phenotype measures
Recommendation
Don’t expect an association with genotype, if the
measure doesn’t correlate well with itself on a different
occasion!
• Most published psychometric tests have information on
test-retest reliability.
• Not so for measures from experimental tasks, and for
measures of brain structure and function.
• Where reliability has been evaluated for such
measures, it does not inspire confidence.
Dubois, J., & Adolphs, R. (2016). Building a science of individual differences
from fMRI. Trends in Cognitive Sciences, 20(6), 425-443.
doi:10.1016/j.tics.2016.03.014
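One way to see why reliability matters is the classical attenuation formula from test theory, under which the expected observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The formula is brought in here as an assumption (it is not stated on the slide), and the true effect of r = .2 is hypothetical.

```python
import math

def expected_observed_r(true_r, reliability_phenotype, reliability_genotype=1.0):
    """Expected observed correlation after attenuation by measurement error.

    Genotype is assumed to be measured almost without error (reliability 1.0).
    """
    return true_r * math.sqrt(reliability_phenotype * reliability_genotype)

TRUE_R = 0.2  # hypothetical true genotype-phenotype correlation

for reliability in (0.9, 0.7, 0.5, 0.3):
    observed = expected_observed_r(TRUE_R, reliability)
    print(f"test-retest reliability {reliability:.1f}: expected observed r = {observed:.3f}")
```

A measure with low test-retest reliability shrinks an already small effect further, which in turn raises the sample size needed to detect it.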
41. Future directions
• Huge, publicly available samples (e.g. Biobank) now
available
• Usually large N studies are limited in terms of phenotypic
measures.
• Biobank sample skewed in socioeconomic background
• Small effects -> move from individual SNPs to polygenic risk scores
• i.e. a weighted sum of many SNP effects, each with tiny influence
• Drawback – hard to understand function of SNPs
• Whole exome sequencing – identify variants with known
functional significance, so can look at networks
• But mutations/CNVs don’t always have expected effect
• E.g. may be shared with parent or sib who has no problems
• Increasing interest in interactions between mutation and genetic
background
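The polygenic risk score mentioned above is, at its simplest, a weighted sum of risk-allele counts. The sketch below uses made-up SNP names and weights; in practice the weights would come from effect sizes estimated in a large independent sample.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP)."""
    missing = set(weights) - set(allele_counts)
    if missing:
        raise ValueError(f"No genotype for: {sorted(missing)}")
    return sum(weights[snp] * allele_counts[snp] for snp in weights)

# Hypothetical per-SNP weights, each with a tiny influence
weights = {"snp_1": 0.02, "snp_2": -0.01, "snp_3": 0.015, "snp_4": 0.005}

# Hypothetical genotype for one person: risk-allele count at each SNP
person = {"snp_1": 2, "snp_2": 1, "snp_3": 0, "snp_4": 1}

score = polygenic_score(person, weights)
print(f"Polygenic score: {score:.3f}")
```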
42. Recommended: two recent reviews with contrasting views on
which way we should go
Cell 169, Issue 7, pp. 1177-1186, June 15, 2017
Cell 173, Issue 7, pp. 1573-1580, June 14, 2018
43. Progress in new techniques needs to be
matched by improvements in scientific
practices
Need to modify researchers’ cognitive
biases and change incentives
44. Recommendation: those still doing candidate gene
studies:
Pre-register your analysis if you want your results to be
believed
47. Registered Reports solve issues of:
• Publication bias: publication decision made on the
basis of quality of introduction/methods, before
results are known
• Low power: researchers required to have 90% power
• P-hacking: analysis plan specified up-front
• HARKing: hypotheses specified up-front.
Also likely that reviewers will demand evidence that
methods are adequately reliable, and sample has
appropriate range on phenotype
Unanticipated findings can be reported but clearly
demarcated as ‘exploratory’
48. So……
should we stop looking for
genetic biomarkers for
psychological traits and
disorders?
Not necessarily, but if we’re going to do this, we need to:
• Pre-register studies
• Publish null results
• Take statistical power seriously (may require collaboration)
• Use psychometrically strong measures
• Use samples with appropriate range of phenotypes