Introductory lecture on some epidemiological models in causal inference, including the sufficient component cause model and the potential outcomes model.
2. Learning Outcomes
At the end of the session, the students should be able to:
1. Discuss the philosophical history of causation
2. Discuss causation in the epidemiological context
a. Hill’s criteria for causation
b. Sufficient component cause model
3. Discuss the potential outcomes model of causation
4. Discuss the requirements for causal inference
3. Causal Inference
• Determining whether a statistical association is causal
• Embedded in public health practice and policy formulation
Usual objectives:
• To identify the causes of diseases;
• To decide on the effectiveness of public health interventions
4. Philosophy of Causation
Inductivism
• The philosophy of scientific reasoning
• Making generalizations from observations to general laws of nature
Limitations pointed out by David Hume:
• This makes an assumption that certain events would, in the future,
follow the same pattern as they had in the past
• Observers cannot perceive causal connections, but only a series of
events in proximity in time and space
(Rothman, Greenland, & Lash)
5. Philosophy of Causation
Refutationism
• Scientific statements can only be found to be consistent with
observation, but cannot be proven or disproven in any logical or
mathematical sense
• In contrast, a valid observation that is inconsistent with a hypothesis
implies that the hypothesis as stated is false and so refutes the
hypothesis
• However, logical certainty about either the truth or falsity of an
internally consistent theory is impossible
(Rothman, Greenland, & Lash)
6. Philosophy of Causation
Impossibility of Scientific Proof
• Proof of causation is impossible in empirical science, including
epidemiology
• Fruits of scientific work are at best only tentative formulations of a
description of nature, even when the work itself is carried out
without mistakes
• The tentativeness of our knowledge does not prevent practical
applications, but it should keep us skeptical and critical, not only of
everyone else’s work, but of our own as well
(Rothman, Greenland, & Lash)
8. Outcome
Refers to a nonrecurrent event, such as death or first occurrence of a
disease
We will assume that the outcome of each individual is not affected by the
exposures and outcomes of other individuals
Cause
An event, condition, or characteristic that preceded the disease onset
and that, had the event, condition, or characteristic been different in a
specified way,
the disease either would not have occurred at all or would not
have occurred until some later time
(Rothman, Greenland, & Lash)
10. Hill's Criteria for Causation
Conclusion of causality is a judgment based on a body of evidence (1965)
Guidelines for judgment of causation:
1. Strength of Association
2. Biological Gradient / Dose-response relationship
3. Specificity of the Association
4. Temporality
5. Theoretical Plausibility
6. Coherence
7. Consistency
8. Experiment
(Rothman, Greenland, & Lash)
11. Strength of Association
• A strong association helps rule out the hypothesis that the association is
entirely or primarily due to a weak unmeasured confounder or other source
of modest bias
• A strong association is neither necessary nor sufficient for causality
• A weak association is neither necessary nor sufficient for absence of
causality
(Rothman, Greenland, & Lash)
12. Consistency
• Rules out hypotheses that the association is attributable to some
factor that varies across studies
• Lack of consistency does not rule out a causal association
• Some effects are produced by their causes only under unusual circumstances
or only among a specific group of people
• Take note, NOT a question of consistency in statistical significance
• Difference in significance could arise solely because of differences in the
sample sizes of the studies
(Rothman, Greenland, & Lash)
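The point about statistical significance can be illustrated with a short sketch. The two studies below are hypothetical (the counts are invented for illustration) and the p-value comes from a simple Wald test on the log risk ratio, which is only one of several possible tests. Both studies estimate the same risk ratio, yet only the larger one reaches conventional significance.

```python
import math

def risk_ratio_p_value(cases_exp, n_exp, cases_unexp, n_unexp):
    """Return (risk ratio, two-sided p-value) using a Wald test on log(RR)."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Standard error of log(RR) for cumulative-incidence data
    se = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    z = math.log(rr) / se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return rr, p

# Hypothetical studies with identical risks (12% vs 6%) but different sizes
small_study = risk_ratio_p_value(6, 50, 3, 50)
large_study = risk_ratio_p_value(120, 1000, 60, 1000)

for label, (rr, p) in [("small", small_study), ("large", large_study)]:
    print(f"{label} study: RR = {rr:.2f}, p = {p:.4f}")
```

The effect estimates agree, so the studies are consistent in the sense intended by the criterion, even though their significance tests disagree.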
13. Specificity
• A cause leads to a single effect, not multiple effects
• An effect has one cause, not multiple causes
• Applicable for infectious diseases
• Can be persuasive when, in addition to the causal hypothesis,
one has an alternative noncausal hypothesis that predicts a
nonspecific association
Example: ovarian cancer is associated only with ovarian endometriosis, and
not with any other type of endometriosis
(Rothman, Greenland, & Lash)
14. Temporality
• Required for causation: the putative cause C must precede the putative
effect D, BUT
• Observations in which C occurred after D merely show that C could
not have caused D in these instances
• Such observations provide no evidence for or against the hypothesis that
C can cause D in those instances in which C precedes D
• Only if it is found that C cannot precede D in any instance can we
dispense with the causal hypothesis that C could cause D
(Rothman, Greenland, & Lash)
15. Biologic Gradient
• Linear or monotonic trend
• May still be caused by confounding
• A non-monotonic relation only refutes those causal hypotheses specific
enough to predict a monotonic dose–response curve
https://www.statisticshowto.datasciencecentral.com/monotonic-relationship/
(Rothman, Greenland, & Lash)
16. Plausibility & Coherence
• Usually based on prior beliefs or current knowledge, rather than logic
• Absence of coherent information should not be taken as evidence
against an association being considered causal
• Presence of conflicting information may indeed refute a hypothesis,
but one must always remember that the conflicting information may
be mistaken or misinterpreted
(Rothman, Greenland, & Lash)
17. Experiment
• The “experimental, or semi-experimental evidence” obtained from
reducing or eliminating a putatively harmful exposure and seeing if
the frequency of disease subsequently declines
• Can be confounded or otherwise biased by a host of concomitant
secular changes
(Rothman, Greenland, & Lash)
18. Sufficient-Component Cause Model
• Developed in 1976 by K. Rothman
• Emanated from the "web of causation" model
• Provides a general but practical conceptual framework for causal
problems
• Describes conditions necessary to cause (and prevent) disease in a
single individual and for the epidemiological study of the causes of
disease among certain groups of individuals
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Causality/EP713_Causality_print.html
19. Sufficient-Component Cause Model
• By sufficient cause we mean a complete causal mechanism, a minimal set of
conditions and events that are sufficient for the outcome to occur
• Many, and possibly all, of the components of a sufficient cause may be
unknown
• Unknown components are usually labeled U
20. Sufficient-Component Cause Model
• Necessary cause - a particular type of component cause that appears
in every sufficient cause
• For each component cause in a sufficient cause, the set of the other
component causes in that sufficient cause comprises the
complementary component causes
(Rothman, Greenland, & Lash)
21. Sufficient-Component Cause Model
• Component causes must be defined with respect to a clearly specified
alternative or reference condition (often called a referent)
• An event, condition, or characteristic is not a cause by itself as an
intrinsic property it possesses in isolation,
BUT as part of a causal contrast with an alternative event,
condition, or characteristic
(Rothman, Greenland, & Lash)
22. Implications of the SCC Model
1. Induction period can only be described in relation to a specific
component cause operating in a specific sufficient cause.
Induction period for the last component cause of a sufficient cause is ZERO.
2. Two component causes acting in the same sufficient cause may be
defined as interacting causally to produce disease
EXCEPT when:
E → F → Disease
(Rothman, Greenland, & Lash)
23. Implications of the SCC Model
3. The disease may have many possible sufficient causes, each
composed of multiple component causes. Blocking any 1 component
cause already prevents development of disease by that pathway
(Rothman, Greenland, & Lash)
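The structure of the model can be sketched in code by treating each sufficient cause as a set of component causes. The component names (A, B, C, E, U1, U2, U3) and the three sufficient causes below are hypothetical and chosen only for illustration; they are not taken from the lecture's diagrams.

```python
# Hypothetical sufficient causes for one disease; each is a minimal set of
# component causes. U1..U3 stand for unknown components.
SUFFICIENT_CAUSES = [
    frozenset({"A", "B", "U1"}),
    frozenset({"A", "C", "U2"}),
    frozenset({"A", "E", "U3"}),
]

def disease_occurs(present, sufficient_causes=SUFFICIENT_CAUSES):
    """Disease occurs once every component of at least one sufficient cause is present."""
    return any(sc <= set(present) for sc in sufficient_causes)

def necessary_causes(sufficient_causes=SUFFICIENT_CAUSES):
    """A necessary cause is a component that appears in every sufficient cause."""
    return frozenset.intersection(*sufficient_causes)

def complementary_causes(component, sufficient_cause):
    """The other components of the same sufficient cause."""
    return sufficient_cause - {component}

person = {"A", "B", "U1", "C"}     # completes the first sufficient cause only
print(disease_occurs(person))                          # disease occurs
print(disease_occurs(person - {"B"}))                  # blocking B prevents this pathway
print(sorted(necessary_causes()))                      # A appears in every sufficient cause
print(sorted(complementary_causes("B", SUFFICIENT_CAUSES[0])))
```

Removing any single component of a completed sufficient cause is enough to stop disease by that pathway, which is the third implication stated above.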
24. Implications of the SCC Model
4. The strength of a factor’s effect on the occurrence of a disease in a
population depends on (a) the prevalence of its causal complement,
and (b) the prevalence of the components in other sufficient causes
Diagrams from: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Causality/EP713_Causality_print.html
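A small numerical sketch of this fourth implication, under simplifying assumptions that are not part of the lecture: the factor E has a single causal complement C, there is one other sufficient cause B that does not involve E, and C and B occur independently of each other and of E in the population.

```python
def risks(p_complement, p_other):
    """Risk of disease among those exposed and unexposed to factor E.

    Assumed (hypothetical) structure: sufficient cause I = {E, C},
    sufficient cause II = {B}; C and B are independent of each other and of E.
    """
    risk_exposed = 1 - (1 - p_complement) * (1 - p_other)  # C or B present
    risk_unexposed = p_other                               # only B can act
    return risk_exposed, risk_unexposed

scenarios = [(0.1, 0.1), (0.5, 0.1), (0.5, 0.4)]
for p_c, p_b in scenarios:
    r1, r0 = risks(p_c, p_b)
    print(f"P(C)={p_c}, P(B)={p_b}: RD={r1 - r0:.2f}, RR={r1 / r0:.2f}")
```

With the same biological mechanism, the apparent strength of E changes as the prevalence of its complement or of the competing sufficient cause changes, so a weak or strong association is a property of the population, not of the factor alone.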
25. Limitations of the SCC Model
• Does not depict aspects of the causal process such as sequence or
timing of action of the component causes, dose, or other complexities
• Does not directly help us decide which is a cause or not
• Does not illustrate confounding or bias
• At the limits of current knowledge, the remaining ‘unexplainable’
variability in disease development is ascribed to ‘chance’
(Rothman, Greenland, & Lash)
27. Potential Outcomes Model
• The potential outcomes approach was designed to quantify the
magnitude of the causal effect of a factor on an outcome
NOT to determine whether it is actually a cause or not
• Quantitative counterfactual inference helps us predict what would
happen under different circumstances, but is agnostic in saying which
is a cause or not
• Requires commitment to define the ‘cause’ of interest
(Hernan & Robins)
28. Defining the Causal Effect
Suppose we have a ‘treatment’ variable A with two levels: 1 (treated) and
0 (untreated), and an outcome variable Y with two levels: 1 (death) and
0 (survival)
• The treatment A has a causal effect on an individual’s outcome Y if
the potential outcomes under a=1 and a=0 are different
• Example: a heart transplant (A) has a causal effect on Zeus’s outcome because:
• If he received the transplant (a=1), he dies five days later, AND
• If he did not receive the transplant (a=0), he is alive five days later
(Hernan & Robins)
29. Defining the Causal Effect
• What is generally more useful in public health is an average causal
effect in a population of individuals
To define it, we need:
• an outcome of interest
• the ‘treatments’ to be compared
• a well-defined population of individuals whose potential outcomes are
to be compared
(Hernan & Robins)
30. Measuring the Average Causal Effect
• Half of the members of the population (10 out of 20) would have died if
they had received a heart transplant
• Half of the members of the population (10 out of 20) would have died if
they had NOT received a heart transplant
Null hypothesis of no average causal effect is TRUE
• In this population, having a heart transplant has no average causal effect
on death (individual causal effects may still exist)
(Hernan & Robins)
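The calculation can be sketched as follows. The potential outcomes listed here are hypothetical values constructed to match the 10-out-of-20 proportions above; they are not the actual table from Hernan & Robins.

```python
# Hypothetical potential outcomes for 20 people (1 = death, 0 = survival).
y_a1 = [1] * 10 + [0] * 10            # outcome if transplanted (a=1)
y_a0 = [0] * 5 + [1] * 10 + [0] * 5   # outcome if not transplanted (a=0)

def risk(outcomes):
    """Proportion of the population that would die under one treatment level."""
    return sum(outcomes) / len(outcomes)

risk_a1 = risk(y_a1)
risk_a0 = risk(y_a0)
causal_risk_difference = risk_a1 - risk_a0

# Individuals whose two potential outcomes differ have an individual causal effect
n_individual_effects = sum(1 for y1, y0 in zip(y_a1, y_a0) if y1 != y0)

print(risk_a1, risk_a0, causal_risk_difference, n_individual_effects)
```

In this constructed population the average causal effect is null even though half of the individuals have an individual causal effect, because harmful and protective effects cancel out.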
31. Central Problem of Causal Inference
• Missing data problem because only 1 outcome is factual for each person
• A comparison group is used to represent this missing counterfactual
scenario
(Hernan & Robins)
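A minimal sketch of why this is a missing data problem. Each person contributes one observed treatment and one observed outcome, so only one of the two potential outcomes is ever filled in. The records are hypothetical; that Zeus actually received the transplant is assumed here for illustration, and the other two people are invented.

```python
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    a: int   # treatment actually received (1 = transplant, 0 = none)
    y: int   # outcome actually observed (1 = death, 0 = survival)

def potential_outcomes_table(people):
    """Return rows of (name, Y under a=0, Y under a=1); None marks the missing counterfactual."""
    rows = []
    for p in people:
        y_a0 = p.y if p.a == 0 else None
        y_a1 = p.y if p.a == 1 else None
        rows.append((p.name, y_a0, y_a1))
    return rows

observed = [Person("Zeus", 1, 1), Person("Person 2", 0, 0), Person("Person 3", 0, 1)]
rows = potential_outcomes_table(observed)
for row in rows:
    print(row)
```

Every row has exactly one missing entry, which is why individual causal effects cannot be read off observed data and a comparison group is needed.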
32. Assumptions for Causal Inference using the Potential Outcomes Model (in the real world)
Consistency
Exchangeability
Positivity
33. Consistency
Requires a sufficiently well-defined causal question
Assumes that either:
1. There is only 1 version of the treatment
• Everyone classified as treated received exactly the same treatment
• Unlikely in reality (Exception: perfect randomized controlled trials)
2. There are multiple versions of the treatment but the differences
among versions are not relevant to the outcome
• Can then safely group these people together as ‘exposed’
• More commonly applied, especially in observational studies
(Hernan & Robins)
34. Defining the Causal Question
Ideally, description of the causal question should include:
1. The specific factor being considered
2. The outcome expected
3. The group of people to which the hypothesis applies
4. The amount of the factor needed to cause the outcome
5. The time needed for the factor to cause the outcome
May help to imagine you’re designing a randomized controlled trial.
How would you describe the ‘treatment’ to be given?
35. Does smoking cause lung cancer?
• What type of smoking will be considered as ‘exposed’?
• Cigarette, cigar, pipe, or environmental; whether it is filtered or unfiltered
• Manner and frequency of inhalation
• Age at initiation of smoking
• Duration of smoking
• What is the referent group?
• Is it smoking nothing at all, smoking less, smoking something else?
Smoking → Lung cancer
(Rothman, Greenland, & Lash)
36. Does smoking cause lung cancer?
Among adults, does cigarette smoking of at least 1 pack a day for
10 years cause lung cancer, compared to never smoking at all?
Smoking → Lung cancer
37. Does obesity cause death?
• Does having a BMI of 30 cause death within 10 years, compared to
having a BMI of 20?
• Many possible ways by which an individual can have a BMI of 20
(diet, physical activity, liposuction, smoking, tuberculosis, etc)
• Are these variations irrelevant to the outcome?
Causal inference researchers still debate whether this is an answerable
causal question
Obesity → Death
(Hernan)
38. Does having insufficient physical activity cause death?
• Does having insufficient physical activity for 10 years cause death,
compared to having sufficient physical activity for 10 years?
• Immediate implications for public health policy
Physical activity → Death
39. Exchangeability
• The treated and the untreated would have experienced the same risk
of death if they had received the same treatment level
(‘Exchangeable’)
• The potential outcomes and the actual treatment received are
independent from each other
(Hernan & Robins)
40. Exchangeability
• Assumed in large randomized experiments due to randomization
• Because only chance determines treatment category for each person
• Unrealistic expectation in observational studies
• There may be variables that determine both the treatment and the outcome
(represented by L)
• BUT, exchangeability can be achieved if all these variables are measured and
controlled for
(Hernan & Robins)
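One way to "control for" L is standardization, sketched below. The lecture does not name a specific method, so the choice of standardization is an assumption, and the counts are hypothetical. L is a binary indicator (for example, severe illness) that makes treatment more likely and death more likely, while treatment itself has no effect within either level of L.

```python
# Hypothetical counts: (L, A) -> (deaths, persons)
counts = {
    (0, 0): (8, 80),
    (0, 1): (2, 20),
    (1, 0): (10, 20),
    (1, 1): (40, 80),
}

def crude_risk(a):
    """Risk of death among those with treatment level a, ignoring L."""
    deaths = sum(d for (_, aa), (d, _) in counts.items() if aa == a)
    persons = sum(n for (_, aa), (_, n) in counts.items() if aa == a)
    return deaths / persons

def standardized_risk(a):
    """Stratum-specific risks under treatment a, weighted by the population distribution of L."""
    total = sum(n for _, n in counts.values())
    risk = 0.0
    for l in sorted({l for l, _ in counts}):
        n_l = sum(n for (ll, _), (_, n) in counts.items() if ll == l)
        deaths, persons = counts[(l, a)]
        risk += (deaths / persons) * (n_l / total)
    return risk

crude_rd = crude_risk(1) - crude_risk(0)
standardized_rd = standardized_risk(1) - standardized_risk(0)
print(f"crude RD = {crude_rd:.2f}, standardized RD = {standardized_rd:.2f}")
```

The crude comparison suggests treatment is harmful only because the treated are sicker. If exchangeability holds within levels of L, the standardized difference can be interpreted causally, and here it is null.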
41. Positivity
• The probability of receiving every level of treatment is greater than zero (positive)
• If all the subjects receive the same treatment level, computing the average
causal effect would be impossible
• The more precise the question, the higher the risk of nonpositivity
• If conditioning on other variables (L), there should be people at all
levels of treatment in every level of L
• L, or any subset of L, cannot be too strongly associated with the treatment
• The more variables to control, the more likely to violate positivity
(Hernan & Robins)
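A simple empirical check for positivity can be sketched as below. The function name, the record layout (pairs of L stratum and treatment level), and the example data are hypothetical. The check only detects strata with no observed individuals at some treatment level; it cannot show that the underlying probability is truly zero.

```python
from collections import defaultdict

def positivity_violations(records, treatment_levels=(0, 1)):
    """Return {stratum of L: [treatment levels never observed in that stratum]}."""
    seen = defaultdict(set)
    for l, a in records:
        seen[l].add(a)
    required = set(treatment_levels)
    return {
        l: sorted(required - levels)
        for l, levels in seen.items()
        if required - levels
    }

# Hypothetical records: (L stratum, treatment received)
records = [
    ("age<60", 0), ("age<60", 1), ("age<60", 1),
    ("age>=60", 0), ("age>=60", 0),   # nobody aged 60 or over was treated
]
violations = positivity_violations(records)
print(violations)
```

Adding more variables to L splits the data into more strata, which makes empty treatment cells like the one flagged here more likely.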
42. References
• Hernán MA, Hsu J, Healy B (2018). Data science is science’s second chance
to get causal inference right. A classification of data science tasks.
Retrieved from: https://arxiv.org/ftp/arxiv/papers/1804/1804.10846.pdf
• Hernán MA (2016). Does water kill? A call for less casual causal inferences.
Ann Epidemiol. 26(10): 674–680. Retrieved from:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5207342/pdf/nihms836995.pdf
• Hernán MA, Robins JM (2018). Causal Inference. Boca Raton: Chapman &
Hall/CRC, forthcoming. Downloaded from:
https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
• Rothman KJ, Greenland S, Lash TL (2008). Modern Epidemiology, 3rd ed.
Lippincott Williams & Wilkins.