Case control study

Seminar Presentation by: Dr. Timiresh Kumar Das
Moderator: Dr. D. K. Raut, Director Professor,
Dept. of Community Medicine, VMMC & Safdarjung Hospital

 Epidemiological study cycle
 Analytical studies: Types
 Case control vs Cohort
 Case control study
 Definitions
 History
 Design
 Outcomes
 Limitations
 Advantages and Applications
 Nested case control studies
 Selected examples of case control studies

 The sequence of events starting with

description of disease or health related event
in relation to time, place, person
searching for and finding differences in
occurrence in different populations
formulating hypotheses regarding possible
causative factors and testing them
analysing the results
results may lead to further descriptive studies
or new hypotheses.

DESCRIPTIVE STUDY (Ochsner, 1939)
• Ca Lung increasing, mostly among smokers
• Death rates higher in populations with higher per capita cigarette consumption
• Hypothesis: Smoking causes Ca Lung

CASE CONTROL STUDY (Doll, 1947-52)
• Ca Lung patients and non patients
• Clarifies if it was smokers who contributed to high Ca Lung

COHORT STUDY (Hill, 1951-61)
• Follows a cohort of smokers and non smokers without Ca Lung
• Smokers develop Ca Lung more frequently

INTERVENTIONAL TRIAL (RCT)
• Proves hypothesis conclusively
• Gives inputs regarding other factors, control measures.

 Observational: difference in study groups is ONLY observed & analysed, NOT created experimentally
 Case control (Retrospective) studies
 Cohort (Prospective) studies

 Experimental (Interventional): difference in study groups is CREATED EXPERIMENTALLY and outcomes observed
 Animal experiments
 Human studies
• Therapeutic trials
• Preventive trials

 Purpose: To produce a valid estimate of a hypothesised cause-effect relationship between suspected risk factor and disease.

Case Control Study | Cohort Study
Starts with diseased (cases) & not diseased (controls) | Starts with not diseased but exposed & not exposed
Determine if 2 groups differ in exposure to specific factor or factors | Followed up to determine difference in rates at which disease develops in relation to exposure
Called a case control study due to the way in which the study group is assembled | Called so because of the use of a “cohort” (a group of people who share a common characteristic or experience)

A fourfold table (Mausner, 1985)

                             | DISEASE present (cases) | DISEASE absent (controls) | Total
EXPOSURE present (exposed)   | a                       | b                         |
EXPOSURE absent (not exposed)| c                       | d                         |
Total                        |                         |                           |

Prospective (Cohort): from exposure to disease
Retrospective (Case-Control): from disease to exposure

Case Control Studies | Cohort Studies
Proceeds from effect to cause | Proceeds from cause to effect
Starts with the disease | Starts with people exposed to the risk factor or suspected cause
Tests whether the suspected cause occurs more frequently in those with disease than those without disease | Tests whether disease occurs more frequently in those exposed than in those not exposed
Usually the 1st approach to the testing of hypothesis, but also useful for exploratory studies | Reserved for the testing of precisely formulated hypothesis
Involves fewer study subjects | Involves larger number of subjects
Yields results relatively quickly | Long follow-up, delayed results
Suitable for study of rare diseases | Inappropriate when disease or exposure under investigation is rare
Generally, yields only estimate of relative risk (Odds ratio) | Yields incidence rates, relative risk, attributable risk
Cannot yield information about disease other than that under study | Can give information about more than one disease outcome
Relatively inexpensive | Expensive

 Case control study synonyms:
 Case comparison study
 Case compeer study
 Case history study
 Case referent study
 Retrospective study
 Case control study definitions:
 The observational epidemiologic study of persons
with the disease (or other outcome variable) of
interest and a suitable control (comparison/
reference) group of persons without the disease.
(Dictionary of Epidemiology: 3rd ed; John M Last. 2000)

 Case control study definitions:

 A study that compares two groups of people: those with the disease or condition under study (cases) and a very similar group of people who do not have the disease or condition (controls). (National Institutes of Health, USA)
 A case control study involves two populations – cases and controls – and has three distinct features:
 Both exposure and outcome have occurred before the start of the study.
 The study proceeds backwards from effect to cause.
 It uses a control or comparison group to support or refute an inference.
(Park’s Textbook of Preventive and Social Medicine – 20th ed; K. Park. 2009)

 Case: A person in the population or study group identified as having the particular disease, health disorder or condition under investigation. (Dictionary of Epidemiology: 3rd ed; John M Last. 2000)

 Control: Person or persons in a comparison group that differs, in disease experience (or other health related outcome), in not having the outcome being studied. (Dictionary of Epidemiology: 3rd ed; John M Last. 2000)

 Bias: Any systematic error in the design, conduct, or analysis of a study that results in mistaken estimates of the effect of the exposure on disease.
 Confounding: When a measure of the effect of an exposure on risk is distorted because of the association of exposure with other factors that influence the outcome. It creates data where it is not possible to separate the contribution that any single causal factor has made to an effect.

Timeline:
Early beginnings: PCA Louis (1788-1875) - numerical method; William Augustus Guy (1843); Baker (1862) – case control comparisons of marriage and fertility in breast cancer
1926: Lane-Claypon’s breast cancer study
1950: Lung cancer and smoking; establishment and acceptance

 Six essential elements which developed separately over time in medical history:
 Idea of the case
 Interest in disease etiology and prevention
 Focus on individual, as opposed to group etiologies
 Anamnesis or history taking from patients
 Grouping individual cases together into series
 Making comparisons of the differences between groups, in order to elicit average risk at the level of the individual


 Concept found in works of Parisian physician PCA Louis (1788-1875) - “numerical method”, a technique whose principal tool was the tabulation of aggregated data about patients with similar pathologic and clinical findings.

 First explicit description by William Augustus Guy (1843) – analysis of relationship of prior occupational exposure and occurrence of pulmonary consumption.

 Baker (1862) – case control comparisons of marriage and fertility in breast cancer patients.

 Lane-Claypon’s breast cancer study, 1926 - ‘‘A further report on cancer of the breast: reports on public health and medical subjects.’’ (Lane-Claypon 1926a).
 500 hospitalised cases and 500 controls with noncancerous illnesses
 22% lower fertility in the case group.
 1950 - Four studies that implicated cigarette smoking in cancer of the lung, published in 1950 in the United States (Levin et al 1950; Wynder & Graham 1950; Schrek et al. 1950) and in Britain (Doll & Hill 1950), established several features of the modern form of the case-control study.
 Doll & Hill’s study is perhaps the best known in history.

The investigator selects
cases with the disease
and appropriate
controls without the disease
and obtains
data regarding past exposure
to possible etiologic factors in both groups.

The investigator then compares the frequency
of exposure of the two groups.

Disease (“CASES”): Exposed / Not Exposed
No Disease (“CONTROLS”): Exposed / Not Exposed

 Hallmark of Case Control Study: Starts from cases and controls and searches for exposure.

FIRST: Select CASES (With Disease) and CONTROLS (Without Disease)
THEN: Measure Exposure

                    | CASES (With Disease) | CONTROLS (Without Disease)
Were exposed        | a                    | b
Were not exposed    | c                    | d
TOTALS              | a+c                  | b+d
Proportions Exposed | a / (a+c)            | b / (b+d)

 Selection of CASES:
1. Representativeness:
 Ideally, cases are a random sample of all cases of interest in the source population (e.g. from vital data, registry data).
 More commonly they are a selection of available cases from a medical care facility (e.g. from hospitals, clinics).
 Information: can be collected from cases themselves, or from a respondent by proxy (relative/ friend), from records or a combination of the above.


2. Method of Selection

Selection may be from incident or prevalent cases:
• Incident cases are those derived from ongoing ascertainment of cases over time.
• Prevalent cases are derived from a cross-sectional survey.

 Selection of INCIDENT CASES is OPTIMAL.
 These should be all newly diagnosed cases over a given period of time in a defined population.
 However, patients who died before diagnosis are excluded, which is a difficult problem.
 Prevalent cases do NOT include patients with a short course of disease.
 So patients who recovered early and those who died will not be included.
 Additional protection against bias is obtained by including deceased cases as well as those alive.


3. Diagnostic criteria for case studies
a) Specificity
b) Diagnostic bias
c) Validation

 Diagnostic criteria regarding diagnosis of cases, types of cases and stage of disease to be included should be predefined.
 Validity is more important than generalizability, i.e. the need to establish an etiologic relationship is more important than the need to generalise results to the population.
 Example: In a study on breast cancer we can include all cases OR we can include only premenopausal women with lobular cancer.
 If we take the latter group as cases, we can elicit the etiology better.


 Selection of CONTROLS:

(i) Should the controls be similar to the cases in all respects other than having the disease? i.e. COMPARABLE

(ii) Should the controls be representative of all non-diseased people in the population from which the cases are selected? i.e. REPRESENTATIVE

 Comparability vs Representativeness

 The control group should be representative of the general population in terms of probability of exposure to the risk factor
 AND they should also have had the same opportunity to be exposed as the cases have.
 This does not mean that cases and controls are equally exposed; only that they have had the same opportunity for exposure.
 Usually, cases in a case-control study are not a random sample of all cases in the population. If so, the controls must be selected in the same way (and with the same biases) as the cases.
 It follows from the above that a pool of potential controls must be defined. This is a universe of people from whom controls may be selected (study base).
 The study base is composed of a population at risk of exposure over a period of risk of exposure.
 Cases emerge within a study base. Controls should emerge from the same study base, except that they are not cases.
 For example, if cases are selected exclusively from hospitalized patients, controls must also be selected from hospitalized patients.

[Figure: “Total” Population, Reference Population, Cases, Controls]

 Selection of CONTROLS: Criteria
 Comparability is more important than representativeness in the selection of controls
 The control should be at risk of the disease
 The control should resemble the case in all respects except for the presence of disease (and any as yet undiscovered risk factors for disease)


Sources

Source | Advantage | Disadvantage
Hospital based | Easily identified. Available for interview. More willing to cooperate. Tend to give complete and accurate information (less recall bias). | Not typical of general population. Possess more risk factors for disease. Some diseases may share risk factors with disease under study (whom to exclude?). Berksonian bias.
Population based (registry cases) | Most representative of the general population. Generally healthy. | Time, money, energy. Opportunity of exposure may not be same as that of cases (location, occupation).
Neighbourhood controls/ Telephone exchange random dialing | Controls and cases similar in residence. Easier than sampling the population. | Non cooperation. Security issues. Not representative of general population.
Best friend control/ Sibling control | Accessible, cooperative. Similar to cases in most aspects. | Overmatching.

 Selection of Controls : Number
o Large study: Cases: Control :: 1:1
o Small study: Cases: Control :: 1:2, 1:3, 1:4.
o Use of multiple controls

1. Controls of same type:
Cases: Control :: 1:1 ( for rare diseases, cases cannot be
increased in that time), ( increases power of the study).
2. Multiple controls of different types:
e.g. one hospital control and one neighbourhood control per case. Example: cases – children with brain tumour; controls – children with other cancers and normal children; risk factor – history of radiation exposure.

[Figure: children with brain tumours compared with children with other cancers and with children without cancer, to distinguish “radiation causes cancers” from “radiation causes brain cancers only”]

 Multiple controls of different types are valuable for exploring alternate hypotheses & for taking into account possible recall bias.

(From Gold EB, Gordis L, Tonascia J, Szklo M; Risk factors for brain tumors in children. Am J Epidemiol 1979)

Selection of Controls: Objectives
 Elimination of selection bias - Selection
 Minimization of information bias - Blinding
 Minimization of confounding - Matching

 Problems in control selection – Confounding variables.
 Confounding variables are factors associated with the exposure of interest and causally with the disease of interest.
 May lead to a spurious/ biased relationship between risk factor and disease.
 Common confounding variables are: age, sex, educational status, socioeconomic level, etc.
 These can be adjusted by:
 Designing the study through Matching
 Statistical techniques like Stratification and Regression

 Matching:
 Definition: It is the selection of controls so that they are similar to the cases in specified characteristics. (Epidemiology: An Introductory Text; Mausner & Bahn, 1985)
 Matching is defined as the process of selecting controls so that they are similar to cases in certain characteristics such as age, sex, race, socioeconomic status and occupation. (Epidemiology; Leon Gordis, 2004)

 Matching:
 Matching variables (e.g. age), and matching criteria (e.g. within the same 5 year age group) must be set up in advance.
 Controls can be individually matched (most common) or frequency matched.
 Individual matching (Matched pairs): search for one (or more) controls who meet the required matching criteria. Paired (triplet) matching is when there is one (two) control(s) individually matched to each case.
 Group matching (Frequency matching): select a population of controls such that its overall characteristics match those of the cases, e.g. if 15% of cases are under age 20, 15% of the controls are also.
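Frequency matching on age can be sketched as below. This is a minimal illustration, not a method taken from the cited texts: the function names, the 5 year band width, the fixed random seed and all ages are hypothetical.

```python
import random
from collections import Counter, defaultdict

def age_band(age, width=5):
    """Lower bound of the age band containing `age`, e.g. 23 -> 20 for width 5."""
    return (age // width) * width

def frequency_match(case_ages, control_pool_ages, ratio=1, seed=0):
    """Sample controls so that each age band holds `ratio` controls per case."""
    rng = random.Random(seed)
    needed = Counter(age_band(a) for a in case_ages)
    pool = defaultdict(list)
    for age in control_pool_ages:
        pool[age_band(age)].append(age)
    selected = []
    for band, n_cases in needed.items():
        k = n_cases * ratio
        if len(pool[band]) < k:
            raise ValueError(f"not enough potential controls in age band {band}")
        selected.extend(rng.sample(pool[band], k))
    return selected

# Hypothetical data: ages of 7 cases and a pool of 90 potential controls.
case_ages = [18, 19, 22, 31, 33, 34, 47]
control_pool = list(range(15, 60)) * 2

controls = frequency_match(case_ages, control_pool)
print(Counter(age_band(a) for a in case_ages))
print(Counter(age_band(a) for a in controls))
```

The two printed distributions agree band by band, which is the defining property of group matching; individual matching would instead pair each control with a specific case.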

 Matching:
 Avoid over-matching: match only on factors KNOWN to be risk factors for the disease.
 Obtain POWER by matching MORE THAN ONE CONTROL per case. In general, the number of controls per case should be ≤ 4, because there is little further gain in power above that.
 Obtain generalizability by matching more than one type of control.
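The diminishing return from extra controls can be illustrated with a commonly quoted approximation that is not stated in the slides and is used here as an assumption: with r controls per case, the statistical efficiency relative to an unlimited number of controls is roughly r / (r + 1). The function name is hypothetical.

```python
def relative_efficiency(r):
    """Approximate efficiency of r controls per case, relative to unlimited controls.

    Uses the rule-of-thumb r / (r + 1), an assumption not taken from the source.
    """
    if r <= 0:
        raise ValueError("r must be a positive number of controls per case")
    return r / (r + 1)

for r in range(1, 7):
    print(f"1:{r}  relative efficiency = {relative_efficiency(r):.2f}")
```

Under this approximation, moving from one to four controls per case raises efficiency from 0.50 to 0.80, while a fifth control adds only about 0.03.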

 Matching: Problems –
 Individual matching on too many variables is time consuming, costly and cumbersome, and may lead to too few controls.
 We cannot explore the possible association of disease with any variable on which cases and controls have been matched. Therefore, match only on factors which are already known to be associated with the disease.
 Suppose we know that breast cancer rates are higher among single women than in married women; then matching cases and controls on marital status would make it impossible to detect any relation with this factor.

 Matching: Problems –
 Overmatching: Matching on variables other than those that are risk factors for the disease under study, either in a planned manner or inadvertently.
 Example: In a study on OCP use as a risk factor for cancer, if we use “best friend controls”, it is most likely that the controls would also be OCP users. In effect we would have matched for the very factor we want to study.
 Example: If we use neighbourhood controls in a study on nutrition and tuberculosis, we would be inadvertently matching for socioeconomic status and thus nutrition.

 Bias – Definition: Any systematic error in the design, conduct, or analysis of a study that results in mistaken estimates of the effect of the exposure on disease.
 Types of bias in case control studies:
 Selection bias
 Information bias
 Confounding bias

 Selection Bias:
 Sources –
1. Selective loss to follow-up
2. Incomplete ascertainment of cases (Detection or Diagnostic bias)
3. Inappropriate control group
4. Differential motivation to participate

 Selection Bias:
 Selective survival: only surviving subjects are available to be studied; those surviving differ from those dying in potentially important ways.
 Solution: Rapid case ascertainment and interview

 Information Bias:
 Occurs due to –
1. Imperfect definitions of study variables
OR
2. Flawed data collection procedures.
 Leads to – Misclassification of disease and exposure.
 Types of Information bias –
 Recall bias
 Interviewer bias

 Misclassification: Some of the cases or controls who were actually exposed will be erroneously classified as unexposed, and some who were actually not exposed will be erroneously classified as exposed. This generally results in an underestimate of the true risk of the disease associated with the exposure.
e.g. cervical cancer with sexual intercourse with uncircumcised men

Comparison of patients’ statements with examination findings concerning circumcision status, Roswell Park Memorial Institute, New York

Examination finding | Patient’s statement: Yes (no.) | Yes (%) | Patient’s statement: No (no.) | No (%)
Circumcised         | 37                             | 66.1    | 47                            | 34.6
Not circumcised     | 19                             | 33.9    | 89                            | 65.4
Total               | 56                             | 100.0   | 136                           | 100.0
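The attenuating effect of such misclassification can be sketched numerically. The sketch below takes the accuracy of self-report from the table above (examination as the reference, "circumcised" as the recorded characteristic) and applies it equally to cases and controls in a hypothetical fourfold table. The true counts and function names are assumptions for illustration only.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc (a, b = exposed cases/controls; c, d = unexposed)."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected recorded counts when exposure is measured with error."""
    recorded_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    recorded_unexposed = (exposed + unexposed) - recorded_exposed
    return recorded_exposed, recorded_unexposed

# Accuracy of self-report, computed from the table above.
sensitivity = 37 / (37 + 47)   # circumcised on examination who stated "yes"
specificity = 89 / (19 + 89)   # not circumcised on examination who stated "no"

# Hypothetical true exposure counts (not from the source).
true_cases = (60, 40)      # exposed, unexposed
true_controls = (30, 70)   # exposed, unexposed

# The same error rates are applied to both groups (non-differential).
a, c = misclassify(*true_cases, sensitivity, specificity)
b, d = misclassify(*true_controls, sensitivity, specificity)

true_or = odds_ratio(60, 30, 40, 70)
observed_or = odds_ratio(a, b, c, d)
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

With these assumed counts the observed odds ratio lies between 1 and the true value, i.e. the association is underestimated, as the slide states.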

Recall bias (usually in case-control studies): Cases who
are aware of their disease status may be more likely to
recall exposures than controls
e.g. congenital malformation with prenatal infections
Results in misclassification

Solution
• Achieving similarity in the procedures used to
obtain information from cases and controls
• Verify exposure with existing records
• Objective measure of exposure
• Use of information recorded prior to the time
of diagnosis.

 Interviewer bias: When the interviewer is not blinded to (knows) the case status of subjects, there is potential for interviewer bias.
 Leads to –
 If interviewer knows case status – differential misclassification likely.
 If interviewer does not know case status – non differential misclassification is still possible.
 Solution –
 Blinding of interviewer as to case status
 Equal interview time for all participants

 Confounding: When a measure of the effect of an exposure on risk is distorted because of the association of exposure with other factors that influence the outcome.
 It is then not possible to separate the contribution that any single causal factor has made.
 Confounding Factor: one which is associated with both exposure & disease, and is distributed unequally in study & control groups.
 E.g.: Alcohol & Esophageal Ca; confounding factor – smoking
 Solution:
Study design: Matching
Analysis: Stratification & Regression
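Stratification can be sketched with the Mantel-Haenszel pooled odds ratio, one standard way of combining strata (the slides name stratification but not a specific estimator, so this choice is an assumption). The counts below are hypothetical, constructed so that alcohol appears associated with disease only because smokers drink more and have more disease.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for one fourfold table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata, each given as (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts per stratum:
# (exposed cases, exposed controls, unexposed cases, unexposed controls),
# where "exposed" means alcohol use.
smokers = (40, 20, 20, 10)
non_smokers = (5, 20, 15, 60)

# Collapsing the strata gives the crude table that ignores smoking.
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

crude_or = odds_ratio(*crude)
adjusted_or = mantel_haenszel_or([smokers, non_smokers])
print(f"crude OR = {crude_or:.2f}, smoking-adjusted OR = {adjusted_or:.2f}")
```

In this constructed example the crude odds ratio is 2.25 while the odds ratio within each smoking stratum, and hence the pooled estimate, is 1: the crude association is entirely due to confounding.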


 On analysis of a case control study we find out:
 Exposure rates: the frequency of exposure to the suspected risk factor in cases and in controls
 Estimation of disease risk associated with exposure (Odds ratio)

 Exposure rates:
 A case control study provides a direct estimation of the exposure rates (frequency of exposure) to the suspected factor in disease and non-disease groups.

            | Cases (lung cancer) | Controls (without lung cancer)
Smokers     | 33 (a)              | 55 (b)
Non Smokers | 2 (c)               | 27 (d)
TOTAL       | 35 (a + c)          | 82 (b + d)

Doll R. and Hill AB. (1950) Brit. Med. J.

 Exposure rates
 Cases = a/ (a + c) = 33/ 35 = 94.3%
 Controls = b/ (b + d) = 55/82 = 67.1%
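The exposure rates above can be reproduced with a few lines of Python. This is only an arithmetic check on the Doll and Hill counts quoted in the table; the function name is hypothetical and percentages are rounded to one decimal place.

```python
# Counts from the Doll and Hill (1950) table above.
a, b = 33, 55   # smokers: cases, controls
c, d = 2, 27    # non-smokers: cases, controls

def exposure_rate(exposed, unexposed):
    """Proportion exposed within one group (cases or controls)."""
    return exposed / (exposed + unexposed)

rate_cases = exposure_rate(a, c)      # a / (a + c)
rate_controls = exposure_rate(b, d)   # b / (b + d)

print(f"Cases:    {rate_cases:.1%}")
print(f"Controls: {rate_controls:.1%}")
```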

 Odds Ratio / Relative odds (estimate of relative risk).
 Odds: Odds of an event is defined as the ratio of the number of ways an event can occur to the number of ways an event cannot occur. (Epidemiology; Leon Gordis. 2004)
 If the probability of event X occurring is P, then the odds of it occurring is P / (1 - P).
 Odds ratio: Ratio of the odds that the cases were exposed to the odds that the controls were exposed.
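The relation between probability and odds stated above can be illustrated as follows. The function names are hypothetical, and the probabilities are arbitrary values chosen for the example.

```python
def odds(p):
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def probability(o):
    """Convert odds back into a probability o / (1 + o)."""
    return o / (1 + o)

print(odds(0.5))         # 1.0: the event is as likely to occur as not
print(odds(0.75))        # 3.0: three to one in favour
print(probability(3.0))  # 0.75
```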

 Odds ratio:
 Using the four-fold table –

            | Diseased/ Cases | Not diseased/ Controls
Exposed     | a               | b
Not exposed | c               | d

Odds ratio = (Odds that case was exposed) / (Odds that control was exposed)
           = (a/c) / (b/d) = ad / bc
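As a worked example, the cross-product formula can be applied to the Doll and Hill smoking counts quoted earlier in this presentation (33 and 55 smokers, 2 and 27 non-smokers among cases and controls). The function name is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc for a four-fold table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)

# Doll and Hill (1950): smokers and non-smokers among cases and controls.
or_smoking = odds_ratio(a=33, b=55, c=2, d=27)
print(round(or_smoking, 1))  # 8.1
```

In this sample, the odds of having smoked were about eight times higher among lung cancer cases than among controls.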

 Odds ratio (= cross products ratio) can also be viewed as the ratio of the product of the two cells that support the hypothesis of an association (cells a & d – diseased people who were exposed and non diseased people who were not exposed), to the product of the two cells which negate the hypothesis of an association (cells b & c – non diseased people who were exposed and diseased people who were not exposed).

 When is Odds ratio a good estimate of the relative risk in the population?
 Cases studied are representative, regarding history of exposure, of all people with the disease in the population from which cases are drawn.
 Controls studied are representative, regarding history of exposure, of all people without the disease in the population from which cases are drawn.
 When the disease being studied does NOT occur frequently.
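The rare disease condition can be checked numerically. The sketch below compares relative risk and odds ratio in two hypothetical cohorts with the same relative risk, one with a rare disease and one with a common disease. All counts and function names are assumptions for illustration.

```python
def relative_risk(a, b, c, d):
    """Risk in exposed divided by risk in unexposed.

    a = exposed diseased, b = exposed not diseased,
    c = unexposed diseased, d = unexposed not diseased.
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for the same table."""
    return (a * d) / (b * c)

# Hypothetical cohorts, each with a true relative risk of 2.
rare = (20, 9980, 10, 9990)   # risk 0.2% in exposed vs 0.1% in unexposed
common = (50, 50, 25, 75)     # risk 50% in exposed vs 25% in unexposed

for label, table in (("rare", rare), ("common", common)):
    print(f"{label:6s} RR = {relative_risk(*table):.3f}  OR = {odds_ratio(*table):.3f}")
```

For the rare disease the odds ratio is almost identical to the relative risk, while for the common disease it overstates it (3 against 2).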

 Limitations:
1. Susceptible to bias if not carefully designed
2. Especially susceptible to exposure misclassification
3. Especially susceptible to recall bias
4. Restricted to single outcome
5. Incidence rates not usually calculated
6. Cannot assess effects of matching variables

 Advantages:
1. Only realistic study design for uncovering etiology in rare diseases
2. Important in understanding new diseases
3. Commonly used in outbreak investigations
4. Useful if induction period is long
5. Relatively inexpensive

Rare disease:
Case-control approaches are the most efficient for rare diseases, e.g. idiopathic pulmonary fibrosis, most cancers.
Cohort approaches would require large populations and prohibitive expense and follow-up time.

Case ascertainment system in place:
The conduct of a case-control study may be facilitated by the availability of a case-ascertainment system.
a) Population-based cancer registry
b) Hospital-based surveillance systems
c) Mandated disease reporting systems

When funding and time constraints are not compatible with a cohort study.

[Figure: nested case control study. At TIME 1, interviews, blood, urines, etc. are obtained from the study population. Years later, at TIME 2, those who develop disease become the CASES and a subgroup of those who do not develop disease become the CONTROLS of a CASE-CONTROL STUDY.]

[Figure: a hypothetical cohort followed over time (t1, t2, t3); X = lung cancer case, O = loss to follow-up.]

 Advantages (nested case control study):
1. Possibility of recall bias is eliminated, since data on exposure are obtained before disease develops.
2. Exposure data are more likely to represent the pre-illness state since they are obtained years before clinical illness is diagnosed.
3. Costs are reduced compared to those of a prospective study, since laboratory tests need to be done only on specimens from subjects who are later chosen as cases or as controls.

1950’s
Cigarette smoking and lung cancer

1970’s
Diethylstilbestrol and vaginal adenocarcinoma
Post-menopausal estrogens and endometrial cancer

1980’s
Aspirin and Reye’s syndrome
Tampon use and toxic shock syndrome
L-tryptophan and eosinophilia-myalgia syndrome
AIDS and sexual practices

1990’s
Vaccine effectiveness
Diet and cancer

 Park’s Textbook of Preventive and Social Medicine – 21st ed; Park JE. 2010.
 Mausner & Bahn Epidemiology: An Introductory Text – 2nd ed; Mausner JS, Kramer S. 1985.
 A Dictionary of Epidemiology – 3rd ed; Last JM. 2000.
 Epidemiology – 3rd ed; Gordis L. 2004.
 Origins and early development of the case-control study by Nigel Paneth, Ezra Susser, Mervyn Susser. Available from www.epidemiology.ch/history/papers.
