• An observational epidemiological study of persons with the disease
(or another outcome variable) of interest and a suitable control group
of persons without the disease (comparison group, reference group)
• Porta, M., ed. (2008). A Dictionary of Epidemiology (5th ed.). New York: Oxford University Press
outcome to exposure - clear definition of outcome is needed ….
retrospective for exposure, but case-ascertainment can be either
retrospective or concurrent.
almost always on outcome, with matching of controls to cases
Issues in the Study Design
1. Formulation of a clearly defined hypothesis
2. Case definition - clearly defined at the outset of the investigation to
ensure that all cases included in the study are based on the same
diagnostic criteria.
3. Source of cases - the source of cases needs to be clearly defined.
Selection of cases
• Cases should be homogeneous
• Criteria or definition of cases must be well formulated and documented
• If diagnostic tests are used to identify cases:
- Tests with low sensitivity but high specificity will yield fewer
false positives, at the cost of missing some true cases
• If cases are misclassified (include false positives), the association may be
Sources of cases
• Ideally, cases are a random sample of all cases of interest in the source
population (e.g. from vital data, registry data).
• More commonly they are a selection of available cases from a medical
care facility (e.g. from hospitals, clinics)
• Population based case-control studies are generally more expensive and
difficult to conduct
Characteristics of controls
• If cases are a random sample of all cases in the population, then controls should be
a random sample of all non-cases in the population sampled at the same time
(i.e. from the same study base)
• If study cases are not a random sample of the universe of all cases, it is not likely that a
random sample of the population of non-cases will constitute a good control group
• Comparability is more important than representativeness in the
selection of controls
• The control must be at risk of getting the disease.
• The control should resemble the case in all respects except for the
presence of disease
COMPARABILITY vs. REPRESENTATIVENESS
• Usually, study cases are not a random sample of all cases in the population, and
therefore controls must be selected so as to mirror the same biases that entered
into the selection of cases
• It follows from the above that a pool of potential controls must be defined.
• This pool must mirror the study base of the cases
• The study base is composed of a population at risk of exposure over a period of
risk of exposure.
• Cases emerge within a study base. Controls should emerge from the same
study base, except that they are not cases.
• If cases are selected exclusively from hospitalized patients, controls must also be
selected from hospitalized patients
• If cases must have gone through a certain ascertainment process (e.g. screening),
controls must have also (e.g. mammogram-detected breast cancer)
• If cases must have reached a certain age before they can become cases, so must
controls
• If the exposure of interest is cumulative over time, the controls and cases must
each have the same opportunity to be exposed to that exposure.
• Example - if the case has to work in a factory to be exposed to benzene, the
control must also have worked where he/she could be exposed to benzene
Six issues in matching controls in case-control studies
1. Identify the pool from which controls may come. This pool is likely to reflect
the way controls were ascertained (hospital, screening test, telephone survey).
2. Control selection is usually through matching.
Matching variables (e.g. age), and matching criteria (e.g. control must be
within the same 5 year age group) must be set up in advance.
3. Controls can be individually matched or frequency matched.
INDIVIDUAL MATCHING: search for one (or more) controls who
meet the required MATCHING CRITERIA. PAIRED or TRIPLET MATCHING is
when one or two controls, respectively, are individually matched to each case.
FREQUENCY MATCHING: select a population of controls such that
the overall characteristics of the group match the overall characteristics of the
cases, e.g. if 15% of cases are under age 20, 15% of the controls are also.
4. Avoid over-matching. Match only on factors known to be causes of the disease.
5. Obtain POWER by matching more than one control per case. In general, the number of
controls per case should be ≤ 4, because there is little further gain in power above four
controls per case.
6. Obtain GENERALIZABILITY by matching more than ONE TYPE OF CONTROL.
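The diminishing return from adding controls can be illustrated with a commonly quoted approximation: with k controls per case, the statistical efficiency relative to an unlimited supply of controls is roughly k/(k+1) when the true effect is close to the null. The sketch below is only an illustration of that rule of thumb, not a full power calculation, and the function name is hypothetical.

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency of k:1 matching relative to unlimited controls.

    Uses the rule-of-thumb k / (k + 1), which is an approximation that
    holds best when the odds ratio is close to 1.
    """
    if controls_per_case < 1:
        raise ValueError("need at least one control per case")
    return controls_per_case / (controls_per_case + 1)


# Show how little is gained beyond four controls per case.
for k in range(1, 7):
    print(f"{k} control(s) per case: {relative_efficiency(k):.0%} of maximum efficiency")
```

Under this approximation, going from one to four controls per case raises efficiency from 50% to 80%, while a fifth control adds only about three percentage points.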
• Measurement of exposure is established after the development of
disease and as a result is prone to both recall and observer bias.
• The procedures used for the collection of exposure data should be the
same for cases and controls.
Various methods can be used to ascertain exposure status. These
include:
• Standardized questionnaires
• Biological samples
• Interviews with the subject
• Interviews with spouse or other family members
• Medical records
• Employment records
• Pharmacy records
• Measure of association between an exposure and an outcome.
• Odds that an outcome will occur given a particular exposure, compared to the
odds of the outcome occurring in the absence of that exposure.
• Most commonly used in case-control studies
• Can also be used in cross-sectional and cohort study designs (with
some modifications and/or assumptions).
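As a minimal sketch of the calculation, the odds ratio from a 2x2 table is the cross-product ratio (a x d) / (b x c). The cell labels a, b, c, d and the counts used here are illustrative assumptions, not data from any study.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 table.

    a: exposed cases        b: exposed controls
    c: unexposed cases      d: unexposed controls
    """
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts: 40 of 100 cases and 20 of 100 controls were exposed.
example_or = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {example_or:.2f}")
```

An OR above 1 suggests the exposure is more common among cases than controls; whether that is compatible with chance depends on the confidence interval.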
• The 95% confidence interval (CI) is used to estimate the precision of the OR.
• A wide CI indicates a low level of precision of the OR, whereas a narrow CI indicates
a higher precision of the OR.
• The 95% CI is often used as a proxy for the presence of statistical significance if it
does not overlap the null value (e.g. OR=1)
• An OR greater than 1 for an outcome given a particular exposure does not
necessarily indicate that this association is statistically significant.
• The factors affecting the width of the CI include the desired confidence level, the
sample size and the variability in the sample.
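One common way to obtain the 95% CI is the log (Woolf) method: take ln(OR), add and subtract 1.96 standard errors, where SE = sqrt(1/a + 1/b + 1/c + 1/d), and exponentiate. The sketch below assumes that method and uses hypothetical counts; other interval methods exist and may be preferable for small cell counts.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases        b: exposed controls
    c: unexposed cases      d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Same exposure proportions, but ten times the sample size in the second call.
small = odds_ratio_ci(40, 20, 60, 80)
large = odds_ratio_ci(400, 200, 600, 800)
print("n=200:  OR=%.2f, 95%% CI %.2f to %.2f" % small)
print("n=2000: OR=%.2f, 95%% CI %.2f to %.2f" % large)
```

The point estimate is the same in both calls, but the larger sample gives a narrower interval, which illustrates the effect of sample size on precision.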
Odds ratio ≠ RR
• If the disease condition (event) is rare, then the odds ratio and
relative risk may be comparable.
• But the odds ratio will overestimate the risk if the disease is more
common.
• In such cases, the odds ratio should be avoided, and the relative risk
will be a more accurate estimation of risk.
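The gap between the two measures can be seen numerically. The sketch below uses hypothetical cohort-style counts (where both measures can be computed) with the same relative risk of 2 in a rare-disease and a common-disease scenario; the function name and all numbers are illustrative assumptions.

```python
def rr_and_or(exposed_cases: int, exposed_total: int,
              unexposed_cases: int, unexposed_total: int):
    """Return (relative risk, odds ratio) from cohort-style counts."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return risk_exposed / risk_unexposed, odds_exposed / odds_unexposed


# Rare disease: risk of 2% in the exposed versus 1% in the unexposed.
rare_rr, rare_or = rr_and_or(20, 1000, 10, 1000)
# Common disease: risk of 60% in the exposed versus 30% in the unexposed.
common_rr, common_or = rr_and_or(600, 1000, 300, 1000)

print(f"Rare disease:   RR = {rare_rr:.2f}, OR = {rare_or:.2f}")
print(f"Common disease: RR = {common_rr:.2f}, OR = {common_or:.2f}")
```

With the rare outcome the OR stays close to the RR of 2, while with the common outcome the OR is 3.5 for the same RR.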
Clinical vs statistical significance
1. Clinical importance is best inferred by looking at the effect size, that is how
much is the actual change or difference.
2. However, statistical significance in terms of P only suggests whether there is
any difference in probability terms
3. One way to combine statistical significance and effect sizes is to report CIs.
4. If a corresponding hypothesis test is performed, the confidence level is the
complement of the level of significance, that is a 95% CI reflects a significance
level of 0.05, while at the same time providing an estimate of the ‘true’ value.
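The correspondence between a 95% CI and a 0.05 significance level can be checked with a short sketch. It assumes a Wald test on the log odds ratio and the matching log-based interval; the function name and counts are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def wald_p_and_ci(a: int, b: int, c: int, d: int):
    """Two-sided Wald p-value and 95% CI for the odds ratio of a 2x2 table.

    a: exposed cases        b: exposed controls
    c: unexposed cases      d: unexposed controls
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal
    return p_value, math.exp(log_or - Z_95 * se), math.exp(log_or + Z_95 * se)


for table in [(40, 20, 60, 80), (30, 25, 70, 75)]:
    p, lo, hi = wald_p_and_ci(*table)
    excludes_null = lo > 1 or hi < 1
    print(f"{table}: p = {p:.3f}, 95% CI {lo:.2f} to {hi:.2f}, excludes OR=1: {excludes_null}")
```

For this test and interval pair, p < 0.05 occurs exactly when the 95% CI excludes 1, while the interval additionally shows the size and precision of the effect.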
Advantages
1. Only realistic study design for uncovering etiology in rare diseases
2. Important in understanding new diseases
3. Commonly used in outbreak investigation
4. Useful if induction period is long
5. Relatively inexpensive
Disadvantages
1. Susceptible to bias if not carefully designed
2. Especially susceptible to exposure misclassification
3. Especially susceptible to recall bias
4. Restricted to single outcome
5. Incidence rates not usually calculable
6. Cannot assess effects of matching variables
What are bias and confounding?
Any systematic error in the design, conduct or analysis of a study
that results in a mistaken estimate of an exposure’s effect on the risk of
disease is called BIAS
Distortion of the relation between exposure and outcome by a third variable that is
associated with the exposure, independently affects the outcome, and is not a
mediator on the causal pathway is called CONFOUNDING
3 important biases in case control study
• Selection bias
• Recall/information bias
• Confounding bias
Prevention of confounding
• Before study:
- Randomization (intervention study)
- Restriction (cohort, case-control)
- Matching (case-control)
• After study:
- Multivariate analysis
Restriction
• A method that limits participation in the study to individuals who are
similar in relation to the confounder
• Problems: 1. Reduces the eligible population, 2. Limits generalizability
(external validity)
Matching
• Controls and cases are made similar on variables that may be related to the
outcome under study BUT are not of interest in themselves.
Table 4: Adjusted and unadjusted odds ratio of variables for the presence of diabetic
SOME IMPORTANT DISCOVERIES MADE IN CASE CONTROL STUDIES
• Cigarette smoking and lung cancer
• Diethylstilbestrol and vaginal adenocarcinoma
• Post-menopausal estrogens and endometrial cancer
1. Aspirin and Reye's syndrome
2. Tampon use and toxic shock syndrome
3. L-tryptophan and eosinophilia-myalgia syndrome
4. AIDS and sexual practices
1. Vaccine effectiveness
2. Diet and cancer
1. Did the study address a clearly focused issue?
An issue/hypothesis can be ‘focused’ in terms of
• The population studied
• Whether the study tried to detect a beneficial
or harmful effect
• Risk factors studied
2. Did the authors use an appropriate method to answer their
question?
• Is a case control study an appropriate way of answering the
question under the circumstances
• Did it address the study question
3. Were the cases recruited in an acceptable way?
1. Looking for selection bias which might compromise validity of the findings
2. Are the cases defined precisely?
3. Were the cases representative of a defined population (geographically
and/or temporally)
4. Established reliable system for selecting all the cases
5. Incident or prevalent
6. Something special about the cases
7. Time frame of the study relevant to disease/exposure
8. Sufficient number of cases selected
9. Power calculation
4. Were the controls selected in an acceptable way?
• Looking for selection bias which might compromise the generalizability of
the findings
• Were the controls representative of the defined population
(geographically and/or temporally)
• Something special about the controls
• Matched, population based or randomly selected
• Was there a sufficient number of controls selected
5. Was the exposure accurately measured to minimise bias?
1. Looking for measurement, recall or classification bias
2. Was the exposure clearly defined and accurately measured
3. Did the authors use subjective or objective measurements
4. Do the measures truly reflect what they are supposed to measure (have they been
validated)
5. Were the measurement methods similar in the cases and controls
6. Did the study incorporate blinding where feasible
7. Is the temporal relation correct (does the exposure of interest precede the outcome)
• 6.(a) Aside from the experimental intervention, were the groups
treated equally?
= List the confounding factors you think might be important that the
author may have missed • genetic • environmental • socio-economic
• (b) Have the authors taken account of the potential confounding
factors in the design and/or in their analysis?
= Look for • restriction in design, and techniques e.g. modelling,
stratified-, regression-, or sensitivity analysis to correct, control or
adjust for confounding factors
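Stratified analysis can be sketched with the Mantel-Haenszel pooled odds ratio, one standard way to adjust for a categorical confounder: OR_MH = sum(a_i d_i / n_i) / sum(b_i c_i / n_i) over strata i. The counts below are invented so that the exposure has no effect within either stratum, yet the crude table suggests an association; function names are hypothetical.

```python
def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into one 2x2 table."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata.

    Each stratum is (a, b, c, d):
    a: exposed cases        b: exposed controls
    c: unexposed cases      d: unexposed controls
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata of a confounder (e.g. two age groups); OR = 1 within each.
strata = [(10, 40, 5, 20), (40, 10, 80, 20)]

print(f"Crude OR:           {crude_odds_ratio(strata):.2f}")
print(f"Mantel-Haenszel OR: {mantel_haenszel_or(strata):.2f}")
```

A large difference between the crude and the adjusted OR, as in this invented example, is the pattern to look for when judging whether adjustment for a confounder mattered.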
7. How large was the treatment effect?
1. What are the bottom line results
2. Is the analysis appropriate to the design
3. How strong is the association between exposure and outcome (look
at the odds ratio)
4. Are the results adjusted for confounding, and might confounding
still explain the association
5. Has adjustment made a big difference to the OR
8. How PRECISE was the estimate of the treatment effect?
• size of the p-value
• size of the confidence intervals
• have the authors considered all the important variables
• how was the effect of subjects refusing to participate evaluated
9. Do you believe the results?
1. A big effect is hard to ignore!
2. Can it be due to chance, bias, or confounding
3. Are the design and methods of this study sufficiently flawed to make the results
unreliable
4. Consider Bradford Hill criteria (e.g. time sequence, dose-response gradient,
strength, biological plausibility)
10. Can the results be applied to the local population?
• The subjects covered in the study could be sufficiently different from
our population to cause concern
• Local setting is likely to differ much from that of the study
• Can the local benefits and harms be quantified
11. Do the results of this study fit with other available evidence?
• Consider all the available evidence from RCTs, systematic reviews, cohort
studies, and case-control studies as well, for consistency
• In observational study, precision determined by sample size and the
efficiency of the study.
• A larger study, and one with more balanced groups, is more precise
• A large standard deviation relative to the estimate indicates low
precision
• Wide confidence intervals for estimates of association (e.g., odds
ratios or relative risks) indicate low precision.
• Whereas precision is a lack of random error, validity refers to a lack of
systematic error
• Internal validity refers to the strength of the inferences from the study
• For internal validity, conclusions can be logically drawn from the results
produced by an appropriate methodology.
• External validity is the ability to generalize study results to a more universal
population
Other check lists
• Joanna Briggs Institute (JBI) Checklist for Case Control Studies
• SIGN Case-Control Studies Checklist
• STROBE Checklist
Take home message
• A single observational study, such as a case-control study, rarely provides
sufficiently robust evidence to recommend changes to clinical
practice or within health policy decision making.
• However, for certain questions observational studies provide the only
available evidence
• Recommendations from observational studies are always stronger
when supported by other evidence
• Szumilas M. Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry, 2010 Aug
• Andrade C. Understanding relative risk, odds ratio, and related terms: as simple as it can get. The Journal of Clinical Psychiatry, 2015 Jul
• Cummings P. The relative merits of risk ratios and odds ratios. Archives of pediatrics
• Grant RL. Converting an odds ratio to a range of plausible relative risks for better communication of research findings. BMJ (Clinical research ed.), 2014 Jan 24
• THE LANCET • Vol 359 • February 2, 2002
• Lancet2005; 365: 1429–33
• Critical Appraisal Skills Programme (2018). CASP (insert name of checklist i.e. Case Control Study) Checklist. [online] Available at: URL.
Accessed: Date Accessed