3. Objectives
• Identify and differentiate the 2 main sources of
error in epidemiologic research: random error and
systematic error.
• Describe the relationship between sampling and
random error.
• Define confidence intervals and how to calculate
them.
4. Objectives (continued)
• Describe the relationship between sample size and
precision in a prevalence study.
• Differentiate estimation and statistical testing.
• Describe statistical testing and define key related
concepts (significant versus nonsignificant tests,
type I and type II error, statistical power).
• Explain the influence of sample size on statistical
power.
5. Sources of error in
epidemiological research
Sources of error include:
• random error (a.k.a. stochastic error)
• systematic error (a.k.a. bias)
A clear definition of bias depends on a clear
understanding of what is meant by random error,
which is why we are starting with random error.
6. PREVALENCE
• PREVALENCE is spelled in uppercase letters to
indicate that the parameter is calculated from the
population (not sampled) data.
• PREVALENCE is not an estimate: in the absence of
measurement errors, it is the true population
parameter.
7. Prevalence
• Prevalence is spelled in lowercase letters
(prevalence) to indicate that the value is
calculated from a sample.
• When calculated from a sample, prevalence is an
estimate: repeating the process of sampling would
result in different estimates.
• The different estimates are due to sampling
variability.
• The difference between a true value and a sample-
based estimate is a type of error: random error.
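The distinction between PREVALENCE and prevalence can be illustrated with a small simulation. This is only a sketch: the population of 10,000 people, the 20% true value, and the sample size of 200 are made-up numbers, and the function name sample_prevalence is hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 people, 2,000 of whom have the disease.
population = [1] * 2000 + [0] * 8000
PREVALENCE = sum(population) / len(population)  # true population parameter

def sample_prevalence(pop, n):
    """Draw a simple random sample of size n and return its prevalence."""
    sample = random.sample(pop, n)
    return sum(sample) / n

# Repeating the sampling process generally gives a different estimate each time.
estimates = [sample_prevalence(population, 200) for _ in range(5)]
random_errors = [est - PREVALENCE for est in estimates]

print("PREVALENCE (population):", PREVALENCE)
for est, err in zip(estimates, random_errors):
    print(f"prevalence (sample): {est:.3f}   random error: {err:+.3f}")
```

Each printed estimate is a prevalence; its distance from PREVALENCE is the random error for that particular sample.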
8. Random samples
• In a random sample, the selection of subjects into
the sample cannot be predicted.
• Each person’s disease status is an independent
observation that reflects true prevalence of disease
in the population through the law of large
numbers.
• The sample prevalence therefore estimates the
true value, but can differ from the true value due to
random error.
9. Sampling terminology
• In a probability sample, the probability of selecting
a person from the population is known.
• A simple random sample is a basic form of a
probability sample: the probability of selecting
each member of the population is the same.
• The probability of selection is a selection
probability.
• In practice, sampling requires a list from which to
select. This is a sampling frame.
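As a rough illustration of these terms, the sketch below draws a simple random sample from a sampling frame. The frame of 1,000 identifiers and the sample size of 50 are assumptions chosen for the example, not values from the course.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: a list of identifiers for everyone eligible.
sampling_frame = [f"person_{i:04d}" for i in range(1, 1001)]

n = 50
# random.sample selects without replacement; every member is equally likely.
sample = random.sample(sampling_frame, n)

# In a simple random sample, the selection probability is the same for everyone.
selection_probability = n / len(sampling_frame)

print("First five selected:", sample[:5])
print("Selection probability:", selection_probability)
```

Because the selection probability is known (here 50 out of 1,000), this is a probability sample.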
10. Sampling terminology (continued)
• Inference describes the process of gaining
information about a population based on data
collected from a sample.
• The target population is the subject of inference: it
is the population whose parameters are estimated
through sampling.
11. Sampling terminology (continued)
• A source population is a subset of a target
population: it is a smaller population within a larger
target population from which a sample is drawn.
• A study population is a common term for a sample
drawn from a source population. The term is
confusing because a “study population” is not a
population; it is a sample.
12. Dealing with random error
• The law of large numbers predicts that larger
samples lead to parameter estimates (e.g.,
prevalence) that more closely reflect the true
population values.
• Therefore, epidemiological studies prefer large
samples.
• Nevertheless, random error needs to be addressed
during data analysis.
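The effect of sample size on random error can be sketched with a simulation. The true value of 20%, the three sample sizes, and the 200 replicates are illustrative assumptions; the helper names are hypothetical.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

TRUE_PREVALENCE = 0.20  # assumed true population value (illustrative)

def estimate_prevalence(n):
    """Simulate n independent people and return the sample prevalence."""
    cases = sum(1 for _ in range(n) if random.random() < TRUE_PREVALENCE)
    return cases / n

def mean_absolute_error(n, replicates=200):
    """Average distance between the sample estimate and the true value."""
    errors = [abs(estimate_prevalence(n) - TRUE_PREVALENCE)
              for _ in range(replicates)]
    return sum(errors) / replicates

errors_by_size = {n: mean_absolute_error(n) for n in (50, 500, 5000)}
for n, err in errors_by_size.items():
    print(f"n = {n:>4}: average random error = {err:.4f}")
```

The average random error shrinks as the sample grows, but it does not reach zero, which is why random error still has to be addressed in the analysis.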
13. Dealing with random error
(continued)
• There are 2 general approaches:
  • confidence intervals
  • statistical tests
• Confidence intervals are the preferred approach.
14. Confidence intervals
• Confidence intervals define a range of plausible
values for true population parameters, based on a
desired level of confidence.
• Usually, 95% confidence is the desired level.
• A confidence interval consists of 2 numbers called
confidence limits.
• The confidence interval comprises all values
between the lower and upper confidence limits.
• You can be 95% confident that a 95% confidence
interval captures the true population value: if
sampling were repeated many times, about 95% of
the intervals calculated would contain it.
15. Confidence intervals (continued)
• The best type of confidence interval is an exact
confidence interval.
• Others are based on approximations. For example,
in a standard normal distribution, 95% of values lie
between –1.96 and +1.96, so if an estimate is
normally distributed:
Lower 95% Confidence Limit = Estimate – (1.96 × SE)
Upper 95% Confidence Limit = Estimate + (1.96 × SE)
where SE is the standard error associated with the
estimate.
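The approximate limits above can be worked through for a prevalence estimate. This sketch assumes the usual binomial standard error for a proportion, SE = sqrt(p × (1 – p) / n), which is not stated on the slide; the counts (40 cases among 200 people) and the function name approx_ci are made up for illustration. It is the normal approximation, not an exact interval.

```python
import math

def approx_ci(cases, n, z=1.96):
    """Approximate (normal-based) confidence limits for a prevalence.

    Assumes SE = sqrt(p * (1 - p) / n), the standard error of a proportion.
    """
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Hypothetical sample: 40 cases among 200 people (prevalence = 0.20).
lower, upper = approx_ci(40, 200)
print(f"95% CI: {lower:.3f} to {upper:.3f}")   # 95% CI: 0.145 to 0.255

# Same prevalence, four times the sample size: the interval is half as wide.
lower_big, upper_big = approx_ci(160, 800)
print(f"95% CI: {lower_big:.3f} to {upper_big:.3f}")   # 95% CI: 0.172 to 0.228
```

The second interval shows the link between sample size and precision: quadrupling the sample halves the width of the interval, because SE falls with the square root of n.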
16. Statistical tests
• Instead of providing a range of values, statistical
tests are designed to help answer the question, “Is
exposure associated with disease?”
• They follow a series of steps.
17. Statistical tests (continued)
• Step 1: Formulate a null hypothesis (e.g., there is
no association between exposure and disease).
• Step 2: Calculate the probability of observing an
effect as large as, or larger than, the one observed
due to chance alone, assuming that the null
hypothesis is true (the p value).
• Step 3: If the probability in step 2 is small, the null
hypothesis is rejected.
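The three steps can be sketched with one possible test, a two-sided z-test comparing disease prevalence in exposed and unexposed groups. The slides do not name a specific test, so the choice of test, the function name, and the counts (30/100 exposed versus 15/100 unexposed) are all assumptions made for illustration.

```python
import math

def two_proportion_z_test(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Two-sided z-test for a difference between two proportions."""
    p1 = cases_exposed / n_exposed
    p2 = cases_unexposed / n_unexposed
    # Step 1: under the null hypothesis both groups share one prevalence,
    # so the standard error is based on the pooled proportion.
    pooled = (cases_exposed + cases_unexposed) / (n_exposed + n_unexposed)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exposed + 1 / n_unexposed))
    z = (p1 - p2) / se
    # Step 2: probability of a difference at least this large (in either
    # direction) if the null hypothesis is true, from the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 30 of 100 exposed and 15 of 100 unexposed have the disease.
z, p_value = two_proportion_z_test(30, 100, 15, 100)
print(f"z = {z:.2f}, p = {p_value:.4f}")

# Step 3: reject the null hypothesis if the probability is small.
reject_null = p_value < 0.05
print("Reject null hypothesis:", reject_null)
```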
18. Statistical tests (continued)
• Statistical tests work by rejecting a hypothesis, not
by proving a hypothesis.
• Null hypotheses are never rejected with certainty;
they are only deemed unlikely.
• The decision that a result (or one more extreme) is
unlikely is usually based on its probability (given
the null hypothesis) being less than 5% (p < 0.05).
19. Statistical errors
• Statistical tests can make 2 types of errors:
  • rejecting a null hypothesis that is true (type I error)
  • failing to reject a null hypothesis that is false
    (type II error)
20.
                       An association exists    No association exists
                       in the population        in the population
                       (null hypothesis is      (null hypothesis is
                       false)                   true)
Statistical test is    No error                 Type I error
significant
Statistical test is    Type II error            No error
nonsignificant
21. Statistical power
• Statistical power is the probability of rejecting a
null hypothesis that is false.
• Power is calculated from:
  • sample size (larger = greater power)
  • effect size (bigger = greater power)
  • probability at which the null is rejected, i.e., the
    significance level (larger = greater power*)
• For continuous measures (e.g., comparing means),
the standard deviation of the outcome also
contributes to statistical power.
* In practice this is usually fixed at the conventional 5% level and is not raised to increase power.
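The influence of each input on power can be explored with a rough calculation. This sketch uses a common normal approximation for a two-sided comparison of two proportions with equal group sizes; the formula, the function name approx_power, and the example prevalences are assumptions for illustration, not the course's method.

```python
import math
from statistics import NormalDist

def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation with equal group sizes; it ignores the very small
    chance of rejecting the null in the wrong direction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return std_normal.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)

baseline = approx_power(0.30, 0.15, 100)
print(f"Baseline power:          {baseline:.2f}")
print(f"Larger sample (n = 200): {approx_power(0.30, 0.15, 200):.2f}")
print(f"Bigger effect (0.35):    {approx_power(0.35, 0.15, 100):.2f}")
print(f"Larger alpha (0.10):     {approx_power(0.30, 0.15, 100, alpha=0.10):.2f}")
```

Each of the three changes raises power relative to the baseline, matching the list above.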
22. Probability of error
The probability of type I error is:
• the value of probability at which the null is rejected
The probability of type II error is:
• 1 – statistical power
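Both error probabilities can be checked by simulation. In this sketch the test (a two-sided z-test for two proportions), the group size of 200, the prevalences, and the 2,000 simulated studies are all illustrative assumptions, and the helper names are hypothetical.

```python
import math
import random

random.seed(2024)  # fixed seed so the sketch is reproducible

ALPHA = 0.05        # probability at which the null is rejected
N_PER_GROUP = 200   # assumed size of each group

def p_value_two_proportions(c1, n1, c2, n2):
    """Two-sided p value from a z-test comparing two proportions."""
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variation at all: no evidence against the null
    z = (c1 / n1 - c2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_cases(n, prevalence):
    """Number of cases among n independent people."""
    return sum(1 for _ in range(n) if random.random() < prevalence)

def rejection_rate(prev_exposed, prev_unexposed, simulations=2000):
    """Fraction of simulated studies in which the null is rejected."""
    rejections = 0
    for _ in range(simulations):
        c1 = simulate_cases(N_PER_GROUP, prev_exposed)
        c2 = simulate_cases(N_PER_GROUP, prev_unexposed)
        if p_value_two_proportions(c1, N_PER_GROUP, c2, N_PER_GROUP) < ALPHA:
            rejections += 1
    return rejections / simulations

# Null hypothesis true: every rejection is a type I error.
type_i_rate = rejection_rate(0.20, 0.20)
# Null hypothesis false: the rejection rate is the statistical power.
power = rejection_rate(0.30, 0.20)
type_ii_rate = 1 - power

print(f"Type I error rate:  {type_i_rate:.3f} (expected to be near {ALPHA})")
print(f"Power:              {power:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```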