2. POPULATION AND SAMPLE
Population:
A set that includes all measurements
of interest to the researcher
(the collection of all responses,
measurements, or counts that are
of interest)
Sample:
A subset of the population
3/26/2020Dr. Keerti Jain, NIIT University Neemrana
2
3. POPULATION DEFINITION
• A population can be defined as including all people or items
with the characteristic one wishes to understand.
• Because there is very rarely enough time or money to gather
information from everyone or everything in a population, the
goal becomes finding a representative sample (or subset) of
that population.
• The population from which the sample is drawn may not be
the same as the population about which we actually want
information. Often there is large but not complete overlap
between these two groups, for example because of problems
with the sampling frame.
4. EXAMPLE
• We might study rats in order to get a better
understanding of human health, or we might study
records from people born in 2008 in order to make
predictions about people born in 2009.
5. SAMPLING
A sample is “a smaller (but hopefully
representative) collection of units from a
population used to determine truths about that
population” (Field, 2005)
6. WHY SAMPLING?
• What is your population of interest?
• To whom do you want to generalize your
results?
• All doctors
• School children
• Indians
• Women aged 15-45 years
• Other
• Can you sample the entire population?
7. WHY SAMPLING?
• Lower cost
• Less field time
• But lower accuracy
• When it’s impossible to study the whole
population
8. WHEN MIGHT YOU SAMPLE THE ENTIRE
POPULATION?
• When your population is very small
• When you have extensive resources
• When you don’t expect a very high response
9. TERMINOLOGY
Target Population:
The population to be studied, to which the investigator wants to generalize the
results
Sampling Unit:
Smallest unit from which a sample can be selected
Study Population:
The part of the target population from which the investigator collects the
sample
Sampling frame:
List of all the sampling units from which the sample is drawn
Sampling scheme:
Method of selecting sampling units from the sampling frame
12. EXAMPLE OF SAMPLING FRAME
The sampling frame is the list from which the
potential respondents are drawn
• Registrar’s office
• Class rosters
13. IMPORTANCE OF SAMPLING
FRAME
• In the most straightforward case, such as the sentencing of a batch of
material from production (acceptance sampling by lots), it is possible
to identify and measure every single item in the population and to
include any one of them in our sample. However, in the more general
case this is not possible.
• There is no way to identify all rats in the set of all rats. Where voting
is not compulsory, there is no way to identify, in advance of an
election, which people will actually vote.
• As a remedy, we seek a sampling frame which has the property that
we can identify every single element and include any in our sample.
• The sampling frame must be representative of the population.
15. SAMPLING PROCESS
The sampling process comprises several stages:
• Defining the population of concern
• Specifying a sampling frame: a set of items or events possible to measure
• Specifying a sampling method for selecting items or events from the frame
• Determining the sample size
• Implementing the sampling plan
• Sampling and data collecting
• Reviewing the sampling process
16. TYPES OF SAMPLING TECHNIQUES
• Non Probability Sampling
• Probability Sampling
17. NON PROBABILITY SAMPLING
• Probability of being chosen is unknown
• Cheaper, but results cannot be generalised
• Potential for bias
18. PROBABILITY SAMPLING
• Random sampling
• Each subject has a known probability of being
selected
• Allows application of statistical
sampling theory to results to:
• Generalise
• Test hypotheses
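As a minimal sketch of the simplest probability design, simple random sampling, the snippet below draws units from a sampling frame so that every unit has the same known chance of selection. The roster, the function name and the seed are hypothetical and serve only as illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    return rng.sample(frame, n)


# Hypothetical sampling frame: a class roster of 40 student IDs.
roster = [f"student_{i:02d}" for i in range(1, 41)]
sample = simple_random_sample(roster, 8, seed=2020)

# Under simple random sampling every unit has inclusion probability n / N.
inclusion_probability = 8 / len(roster)
print(sample, inclusion_probability)
```

Because the inclusion probability is known (here 8/40), results from such a sample can be generalised using sampling theory, which is not possible with a non-probability sample.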
22. TYPE 1 ERROR
• The probability of finding a difference between our
sample and the population when there really
isn’t one
• This probability is known as α (the type 1 error rate)
• Usually set at 5% (or 0.05)
23. TYPE 2 ERROR
• The probability of not finding a difference that actually
exists between our sample and the
population
• This probability is known as β (the type 2 error rate)
• Power is (1 - β) and is usually set at 80%
24. SAMPLE SIZE FOR ESTIMATING
POPULATION MEAN
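• $n = \left(\dfrac{Z\sigma}{E}\right)^2$, where Z is the standard normal value for
the chosen confidence level (1.96 for 95%), σ is the standard deviation of
the outcome, and E is the desired margin of error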
25. EXAMPLE 1
• An investigator wants to estimate the mean systolic blood
pressure in children with congenital heart disease who are
between the ages of 3 and 5. How many children should be
enrolled in the study? The investigator plans on using a 95%
confidence interval (so Z=1.96) and wants a margin of error of 5
units. The standard deviation of systolic blood pressure is
unknown, but the investigators conduct a literature search and
find that the standard deviation of systolic blood pressures in
children with other cardiac defects is between 15 and 20. To
estimate the sample size, we consider the larger standard
deviation in order to obtain the most conservative (largest) sample
size.
26. SOLUTION
In order to ensure that the 95% confidence interval estimate of the mean
systolic blood pressure in children between the ages of 3 and 5 with
congenital heart disease is within 5 units of the true mean, a sample of size
62 is needed.
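The figure of 62 can be reproduced with a short calculation. This is a sketch that assumes the usual formula n = (Zσ/E)², with Z = 1.96, E = 5 and the larger standard deviation σ = 20 taken from Example 1; the function name is illustrative.

```python
import math


def sample_size_mean(z, sigma, margin):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin."""
    return math.ceil((z * sigma / margin) ** 2)


# Conservative choice from Example 1: the larger standard deviation, 20.
n_conservative = sample_size_mean(1.96, 20, 5)
# For comparison, the smaller standard deviation from the literature, 15.
n_optimistic = sample_size_mean(1.96, 15, 5)
print(n_conservative, n_optimistic)  # 62 35
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.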
30. EXAMPLE 4
An investigator wants to compare two diet programs in children who are
obese. One diet is a low fat diet, and the other is a low carbohydrate diet.
The plan is to enroll children and weigh them at the start of the study.
Each child will then be randomly assigned to either the low fat or the low
carbohydrate diet. Each child will follow the assigned diet for 8 weeks, at
which time they will again be weighed. The number of pounds lost will
be computed for each child. Based on data reported from diet trials in
adults, the investigator expects that 20% of all children will not complete
the study. A 95% confidence interval will be estimated to quantify the
difference in weight lost between the two diets and the investigator
would like the margin of error to be no more than 3 pounds. How many
children should be recruited into the study?
31. SOLUTION
Samples of size n1=56 and n2=56 will ensure that the 95% confidence interval for
the difference in weight lost between diets will have a margin of error of no more
than 3 pounds. Again, these sample sizes refer to the numbers of children with
complete data.
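To answer the recruitment question, the number with complete data has to be inflated for the expected dropout. The step below is a sketch that assumes the 20% attrition stated in Example 4 applies equally to both diet groups; N_i (the number to recruit per group) is a symbol introduced here for illustration.

```latex
N_i = \frac{n_i}{1 - 0.20} = \frac{56}{0.80} = 70
\quad\text{children per group}, \qquad
N_1 + N_2 = 140 \quad\text{children in total.}
```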
32. SAMPLE SIZE FOR ONE SAMPLE,
DICHOTOMOUS OUTCOME
(PROPORTION)
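$n = p(1-p)\left(\dfrac{Z}{E}\right)^2$
where p is the anticipated proportion
Z is the standard normal value for the chosen confidence level (1.96 for 95%)
E is the sampling error or tolerable margin of error
(E = difference between population proportion and sample proportion)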
33. EXAMPLE 5
We wish to estimate the proportion of anemic children in a certain
preparatory school. In a similar study at another school, a proportion
of 30% was detected.
Compute the minimum sample size required at a confidence level of 95%,
accepting a difference of up to 4% from the true population proportion.
SOLUTION
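A worked calculation, assuming the standard one-sample proportion formula n = p(1-p)(Z/E)² with p = 0.30 from the similar study, E = 0.04 and Z = 1.96 for 95% confidence:

```latex
n = p(1-p)\left(\frac{Z}{E}\right)^2
  = 0.30 \times 0.70 \times \left(\frac{1.96}{0.04}\right)^2
  = 0.21 \times 2401
  = 504.21
```

Rounding up, a sample of about 505 children would be needed under these assumptions.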
34. EXAMPLE 6
An investigator wants to estimate the proportion of freshmen
at his University who currently smoke cigarettes (i.e., the
prevalence of smoking). How many freshmen should be
involved in the study to ensure that a 95% confidence interval
estimate of the proportion of freshmen who smoke is within
5% of the true proportion?
35. SOLUTION
In order to ensure that the 95% confidence interval estimate of the
proportion of freshmen who smoke is within 5% of the true proportion, a
sample of size 385 is needed
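The value 385 is what the formula n = p(1-p)(Z/E)² gives when no prior estimate of the proportion is available and p = 0.5 is used, since that choice maximises p(1-p). The choice of p = 0.5 is an assumption here, as Example 6 does not state it; the function name is illustrative.

```python
import math


def sample_size_proportion(z, margin, p=0.5):
    """Sample size for estimating one proportion to within the given margin."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


# Example 6: 95% confidence, margin of 5 percentage points, p assumed 0.5.
n_freshmen = sample_size_proportion(1.96, 0.05)
print(n_freshmen)  # 385
```

If a credible prior estimate of the smoking prevalence were available, passing it as `p` would give a smaller required sample.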
36. SAMPLE SIZES FOR TWO SAMPLES,
DICHOTOMOUS OUTCOME
(PROPORTIONS)
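$n_i = \left[p_1(1-p_1) + p_2(1-p_2)\right]\left(\dfrac{Z}{E}\right)^2$
where n_i is the size of each of the two samples
p_1 and p_2 are the anticipated proportions in the two groups
Z is the standard normal value for the chosen confidence level (1.96 for 95%)
E is the sampling error or tolerable margin of error
(here, the margin of error for the difference between the sample proportions)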
38. EXAMPLE 8
• An investigator wants to estimate the impact of smoking during
pregnancy on premature delivery. Normal pregnancies last
approximately 40 weeks and premature deliveries are those that occur
before 37 weeks. The 2005 National Vital Statistics report indicates that
approximately 12% of infants are born prematurely in the United
States. The investigator plans to collect data through medical record
review and to generate a 95% confidence interval for the difference in
proportions of infants born prematurely to women who smoked during
pregnancy as compared to those who did not. How many women
should be enrolled in the study to ensure that the 95% confidence
interval for the difference in proportions has a margin of error of no
more than 4%?
39. SOLUTION
The sample sizes (i.e., numbers of women who smoked and did not smoke
during pregnancy) can be computed using the formula shown above.
National data suggest that 12% of infants are born prematurely. We will use
that estimate for both groups in the sample size computation.
Samples of size n1=508 women who smoked during pregnancy and n2=508
women who did not smoke during pregnancy will ensure that the 95%
confidence interval for the difference in proportions who deliver prematurely
will have a margin of error of no more than 4%.
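The per-group figure of 508 can be checked with a few lines of code. This sketch assumes the two-sample formula n_i = [p1(1-p1) + p2(1-p2)](Z/E)², with p1 = p2 = 0.12, E = 0.04 and Z = 1.96 as in Example 8; the function name is illustrative.

```python
import math


def sample_size_two_proportions(z, margin, p1, p2):
    """Per-group sample size for a CI on the difference of two proportions."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(variance_sum * (z / margin) ** 2)


# Example 8: the national estimate of 12% is used for both groups.
n_per_group = sample_size_two_proportions(1.96, 0.04, 0.12, 0.12)
print(n_per_group, 2 * n_per_group)  # 508 1016
```

Using 12% for both groups is the choice made in the solution; if smoking is expected to raise the rate of premature delivery, a larger value for that group would increase the required sample size.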