Sampling and Sample Size

SAMPLING AND SAMPLE SIZE
Dr. Keerti Jain,
NIIT University, Neemrana

POPULATION AND SAMPLE
Population:
The set of all measurements of interest to the researcher
(the collection of all responses, measurements, or counts that are
of interest)
Sample:
A subset of the population
3/26/2020 Dr. Keerti Jain, NIIT University Neemrana

POPULATION DEFINITION
• A population can be defined as including all people or items
with the characteristic one wishes to understand.
• Because there is very rarely enough time or money to gather
information from everyone or everything in a population, the
goal becomes finding a representative sample (or subset) of
that population.
• The population from which the sample is drawn may not be
the same as the population about which we actually want
information. Often there is large but not complete overlap
between these two groups, for example because of sampling frame issues.

EXAMPLE
• We might study rats in order to get a better
understanding of human health, or we might study
records from people born in 2008 in order to make
predictions about people born in 2009.

SAMPLING
A sample is “a smaller (but hopefully
representative) collection of units from a
population used to determine truths about that
population” (Field, 2005)

WHY SAMPLING?
• What is your population of interest?
• To whom do you want to generalize your
results?
• All doctors
• School children
• Indians
• Women aged 15-45 years
• Other
• Can you sample the entire population?

WHY SAMPLING?
• Lower cost
• Less field time
• But less accuracy
• Necessary when it is impossible to study the whole
population

WHEN MIGHT YOU SAMPLE THE ENTIRE
POPULATION?
• When your population is very small
• When you have extensive resources
• When you don’t expect a very high response

TERMINOLOGY
Target Population:
The population to be studied, to which the investigator wants to generalize the
results
Sampling Unit:
The smallest unit from which a sample can be selected
Study Population:
The part of the target population from which the investigator collects the
sample
Sampling frame:
List of all the sampling units from which the sample is drawn
Sampling scheme:
Method of selecting sampling units from the sampling frame

SAMPLING
[Figure: diagram with the labels Target Population, Study Population, Sample Frame, and Sample]

SAMPLING BREAKDOWN

EXAMPLE OF SAMPLING FRAME
The sampling frame is the list from which the
potential respondents are drawn
• Registrar’s office
• Class rosters

IMPORTANCE OF SAMPLING
FRAME
• In the most straightforward case, such as the sentencing of a batch of
material from production (acceptance sampling by lots), it is possible
to identify and measure every single item in the population and to
include any one of them in our sample. However, in the more general
case this is not possible.
• There is no way to identify all rats in the set of all rats. Where voting
is not compulsory, there is no way to identify in advance which people will
actually vote at a forthcoming election.
• As a remedy, we seek a sampling frame which has the property that
we can identify every single element and include any of them in our sample.
• The sampling frame must be representative of the population.

FACTORS INFLUENCING SAMPLE
REPRESENTATIVENESS
• Sampling procedure
• Sample size
• Participation (response)

SAMPLING PROCESS
The sampling process comprises several stages:
• Defining the population of concern
• Specifying a sampling frame: a set of items or events that it is possible to measure
• Specifying a sampling method for selecting items or events from the frame
• Determining the sample size
• Implementing the sampling plan
• Sampling and data collecting
• Reviewing the sampling process

TYPES OF SAMPLING TECHNIQUES
• Non-Probability Sampling
• Probability Sampling

NON-PROBABILITY SAMPLING
• Probability of being chosen is unknown
• Cheaper, but results cannot be generalised
• Potential for bias

PROBABILITY SAMPLING
• Random sampling
• Each subject has a known probability of being
selected
• Allows application of statistical
sampling theory to results to:
• Generalise
• Test hypotheses

TYPES OF NON-PROBABILITY
SAMPLE
• Convenience sample
• Purposive sample
• Judgmental Sampling
• Quota Sampling
• Snowball Sampling
• Panel Sampling

TYPES OF PROBABILITY SAMPLING
• Simple Random Sample
• Systematic random sample
• Stratified random sample
• Multistage sample
• Multiphase sample
• Cluster sample
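As a rough illustration of the first three schemes in this list, the sketch below draws a simple random, a systematic, and a stratified sample from a numbered sampling frame using Python's standard `random` module. The frame of 100 unit IDs, the two strata, the 10% sampling fraction, and all function names are hypothetical and chosen only for illustration.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit (k = len(frame) // n)."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from each stratum."""
    sample = []
    for units in strata.values():
        n = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, n))
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable
frame = list(range(1, 101))  # hypothetical frame: unit IDs 1..100
strata = {"A": frame[:60], "B": frame[60:]}  # hypothetical strata

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(strata, 0.10, rng)
```

In each case every unit in the frame has a known, non-zero chance of selection, which is what distinguishes these schemes from the non-probability methods.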

ERRORS IN SAMPLE
• Systematic error (or bias)
• Inaccurate response (information bias)
• Selection bias
• Sampling error (random error)

TYPE 1 ERROR
• The probability of finding a difference between our
sample and the population when there really
isn’t one
• Known as α (or “type 1 error”)
• Usually set at 5% (or 0.05)
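An α of 5% corresponds to the 95% confidence level used in the sample size examples, and it is where the value Z = 1.96 quoted in Example 1 comes from. A minimal sketch, assuming a two-sided interval based on the standard normal distribution and using Python's standard `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05  # type 1 error rate from the slide
# Two-sided critical value: leave alpha / 2 in each tail of the standard normal.
z = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z, 2))  # prints 1.96
```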

TYPE 2 ERROR
• The probability of not finding a difference that actually
exists between our sample and the
population
• Known as β (or “type 2 error”)
• Power is (1 − β) and is usually set at 80%

SAMPLE SIZE FOR ESTIMATING
POPULATION MEAN
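• $n = \left(\dfrac{Z\sigma}{E}\right)^{2}$
where Z is the standard normal value for the chosen confidence level (1.96 for 95%),
σ is the standard deviation of the outcome, and E is the desired margin of error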

EXAMPLE 1
• An investigator wants to estimate the mean systolic blood
pressure in children with congenital heart disease who are
between the ages of 3 and 5. How many children should be
enrolled in the study? The investigator plans on using a 95%
confidence interval (so Z=1.96) and wants a margin of error of 5
units. The standard deviation of systolic blood pressure is
unknown, but the investigator conducts a literature search and
finds that the standard deviation of systolic blood pressures in
children with other cardiac defects is between 15 and 20. To
estimate the sample size, we consider the larger standard
deviation in order to obtain the most conservative (largest) sample
size.

SOLUTION
In order to ensure that the 95% confidence interval estimate of the mean
systolic blood pressure in children between the ages of 3 and 5 with
congenital heart disease is within 5 units of the true mean, a sample of size
62 is needed.
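The arithmetic behind this result is not shown in the recovered slides. A minimal sketch, assuming the standard formula for estimating a mean, n = (Zσ/E)², rounded up to the next whole subject; the function name is hypothetical:

```python
import math


def sample_size_mean(z, sigma, margin):
    """n = (z * sigma / margin)**2, rounded up to a whole subject."""
    return math.ceil((z * sigma / margin) ** 2)


# Conservative choice from the example: the larger standard deviation, 20.
n_conservative = sample_size_mean(z=1.96, sigma=20, margin=5)
# For comparison only: the smaller standard deviation, 15.
n_optimistic = sample_size_mean(z=1.96, sigma=15, margin=5)
print(n_conservative)  # prints 62 (61.47 before rounding up)
print(n_optimistic)    # prints 35
```

Using σ = 20 rather than σ = 15 nearly doubles the required sample, which is why the larger value gives the conservative answer.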

EXAMPLE 2

SAMPLE SIZES FOR TWO
INDEPENDENT SAMPLES
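• $n_i = 2\left(\dfrac{Z\sigma}{E}\right)^{2}$
where n_i is the sample size required in each group (i = 1, 2),
σ is the standard deviation of the outcome, and E is the desired margin of error
for the difference in means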

EXAMPLE 3

EXAMPLE 4
An investigator wants to compare two diet programs in children who are
obese. One diet is a low fat diet, and the other is a low carbohydrate diet.
The plan is to enroll children and weigh them at the start of the study.
Each child will then be randomly assigned to either the low fat or the low
carbohydrate diet. Each child will follow the assigned diet for 8 weeks, at
which time they will again be weighed. The number of pounds lost will
be computed for each child. Based on data reported from diet trials in
adults, the investigator expects that 20% of all children will not complete
the study. A 95% confidence interval will be estimated to quantify the
difference in weight lost between the two diets and the investigator
would like the margin of error to be no more than 3 pounds. How many
children should be recruited into the study?

SOLUTION
Samples of size n1=56 and n2=56 will ensure that the 95% confidence interval for
the difference in weight lost between diets will have a margin of error of no more
than 3 pounds. Again, these sample sizes refer to the numbers of children with
complete data.
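The question asks how many children to recruit, while 56 per group is the number needed with complete data. A minimal sketch of the remaining step, assuming the expected 20% dropout applies equally to both diet groups, so that the number recruited per group is n / (1 − 0.20); the function name is hypothetical:

```python
import math
from fractions import Fraction


def recruit_per_group(n_complete, dropout_pct):
    """Number to recruit so that n_complete subjects remain after dropout."""
    retained = Fraction(100 - dropout_pct, 100)  # exact fraction who finish
    return math.ceil(n_complete / retained)


per_group = recruit_per_group(n_complete=56, dropout_pct=20)
total = 2 * per_group
print(per_group, total)  # prints 70 140
```

Under that assumption, 70 children per diet (140 in total) would be recruited to end with about 56 completers in each group.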

SAMPLE SIZE FOR ONE SAMPLE,
DICHOTOMOUS OUTCOME
(PROPORTION)
$n = p(1-p)\left(\dfrac{Z}{E}\right)^{2}$
where p is the proportion,
E is the sampling error or tolerable margin of error, that is, the
difference between the population proportion and the sample proportion

EXAMPLE 5
It was desired to estimate the proportion of anemic children in a certain
preparatory school. In a similar study at another school, a proportion
of 30% was detected.
Compute the minimal sample size required at a confidence level of 95%,
accepting a difference of up to 4% from the true population proportion.
SOLUTION
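The worked solution for this example was not recovered from the slides. A minimal sketch, assuming the standard one-sample formula n = p(1 − p)(Z/E)² with Z = 1.96, p = 0.30 and E = 0.04; the function name is hypothetical, and the result is what this formula gives rather than a figure quoted from the slides:

```python
import math


def sample_size_proportion(z, p, margin):
    """n = p * (1 - p) * (z / margin)**2, rounded up to a whole subject."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


n = sample_size_proportion(z=1.96, p=0.30, margin=0.04)
print(n)  # prints 505 (504.21 before rounding up)
```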

EXAMPLE 6
An investigator wants to estimate the proportion of freshmen
at his University who currently smoke cigarettes (i.e., the
prevalence of smoking). How many freshmen should be
involved in the study to ensure that a 95% confidence interval
estimate of the proportion of freshmen who smoke is within
5% of the true proportion?

SOLUTION
In order to ensure that the 95% confidence interval estimate of the
proportion of freshmen who smoke is within 5% of the true proportion, a
sample of size 385 is needed
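The example gives no prior estimate of the smoking prevalence, and the figure of 385 is what the standard formula n = p(1 − p)(Z/E)² yields with p = 0.5. The sketch below, which assumes that formula and uses a hypothetical function name, compares several candidate values of p to show that p = 0.5 gives the largest, and therefore most conservative, sample size:

```python
import math


def sample_size_proportion(z, p, margin):
    """n = p * (1 - p) * (z / margin)**2, rounded up to a whole subject."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


# With no prior estimate, p = 0.5 maximises p * (1 - p) and hence n.
candidates = (0.1, 0.3, 0.5, 0.7, 0.9)
sizes = {p: sample_size_proportion(1.96, p, 0.05) for p in candidates}
print(sizes[0.5])  # prints 385 (384.16 before rounding up)
```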

SAMPLE SIZES FOR TWO SAMPLES,
DICHOTOMOUS OUTCOME
(PROPORTIONS)
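$n_i = \left[p_1(1-p_1) + p_2(1-p_2)\right]\left(\dfrac{Z}{E}\right)^{2}$
where p_1 and p_2 are the proportions in the two groups,
E is the sampling error or tolerable margin of error for the
difference between the sample proportions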

EXAMPLE 7

EXAMPLE 8
• An investigator wants to estimate the impact of smoking during
pregnancy on premature delivery. Normal pregnancies last
approximately 40 weeks and premature deliveries are those that occur
before 37 weeks. The 2005 National Vital Statistics report indicates that
approximately 12% of infants are born prematurely in the United
States. The investigator plans to collect data through medical record
review and to generate a 95% confidence interval for the difference in
proportions of infants born prematurely to women who smoked during
pregnancy as compared to those who did not. How many women
should be enrolled in the study to ensure that the 95% confidence
interval for the difference in proportions has a margin of error of no
more than 4%?

SOLUTION
The sample sizes (i.e., numbers of women who smoked and did not smoke
during pregnancy) can be computed using the formula shown above.
National data suggest that 12% of infants are born prematurely. We will use
that estimate for both groups in the sample size computation.
Samples of size n1=508 women who smoked during pregnancy and n2=508
women who did not smoke during pregnancy will ensure that the 95%
confidence interval for the difference in proportions who deliver prematurely
will have a margin of error of no more than 4%.
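The figure of 508 per group can be reproduced with a short calculation. A minimal sketch, assuming the standard two-sample formula n_i = [p1(1 − p1) + p2(1 − p2)](Z/E)² with Z = 1.96, p1 = p2 = 0.12 and E = 0.04; the function name is hypothetical:

```python
import math


def sample_size_two_proportions(z, p1, p2, margin):
    """Per-group n = [p1(1-p1) + p2(1-p2)] * (z / margin)**2, rounded up."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(variance_sum * (z / margin) ** 2)


n_per_group = sample_size_two_proportions(z=1.96, p1=0.12, p2=0.12, margin=0.04)
print(n_per_group)  # prints 508 (507.09 before rounding up)
```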
