4. Inferential statistics
The part of statistics that allows researchers
to generalize their findings to a larger
population beyond data from the sample
collected.
5. Two ways to make inference
–Estimation of parameters
* Point Estimation
* Interval Estimation
–Hypothesis Testing
6. Basic terminology
• Parameter – a number that describes a
characteristic of the population (mean, SD,
variance, etc.)
• Statistic – a number that describes a
characteristic of the scores in the sample (mean,
variance, SD, correlation coefficient, etc.)
8. Basic Logic
• Information from
samples is used to
estimate information
about the population.
• Statistics are used to
estimate parameters.
[Diagram: a SAMPLE is drawn from the POPULATION; the sample STATISTIC estimates the population PARAMETER]
9. Estimation
The process by which one makes inferences
about a population, based on information
obtained from a sample.
Point estimate
Interval estimate
10. Point estimate
• A point estimate is a single value that estimates a
parameter directly; it serves as a "best guess" or "best
estimate" of an unknown population parameter
• sample proportion p̂ (“p hat”) is the point estimate of p
• sample mean x̄ (“x bar”) is the point estimate of μ
• sample standard deviation s is the point estimate of σ
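The three point estimates above can be computed directly from sample data. The sketch below is only an illustration: it reuses the protein values from the problem on slide 33 and the counts from the problem on slide 31, and the variable names are chosen here, not taken from the slides.

```python
import statistics

# Protein values from the worked problem on slide 33.
protein = [6, 7, 8, 6, 8, 7, 6, 7, 8, 6]

x_bar = statistics.mean(protein)   # point estimate of the population mean μ
s = statistics.stdev(protein)      # point estimate of σ (divides by n - 1)

# Counts from the problem on slide 31: 21 of 146 children had abnormalities.
p_hat = 21 / 146                   # point estimate of the population proportion p

print(f"x_bar = {x_bar:.2f}, s = {s:.3f}, p_hat = {p_hat:.3f}")
```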
11. Problem
• In a health survey of 55 school boys, it was
found that the mean hemoglobin level was
10.2 g per 100 ml with a standard deviation of
2.1. Estimate the mean hemoglobin level of
the population of such school boys.
Point estimate of the population mean is 10.2 g per 100 ml
12. Disadvantages of point estimates
Point estimates do not provide
information about sample-to-sample
variability:
How precise is x̄ as an estimate of μ?
How much can we expect x̄ to vary from μ?
14. Sampling Distribution
• Sampling Distribution: A theoretical distribution
that shows the frequency of occurrence of values
of some statistic computed for all possible
samples of size N drawn from some population.
• Sampling Distribution of the Mean: A theoretical
distribution of the frequency of occurrence of
values of the mean computed for all possible
samples of size N from a population
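For a very small population the sampling distribution of the mean can be listed exhaustively. The sketch below assumes a hypothetical population of five scores (illustrative values, not from the slides) and samples of size N = 2 drawn without replacement.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical population of five scores (illustrative values only).
population = [2, 4, 6, 8, 10]
N = 2  # sample size

# Mean of every possible sample of size N, drawn without replacement.
sample_means = [mean(sample) for sample in combinations(population, N)]

# Frequency of occurrence of each value of the sample mean.
sampling_distribution = Counter(sample_means)
for value in sorted(sampling_distribution):
    print(value, sampling_distribution[value])

# The mean of the sampling distribution equals the population mean.
print(mean(sample_means), mean(population))
```

With real populations the number of possible samples is far too large to enumerate, which is why the distribution is described as theoretical.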
16. Central Limit Theorem
States that the sampling distribution of means, for
samples of 30 or more:
– Is approximately normally distributed (regardless of the shape of the
population from which the samples were drawn)
– Has a mean equal to the population mean, μ (“mu”), regardless
of the shape of the population or of the size of the sample
– Has a standard deviation, the standard error of the mean,
equal to the population standard deviation divided by the
square root of the sample size: SE = σ/√n (the “square root law”)
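A small simulation can make the theorem concrete. The sketch below is only an illustration under assumed settings: a skewed (exponential) population with mean 1 and standard deviation 1, samples of size 36, and 5000 repeated samples. None of these values come from the slides.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 36              # sample size (30 or more)
num_samples = 5000  # number of repeated samples

# Assumed skewed population: exponential with mean 1 and SD 1.
pop_mean, pop_sd = 1.0, 1.0

# Mean of each simulated sample.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)  # should be close to pop_mean
observed_se = statistics.stdev(sample_means)    # should be close to σ/√n
theoretical_se = pop_sd / math.sqrt(n)

print(f"mean of sample means = {observed_mean:.3f}")
print(f"observed SE = {observed_se:.3f}, σ/√n = {theoretical_se:.3f}")
```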
18. Confidence interval
A confidence interval (CI) is a range of values
computed from the sample data that is expected, with a stated
level of confidence, to include the population parameter of interest
20. FACTORS AFFECTING CONFIDENCE INTERVAL
[Figure: Distribution of Means and Standard Error of the Means. The sampling distribution is centred on the population mean μ (“mu”), with the axis marked at ±1, ±2 and ±3 SEM]
22. Confidence limits
• The α (“alpha”) level represents the “lack of
confidence”
• (1−α)100% represents the confidence level of a
confidence interval
• Confidence interval = point estimate ± z(1−α/2) x SE
• z(1−α/2) is used instead of z(1−α) in this formula because the
random error (imprecision) is split between the right
and left tails
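23. Z values for different confidence levels
[Figure: area under the standard normal curve for different confidence levels]
24. Z table (two-tailed)
[Table: area under the curve; rows give z to the first decimal place and columns give the second decimal place, e.g. 1.96 = 1.9 + 0.06]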
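As an alternative to reading a printed z table, the two-tailed z value for a given confidence level can be computed. The sketch below uses `NormalDist` from Python's standard library; the helper name `z_critical` is chosen here for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed z value: alpha is split equally between both tails."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```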
25. Process for Constructing Confidence
Intervals
• Compute the sample statistic (e.g. a mean)
• Compute the standard error of the mean
• Make a decision about level of confidence that is
desired (usually 95% or 99%)
• Find the tabled value for a 95% or 99% confidence
interval
• Multiply the standard error of the mean by the tabled
value
• Form the interval by adding and subtracting the
calculated value to and from the mean
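The steps above can be sketched as one small function. This is a minimal illustration assuming a large sample, so that the z value applies; the function name `mean_confidence_interval` is hypothetical. The inputs are the hemoglobin figures from the health survey problem (mean 10.2, SD 2.1, n = 55).

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, s, n, confidence=0.95):
    """z-based confidence interval for a mean (large-sample sketch)."""
    se = s / math.sqrt(n)                               # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # tabled z value
    margin = z * se                                     # SE multiplied by z
    return x_bar - margin, x_bar + margin               # subtract and add the margin

low95, high95 = mean_confidence_interval(10.2, 2.1, 55, 0.95)
low99, high99 = mean_confidence_interval(10.2, 2.1, 55, 0.99)
print(f"95% CI: {low95:.2f} to {high95:.2f}")
print(f"99% CI: {low99:.2f} to {high99:.2f}")
```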
27. Problems
• In a health survey of 55 school boys, it was found
that the mean hemoglobin level was 10.2 g per 100
ml with a standard deviation of 2.1. Estimate the
mean hemoglobin level of the population of such
school boys.
x̄ = 10.2, s = 2.1, n = 55
SE = s/√n = 2.1/√55 = 0.283
95% CI = 10.2 − 1.96 x 0.283 to 10.2 + 1.96 x 0.283
= 9.65 to 10.75
99% CI = 9.47 to 10.93
28. Problem
• In a survey on the hearing level of schoolchildren
with normal hearing, it was found that at a
frequency of 500 cycles per second, 62 children
tested in a soundproof room had a mean
hearing threshold of 15.5 dB with a standard
deviation of 6.5. Another 76 comparable
children who were tested in the field had a
mean threshold of 20 dB with a standard
deviation of 7.1. What is the 95% confidence
interval for the difference in means?
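29. Here two independent samples are given: the sample tested in the
soundproof room and the sample tested in the field.
The confidence interval of the difference in means = difference in
means ± 1.96 x SE of difference in means
SE of difference in means = sqrt [ s1²/n1 + s2²/n2 ]
or, using the pooled SD,
SE of difference in means = Pooled SD x sqrt [ 1/n1 + 1/n2 ]
Difference in means = 20 − 15.5 = 4.5; SE = 1.17 (pooled form)
95% CI = 4.5 − 1.96 x 1.17 to 4.5 + 1.96 x 1.17 = 2.21 to 6.79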
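The arithmetic for this problem can be checked with a short script. The sketch below computes the standard error both ways (separate variances and pooled SD) from the figures given in the problem; the variable names are chosen here for illustration. The pooled form reproduces the SE of 1.17 used in the answer, while the separate-variance form gives a slightly smaller value.

```python
import math

# Figures from the hearing-threshold problem.
n1, mean1, sd1 = 62, 15.5, 6.5   # tested in the soundproof room
n2, mean2, sd2 = 76, 20.0, 7.1   # tested in the field
z = 1.96                          # tabled z value for 95% confidence

diff = mean2 - mean1

# SE from the separate sample variances.
se_unpooled = math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# SE from the pooled SD (n - 1 weights for each sample).
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)

ci = (diff - z * se_pooled, diff + z * se_pooled)
print(f"difference = {diff:.1f}")
print(f"SE (separate variances) = {se_unpooled:.2f}, SE (pooled) = {se_pooled:.2f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```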
31. Problem
• In an otological examination of school children,
21 of the 146 children examined were found
to have otological abnormalities. Find the 99%
confidence interval for the proportion of
children with otological abnormalities.
32. Answer
• p = 21 x 100/146 = 14.4%
• q = 100 − p = 85.6%
• 99% CI = p ± 2.58 x SE of proportion
• SE of proportion = √(pq/n)
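The answer can be carried through numerically. The sketch below works with proportions on a 0 to 1 scale rather than percentages and takes the 99% z value from Python's standard library (about 2.58); the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

x, n = 21, 146                    # children with abnormalities, children examined
p = x / n                         # sample proportion
q = 1 - p
se = math.sqrt(p * q / n)         # SE of proportion
z = NormalDist().inv_cdf(0.995)   # two-tailed z value for 99% confidence

ci = (p - z * se, p + z * se)
print(f"p = {p:.1%}, SE = {se:.2%}")
print(f"99% CI: {ci[0]:.1%} to {ci[1]:.1%}")
```

On these figures the interval runs from roughly 6.9% to 21.9%.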
33. Problem
• Find the best estimate of the mean and 95%
CI of the mean using the data
Sl no Protein value
1 6
2 7
3 8
4 6
5 8
6 7
7 6
8 7
9 8
10 6
34. • Best estimate is the mean of the sample: x̄ = 6.9
• Interval estimate: 95% CI = x̄ ± t0.05 x SE of x̄
t0.05 is found from the t table with df = n − 1 = 9
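The interval for this small sample can be worked out as follows. The sketch hard-codes the tabled two-tailed t value for df = 9 at the 0.05 level (2.262), as would be read from a t table, because Python's standard library has no t distribution; the variable names are chosen here for illustration.

```python
import math
import statistics

# Protein values from the problem.
protein = [6, 7, 8, 6, 8, 7, 6, 7, 8, 6]

n = len(protein)
x_bar = statistics.mean(protein)   # best (point) estimate of the mean
s = statistics.stdev(protein)      # sample SD (divides by n - 1)
se = s / math.sqrt(n)              # SE of the mean

t_crit = 2.262                     # tabled t0.05 (two-tailed) for df = n - 1 = 9

ci = (x_bar - t_crit * se, x_bar + t_crit * se)
print(f"mean = {x_bar:.1f}, SE = {se:.3f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```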
35. • If two independent samples are given, each with a
sample size less than 30, and the CI of the difference
in means is to be found:
• CI = difference in means ± t0.05 x SE of
difference in means
t0.05 is found from the t table with df = n1 + n2 − 2
SE of difference in means = Pooled SD x sqrt [ 1/n1 + 1/n2 ], using
n − 1 for each sample in the equation for the pooled SD
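A sketch of this small-sample case is given below. The summary figures are hypothetical (two samples of six, invented purely for illustration), the function name `pooled_t_interval` is chosen here, and the t value 2.228 is the tabled two-tailed 0.05 value for df = 10.

```python
import math

def pooled_t_interval(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """CI for a difference in means using the pooled SD (small samples)."""
    # Pooled variance: each sample variance weighted by n - 1.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    diff = mean2 - mean1
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical summary statistics (not from the slides).
low, high = pooled_t_interval(
    mean1=12.0, sd1=2.0, n1=6,
    mean2=15.0, sd2=3.0, n2=6,
    t_crit=2.228,  # tabled t0.05 (two-tailed) for df = 6 + 6 - 2 = 10
)
print(f"95% CI for the difference: {low:.2f} to {high:.2f}")
```

In this made-up example the interval includes zero, so the data would not rule out there being no difference between the two population means.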