TESTS OF SIGNIFICANCE

DR. BENITA MARIA REGI
1ST MDS

CONTENTS
• History
• Introduction
• Tests of significance
• Parametric and non parametric tests
• Limitations
• Conclusion
• Reference

HISTORY
• The term "statistical significance" was coined by Ronald Fisher (1890-1962).
• Student's t-test: William Sealy Gosset.

Introduction to Statistical inference
• Statistical inferences are based on probabilities and as such cannot be
expressed with full certainty.
• When a test shows that a difference is statistically significant, then it simply
suggests that the difference is probably not due to chance.

• Point estimation
In this, a single value (called a statistic) is obtained from the sample to estimate the value of the population parameter, without stating the error of the estimate. This estimate usually does not coincide exactly with the true parameter.
• E.g.: the sample standard deviation is a point estimate of the population standard deviation.
• Interval estimation
A range of values (an interval) is obtained which may be expected to cover the true value of the parameter with some definite probability.

NULL HYPOTHESIS (H0)
• The null hypothesis asserts that there is no real difference between the sample and the population under consideration, and that any observed difference is accidental.
• Example: "There is no difference in the DMF scores of rural and urban children."

Alternative hypothesis (H1)
• If our sample does not support the null hypothesis, we should conclude that something else is true. The alternative hypothesis is what we conclude on rejecting the null hypothesis.
• Example: "There is a difference in the DMF scores of rural and urban children."

Tests of significance
• Test of significance is a formal procedure for comparing observed
data with a claim (also called a hypothesis) whose truth we want to
assess.
• A significance test uses data to evaluate a hypothesis by
comparing sample point estimates of parameters to values predicted
by the hypothesis.

Why tests of significance?
• Have the observations changed with time and/or intervention?
• Do two or more groups of observations differ from each other?
• Is there an association between different observations?

Stages in performing a test of significance:
1. State the null hypothesis of no (or chance) difference, and the alternative hypothesis.
2. Determine P, i.e., the probability of obtaining a difference at least as large as the one observed if chance alone were operating. This is the basis for accepting or rejecting the null hypothesis.
3. Draw a conclusion on the basis of the P value, i.e., decide whether the observed difference is due to chance or to the play of some external factor on the sample.

Various tests of significance
[Diagram: classification of the tests of significance (parametric / non-parametric; qualitative / quantitative data)]

Parametric versus non-parametric tests

Parametric tests
• Use the actual values
• Cannot be used when the sample is very small

Non-parametric tests
• Use only order or grades, not actual values
• Can be used when very few observations are available

Rank sum tests
a) Mann-Whitney test (U test)
b) Kruskal-Wallis test (H test)

Selection of the test
• Based on:
1. Type of data
2. Size of sample
3. Number of samples

PARAMETRIC TESTS
• Used for quantitative data
• Used for continuous variables

When are non-parametric tests used?
• Assumptions of the parametric test are violated
• Data are non-normal or skewed
• Data are on an ordinal scale
• Very few observations

STUDENT’S T-TEST
Developed by W. S. Gosset in 1908, who published statistical papers under the pen name 'Student'. Thus the test is known as Student's 't' test.
Indications for the test:
1. When samples are small
2. When the population variance is not known

Uses
1. Two means of small independent samples
2. Sample mean and population mean
3. Two proportions of small independent samples

Assumptions made in the use of 't' test
1. Samples are randomly selected
2. Data utilised are quantitative
3. The variable follows a normal distribution
4. Samples are small, mostly smaller than 30

A t-test compares the difference between two means of different
groups to determine whether that difference is statistically
significant.
Student’s ‘t’ test for different purposes
‘t’ test for one sample
‘t’ test for unpaired two samples
‘t’ test for paired two samples

ONE SAMPLE T-TEST
• Used when comparing the mean of a single group of observations with a specified value.

Two Sample 't' test

Unpaired two sample 't'-test
• The unpaired t-test is used when we wish to compare two means.
• Used when the two independent random samples come from normal populations having unknown but equal variances.

Paired two sample 't'-test
• Used when we have paired data of observations from one sample only, when each individual gives a pair of observations.
• The same individuals are studied more than once in different circumstances, e.g. measurements made on the same people before and after an intervention.
• E.g.: comparison of Hb values of persons estimated by two methods.

E.g.: The mean birth weights, with SDs and sample sizes, are given below by socioeconomic status. Is the mean difference in birth weight significant between the socioeconomic groups? Random samples were taken for the study.

Sl. no.   Details               High socioeconomic group   Low socioeconomic group
1         Sample size           n1 = 15                    n2 = 10
2         Mean birth weight     x̄1 = 2.91                  x̄2 = 2.26
3         Standard deviation    SD1 = 0.27                 SD2 = 0.22

Null hypothesis (H0): x̄1 and x̄2 are the same.
Alternative hypothesis (H1): H0 is not true.
The samples are small and the variances are nearly the same (SD1² ≈ SD2²), so the t-test is applied.

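$t = \dfrac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$

$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}\left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]} = 0.1027$

$t = \dfrac{2.91 - 2.26}{0.1027} = 6.329$

DF = (n1 - 1) + (n2 - 1) = 23
For 23 degrees of freedom, the t0.001 value from the table is 3.767.
The calculated t is greater than t0.001, so the null hypothesis is rejected and the alternative hypothesis is accepted.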
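
The birth-weight calculation above can be checked with a short script. This is a minimal sketch using only the summary figures from the table; the function name `pooled_t` is illustrative and not taken from any library.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpaired two-sample t statistic using a pooled variance estimate."""
    df = (n1 - 1) + (n2 - 1)
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, se, df

# Summary figures from the birth-weight example
t, se, df = pooled_t(2.91, 0.27, 15, 2.26, 0.22, 10)
print(f"SE = {se:.4f}, t = {t:.3f}, df = {df}")
```

Small differences in the last decimal place of t can arise from rounding the standard error before dividing.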

EXAMPLE 2:
• A study was carried out to evaluate the effect of a new diet on weight loss. The study population consisted of 12 people who used the diet for 2 months; their weights before and after the diet are given below.

Patient no.   Weight before diet (kg)   Weight after diet (kg)
1             75                        70
2             60                        54
3             68                        58
4             98                        93
5             83                        78
6             89                        84
7             65                        60
8             78                        77
9             95                        90
10            80                        76
11            100                       94

FORMULA
$t = \dfrac{\bar{d}}{SD(d)/\sqrt{n}}$
d = difference between x1 and x2
$\bar{d}$ = average of d
SD = standard deviation of the differences
n = sample size
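
The paired formula can be sketched in code as follows. The example uses only the eleven rows that appear in the table, so the result is an illustration of the method on those rows rather than a final answer for the study; `paired_t` is an illustrative name.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, mean_d, sd_d, n - 1

# The eleven rows listed in the table (weights in kg)
before = [75, 60, 68, 98, 83, 89, 65, 78, 95, 80, 100]
after = [70, 54, 58, 93, 78, 84, 60, 77, 90, 76, 94]

t, mean_d, sd_d, df = paired_t(before, after)
print(f"mean d = {mean_d:.2f}, SD(d) = {sd_d:.2f}, t = {t:.2f}, df = {df}")
```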

Z Test
• This test is used for testing the significance of the difference between two means (n > 30).
• Assumptions to apply the Z test:
– The sample must be randomly selected
– Data must be quantitative
– Samples should be larger than 30
– Data should follow a normal distribution
– Sample variances should be almost the same in both groups of the study

Indications for Z Test
• To compare a sample mean with the population mean
• To compare two sample means
• To compare a sample proportion with the population proportion
• To compare two sample proportions

One tailed and Two tailed Z tests
• Z values on each side of the mean are calculated as +Z or -Z.
• A result larger than the mean gives +Z, and a result smaller than the mean gives -Z.

One tailed
• Used if we want to know whether one particular drug is better than the other.
• The result will lie at one end, or tail, of the distribution.

Two tailed
• Used if we want to know whether the action of a particular drug is different from that of another.
• The p value includes the extreme results at both ends of the scale.

• Find the Z value:
$Z = \dfrac{\text{Observed mean} - \text{Mean}}{\text{Standard error}}$
• Compare the calculated Z value with the value in the Z table at the corresponding significance level.
• If the observed Z value is greater than the theoretical Z value, Z is significant: reject the null hypothesis and accept the alternative hypothesis.

Example:
Details of a study on arm circumference (cm) of American and Indian preschool children are given below. Can we infer that arm circumference is different between American and Indian children?

Details               American   Indian
No. of subjects       625        625
Mean                  20.5       15.5
Standard deviation    5.0        5.4

Null hypothesis: there is no difference between the means.
Alternative hypothesis: there is a difference between the means.

$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{SE_1^2 + SE_2^2}}$

Standard error of $(\bar{x}_1 - \bar{x}_2) = \sqrt{SE_1^2 + SE_2^2} = \sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}} = \sqrt{\dfrac{5^2}{625} + \dfrac{(5.4)^2}{625}} = 0.2944$

$Z = \dfrac{20.5 - 15.5}{0.2944} \approx 16.985$

This value is greater than Z(0.001) = 3.29, taken from the table.
Hence the difference is significant and the null hypothesis is rejected.
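
The arm-circumference example can be reproduced with a few lines of code. This is a minimal sketch from the summary figures in the table; `two_sample_z` is an illustrative name, and the critical value is taken from Python's `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for the difference between two large-sample means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se, se

z, se = two_sample_z(20.5, 5.0, 625, 15.5, 5.4, 625)

# Two-tailed critical value at the 0.001 level (0.0005 in each tail)
z_crit = NormalDist().inv_cdf(1 - 0.001 / 2)

print(f"SE = {se:.4f}, Z = {z:.3f}, critical Z = {z_crit:.2f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```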

Analysis of Variance (ANOVA)
• Given by Sir Ronald Fisher.
• The principal aim of statistical models is to explain the variation in measurements.
• The statistical model involving a test of significance of the difference in mean values of the variable between two groups is Student's 't' test. If there are more than two groups, the appropriate statistical model is Analysis of Variance (ANOVA).

Assumptions for ANOVA
1. The sample populations can be approximated by a normal distribution.
2. All populations have the same standard deviation.
3. Individuals in the population are selected randomly.
4. The samples are independent.

• ANOVA compares variances by means of a simple ratio, called the F-ratio:
$F = \dfrac{\text{Variance between groups}}{\text{Variance within groups}}$
• The resulting F statistic is then compared with the critical value of F (F critical), obtained from F tables in much the same way as was done with 't'.
• If the calculated value exceeds the critical value for the appropriate level of significance, the null hypothesis is rejected.
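
The F-ratio can be sketched for a one-way layout as follows. The three groups below are hypothetical numbers chosen only to make the arithmetic easy to follow; they are not data from any study, and `one_way_f` is an illustrative name.

```python
def one_way_f(groups):
    """F ratio for one-way ANOVA: between-group over within-group variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Sum of squares between groups and within groups
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical data for illustration only
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_f(groups)
print(f"F = {f_ratio:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

The computed F would then be compared with the table value of F for the same two degrees of freedom.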

• An F test is therefore a test of the ratio of variances. F tests can also be used on their own, independently of the ANOVA technique, to test hypotheses about variances.
• In ANOVA, the F test is used to establish whether a statistically significant difference exists in the data being tested.
• ANOVA can be one-way or two-way.

One Way ANOVA
• If the various experimental groups differ in terms of only one factor at a time, a one-way ANOVA is used.
• E.g.: a study to assess the effectiveness of four different antibiotics on S. sanguis.

Two Way ANOVA
• If the various groups differ in terms of two factors at a time, a two-way ANOVA is performed.
• E.g.: a study to assess the effectiveness of four different antibiotics on S. sanguis in three different age groups.

ANCOVA (Analysis of covariance )
• If the independent variables are all categorical, ANOVA is used.
• However, if some of the independent variables are categorical and some are continuous, ANCOVA is appropriate.
• Example: a study whose goal was to test the effects of antihypertensive drugs on systolic blood pressure (a continuous dependent variable), in which the independent variables were age (continuous) and treatment (categorical).

Indications
1. To adjust for sources of bias in observational studies, and to remove the effect of disturbing variables
2. To study regression in multiple classifications

• ANCOVA is an extension of ANOVA which combines ANOVA with regression to measure the differences among group means.
• The advantages that ANCOVA has over other techniques are:
1. The ability to reduce the error variance in the outcome measure.
2. The ability to measure group differences after allowing for other differences between subjects.

CORRELATION
• Relationship or association between two quantitatively measured or
continuous variables
• Eg : Height and weight , temperature and pulse , age and vital capacity , etc..
• The extent of relationship of two quantitative variables is measured by
Pearson’s correlation coefficient. It is denoted by letter ‘r’.

Pearson's Correlation Coefficient
• Pearson's correlation coefficient (r) is a measure of the strength of the association between the two variables.

Assumptions Made in Calculation of 'r'
1. Subjects selected for the study, with pairs of values of X and Y, are chosen by a random sampling procedure.
2. Both X and Y variables are continuous and are assumed to follow a normal distribution.

Steps
• The first step in studying the relationship between two continuous variables is to draw a scatter plot of the variables to check for linearity.
• Conventionally, the independent variable is plotted on the X-axis and the dependent variable on the Y-axis.
• The nearer the scatter of points is to a straight line, the higher the strength of association between the variables.

Types of Correlation
• Perfect positive correlation: r = +1
• Partial positive correlation: 0 < r < +1
• Perfect negative correlation: r = -1
• Partial negative correlation: -1 < r < 0
• No correlation: r = 0

REGRESSION
• Regression is a statistical technique for studying the relationship between two or more variables.
• In simple regression, we have only two variables, one variable (defined as
independent) is the cause of the behavior of another one (defined as dependent
variable).
• Regression means change in the measurements of one variable, on the positive
or negative side beyond the mean.
• Regression coefficient (b) is a measure of the change in one dependent
character (Y) with one unit change in the independent character (X).

• Correlation gives an idea of the degree and direction of the relationship between two variables, whereas regression analysis enables us to predict the values of one variable on the basis of the other.

Chi square (X2) test
• Chi-square is an important continuous probability distribution, first formulated by Helmert and then developed by Karl Pearson.
• This test is used for testing qualitative data.
• Its advantage over the Z test is:
– It can be applied to smaller samples as well as to large samples.
• Prerequisites for the chi square (X2) test to be applied:
– The sample must be a random sample
– None of the observed values must be zero
– Adequate cell size

Steps in Calculating (X2) value
1. Make a contingency table mentioning the frequencies in all cells.
2. Determine the expected value (E) in each cell.
3. Calculate the difference between observed and expected values in each cell (O - E).
4. Calculate the X2 value for each cell, $(O - E)^2 / E$.
5. Sum the X2 values of all cells to get the X2 value of the table: $X^2 = \sum \dfrac{(O - E)^2}{E}$.
6. Find the p value from the X2 table.
7. If p > 0.05, the difference is not significant and the null hypothesis is accepted; if p < 0.05, the difference is significant and the null hypothesis is rejected.

Example
• Consider a study done in a hospital in which cases of breast cancer were compared with controls from the normal population with respect to a family history of Ca breast.
• 100 in each group were studied for the presence of a family history.
• 25 of the cases and 15 of the controls had a positive family history.
• Comment on the significance of family history in breast cancer.

• The numbers suggest that a family history is 1.67 (25/15) times more common in Ca breast.
• So is it a risk factor in the population?
• We need to test for the significance of this difference.
• We shall apply the X2 test.

Solution
Step – 1: Set up a null hypothesis
– H0: "There is no significant difference between incidence of family history among cases and controls."
Step – 2: Define the alternative hypothesis
– Ha: "There is a significant difference between incidence of family history among cases and controls."

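Step – 3: Calculating X2
1. Make a contingency table mentioning the frequencies in all cells.

Observed (O)   Family history present   Family history absent   Total
Cases          25                       75                      100
Controls       15                       85                      100
Total          40                       160                     200

2. Determine the expected value (E) for each cell: E = (row total x column total) / grand total.

Expected (E)   Family history present   Family history absent
Cases          20                       80
Controls       20                       80

3. Calculate $X^2 = \sum \dfrac{(O - E)^2}{E}$, where O is the observed value and E is the expected value:
X2 = 25/20 + 25/80 + 25/20 + 25/80 = 3.125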

Step – 4: Determine degree of freedom.
• DoF is determined by the formula:
DoF = (r-1) x (c-1)
where r and c are the number of rows and columns
respectively
• Here, r = c = 2.
• Hence, DoF = (2-1) x (2-1) = 1

Step – 5: Find out the corresponding P Value
– P values can be derived by using the X2 distribution tables

Step – 6: Accept or reject the null hypothesis
• In the given scenario, X2 = 3.125.
• This is less than 3.84 (the table value for P = 0.05 at DoF = 1).
• Hence the null hypothesis is accepted, i.e., "There is no significant difference between incidence of family history among cases and controls."
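
The steps of the X2 calculation can be sketched in code. This minimal example builds the 2 x 2 table from the counts stated in the example (25 of 100 cases and 15 of 100 controls with a family history); `chi_square` is an illustrative name and no continuity correction is applied.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Rows: cases, controls; columns: family history present, absent
observed = [[25, 75], [15, 85]]
chi2, dof = chi_square(observed)
print(f"X2 = {chi2:.3f}, DoF = {dof}")
print("Significant at 5%" if chi2 > 3.84 else "Not significant at 5%")
```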

The Sign test (for 2 repeated/correlated measures)
• The sign test is used to find out the statistical significance of differences in matched pair comparisons.
• The sign test is one of the simplest nonparametric tests.
• It is based on the + or - signs of observations in a sample and not on their numerical magnitudes.
• For each subject, subtract the 2nd score from the 1st, and write down the sign of the difference.
• Test the null hypothesis that these + and - signs are values of a random variable in which + and - are equally likely.

Wilcoxon Matched- Pairs Signed-Ranks Test
• Used to compare the output of two objects or where subjects are
studied in context before or after experiment.
• Used to determine both direction and magnitude of difference between
matched values.
• This test is analogous to paired t- test.

• Procedure: we first find the difference (d) between each pair, assign ranks to the differences ignoring their signs, and then calculate the test statistic T.
• In case of ties, we assign each tied pair the average of the ranks involved.

Example
• The duration of endurance of pain by eleven mice before and after administration of a drug (adrenaline 0.4 mg/20 g body weight) is given below. Is there sufficient evidence in the data to say that the drug increases the duration of endurance of pain?

Before drug (1)   After drug (2)   Diff. (2 - 1)   Rank of diff.   Rank with sign
15.5              21.2             +5.7            8               +8
12.7              20.1             +7.4            11              +11
14.8              17.2             +2.4            7               +7
16.7              22.7             +6.0            9               +9
20.1              20.0             -0.1            1               -1
22.0              19.8             -2.2            6               -6
20.2              19.8             -0.4            3               -3
18.1              18.8             +0.7            5               +5
17.6              17.9             +0.3            2               +2
17.4              24.3             +6.9            10              +10
19.1              18.6             -0.5            4               -4

• Sum of the negative ranks = 14
• Sum of the positive ranks = 52
• The null hypothesis that the drug has no effect is tested using the smaller of the sums of positive and negative ranks, which in this case is T = 14.
• The table value of T at the five percent level of significance when n = 11 is 11.
• In this test the null hypothesis is rejected only when the calculated T is less than or equal to the table value. Since T in our example (14) is more than 11, the null hypothesis is not rejected.
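
The rank sums in this example can be reproduced with a short script. This is a minimal sketch: `average_ranks` and `signed_rank_sums` are illustrative helper names, and the rounding to one decimal place is an assumption that suits these data (it avoids floating-point noise in the differences).

```python
def average_ranks(values):
    """Rank values from smallest to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(before, after):
    """Sums of positive and negative ranks, and T (the smaller of the two)."""
    diffs = [round(a - b, 1) for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return pos, neg, min(pos, neg)

before = [15.5, 12.7, 14.8, 16.7, 20.1, 22.0, 20.2, 18.1, 17.6, 17.4, 19.1]
after = [21.2, 20.1, 17.2, 22.7, 20.0, 19.8, 19.8, 18.8, 17.9, 24.3, 18.6]

pos, neg, t_stat = signed_rank_sums(before, after)
print(f"positive ranks = {pos}, negative ranks = {neg}, T = {t_stat}")
```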

McNemar Test
• One of the important nonparametric tests, often used when the data are nominal and relate to two related samples.
• This test is useful with before-after measurements of the same subjects.
• This test tries to judge the significance of any observed change in the views of the same subjects before and after the treatment.

Rank Sum Tests
• Commonly used rank sum tests include the U test and the H test.
• The U test is popularly known as the Wilcoxon-Mann-Whitney test.
• The H test is also known as the Kruskal-Wallis test.

Mann- Whitney Test or (U test)
• This test is used to determine whether two independent samples have been drawn
from the same population
• It analyses the degree of separation (or the amount of overlap) between the
Experimental and Control groups.
• It is an alternative to Student's t-test.
• It requires at least an ordinal level of measurement.

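The Mann-Whitney U-test formula is given by
$U = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1$
or, using the second sample,
$U' = n_1 n_2 + \dfrac{n_2(n_2 + 1)}{2} - R_2$
where n1 and n2 are the sample sizes, and R1 and R2 are the sums of ranks assigned to groups I and II.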

Procedure
• All the observations in the two samples are ranked numerically from smallest to largest, without regard to group.
• Then identify which observations belong to samples I and II.
• The sums of ranks for samples I and II are determined separately.
• Take the difference of the two sums, T = R1 - R2.

Example
Data for two independent groups:
Sample 1: 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60 and 78
Sample 2: 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53 and 72
Test, at the 10% level, the hypothesis that they come from populations with the same mean.

Pooled observations in ascending order, with their ranks and sample labels (A = sample 1, B = sample 2):

Value   Rank   Sample
32      1      B
38      2      A
39      3      A
40      4      B
41      5      B
44      6.5    B
44      6.5    B
46      8      A
48      9      A
52      10     B
53      11.5   B
53      11.5   A
57      13     A
60      14     A
61      15     B
67      16     B
69      17     A
70      18     B
72      19.5   B
72      19.5   B
73      21.5   A
73      21.5   A
74      23     A
78      24     A

• Illustration of direct calculation of the U statistic
R1 = 2 + 3 + 8 + 9 + 11.5 + 13 + 14 + 17 + 21.5 + 21.5 + 23 + 24 = 167.5
R2 = 1 + 4 + 5 + 6.5 + 6.5 + 10 + 11.5 + 15 + 16 + 18 + 19.5 + 19.5 = 132.5

$U = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1 = 12 \times 12 + \dfrac{12(12 + 1)}{2} - 167.5 = 144 + 78 - 167.5 = 54.5$

$\mu_U = \dfrac{n_1 n_2}{2} = \dfrac{12 \times 12}{2} = 72$
$\sigma_U = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\dfrac{12 \times 12 \times (12 + 12 + 1)}{12}} = 17.32$
• Upper limit = μU + 1.64 σU = 72 + 1.64(17.32) = 100.40
• Lower limit = μU - 1.64 σU = 72 - 1.64(17.32) = 43.60
• As the observed value of U is 54.5, which lies in the acceptance region, the null hypothesis is accepted.
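
The whole U test example can be checked in code. This is a minimal sketch using the two samples given above; `average_ranks` and `mann_whitney` are illustrative names, and 1.64 is the normal deviate used in the example for the 10% level.

```python
import math

def average_ranks(values):
    """Rank values from smallest to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney(sample1, sample2):
    """Rank sums, U, and the mean and SD of U under the null hypothesis."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(sample1 + sample2)  # rank the pooled observations
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mu_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, r2, u, mu_u, sigma_u

sample1 = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]
sample2 = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]

r1, r2, u, mu_u, sigma_u = mann_whitney(sample1, sample2)
lower, upper = mu_u - 1.64 * sigma_u, mu_u + 1.64 * sigma_u
print(f"R1 = {r1}, R2 = {r2}, U = {u}")
print(f"acceptance region: {lower:.2f} to {upper:.2f}")
```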

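Kruskal-Wallis Test (or H test)
• This test is used to test the null hypothesis that 'k' independent random samples come from identical universes, against the alternative hypothesis that they do not.
• This test is analogous to the one-way ANOVA.
• It does not require the assumption that the samples come from normal populations or from universes having the same standard deviation.
• The test statistic is $H = \dfrac{12}{N(N + 1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(N + 1)$, where $R_i$ is the sum of ranks of the i-th sample, $n_i$ is its size, and N = n1 + n2 + … + nk.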

Example
• Use the Kruskal-Wallis test at the 5% level of significance to test the null hypothesis that a professional bowler performs equally well with the four bowling balls.

Bowling results in five games
With ball A:   271   282   257   248   262
With ball B:   252   275   302   268   276
With ball C:   260   255   239   246   266
With ball D:   279   242   297   270   258

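Bowling results with different balls and corresponding ranks (rank sums: A = 52, B = 37, C = 75, D = 46)
$H = \dfrac{12}{20(20 + 1)}\left[\dfrac{52^2}{5} + \dfrac{37^2}{5} + \dfrac{75^2}{5} + \dfrac{46^2}{5}\right] - 3(20 + 1)$
= (0.0286)(2362.8) - 63
= 67.51 - 63
= 4.51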

• For the null hypothesis that the bowler performs equally well with the four balls, the table value of X2 is 7.82 for (k - 1) = 4 - 1 = 3 degrees of freedom at the 5% level of significance.
• Since the calculated value of H is only 4.51 and does not exceed the X2 value, the null hypothesis is accepted.
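
The H statistic for the bowling example can be computed as follows. This is a minimal sketch: the helper names are illustrative, ranks are assigned from the highest score (rank 1) to the lowest, which is the direction that gives the rank sums 52, 37, 75 and 46, and no correction for ties is applied (these data have none).

```python
def average_ranks(values):
    """Rank values from smallest to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """H statistic and per-group rank sums (highest score gets rank 1)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks([-x for x in pooled])  # negate to rank descending
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    return h, rank_sums

balls = [
    [271, 282, 257, 248, 262],  # ball A
    [252, 275, 302, 268, 276],  # ball B
    [260, 255, 239, 246, 266],  # ball C
    [279, 242, 297, 270, 258],  # ball D
]
h, rank_sums = kruskal_wallis(balls)
print(f"rank sums = {rank_sums}, H = {h:.2f}")
```

Ranking in the opposite direction changes the rank sums but leaves H unchanged.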

Spearman's rank correlation coefficient
• Developed by Charles Spearman.
• Rank correlation is a measure of the correlation that exists between two sets of ranks.
• The procedure consists of ranking the two sets of values X and Y, and computing the difference d of each pair. The d's are then squared and added.
$r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$
where d = difference in ranks and n = number of pairs

Example
• Example of calculating the rank correlation between fasting blood glucose level and systolic blood pressure in 10 diabetic patients.

Serial no.   Fasting blood glucose   Rank   Systolic BP    Rank   d = R1 - R2   d2
of patient   level (mg/dl)           (R1)   level (mmHg)   (R2)
1            90                      1      136            3      -2            4
2            92                      2      140            4      -2            4
3            98                      3      142            5      -2            4
4            112                     4      130            1      +3            9
5            120                     5      148            7      -2            4
6            121                     6      135            2      +4            16
7            126                     7      150            8      -1            1
8            132                     8      170            10     -2            4
9            143                     9      145            6      +3            9
10           145                     10     165            9      +1            1

$r_s = 1 - \dfrac{6 \times 56}{10 \times 99} = 1 - 0.339 = 0.661$
Referring to the table for n = 10 at the 5% level of significance, we find that the calculated value is greater than the table value.
Hence the null hypothesis is rejected, and it is concluded that fasting blood glucose and systolic blood pressure are correlated.
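
The rank correlation for the ten patients can be reproduced with a short script. This is a minimal sketch using the glucose and blood pressure values from the table; the helper names are illustrative, and the simple formula used here assumes there are no tied ranks, which holds for these data.

```python
def average_ranks(values):
    """Rank values from smallest to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's rank correlation from the sum of squared rank differences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

glucose = [90, 92, 98, 112, 120, 121, 126, 132, 143, 145]       # mg/dl
systolic = [136, 140, 142, 130, 148, 135, 150, 170, 145, 165]   # mmHg

rs, sum_d2 = spearman_rs(glucose, systolic)
print(f"sum of d squared = {sum_d2}, rs = {rs:.3f}")
```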

Kendall’s coefficient of concordance
• It is represented by symbol W.
• It determines the degree of association among several (k) sets of ranking of N
objects or individuals.
• When there are only two sets of rankings of N objects, we generally work out Spearman's coefficient of correlation, but Kendall's coefficient of concordance (W) is considered an appropriate measure for studying the degree of association among three or more sets of rankings.
$W = \dfrac{s}{\frac{1}{12} k^2 (N^3 - N)}$, where $s = \sum (R_j - \bar{R}_j)^2$

Example
Seven individuals have been assigned ranks by four judges at a certain music competition.
Is there a significant agreement in ranking assigned by different judges? Test at 5 % level.

Solution
$\bar{R}_j = \dfrac{\sum R_j}{N} = \dfrac{112}{7} = 16$
s = 332
$W = \dfrac{s}{\frac{1}{12} k^2 (N^3 - N)} = \dfrac{332}{\frac{1}{12}(4)^2(7^3 - 7)} = \dfrac{332}{\frac{16}{12}(336)} = \dfrac{332}{448} = 0.741$
The null hypothesis is rejected: there is significant agreement in the ranking by the different judges at the 5% level of significance.
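
The formula for W can be sketched in code. Since the judges' individual rankings are not reproduced here, the first call uses only the value s = 332 from the solution; the second call is a sanity check on hypothetical rank sums representing perfect agreement, for which W should equal 1. Both function names are illustrative.

```python
def kendall_w_from_s(s, k, n):
    """Kendall's W from s, with k sets of rankings of n objects."""
    return s / (k ** 2 * (n ** 3 - n) / 12)

def kendall_w(rank_sums, k):
    """Kendall's W from the rank sum received by each object."""
    n = len(rank_sums)
    mean_r = sum(rank_sums) / n
    s = sum((r - mean_r) ** 2 for r in rank_sums)
    return kendall_w_from_s(s, k, n), s

# Value of s from the solution: 4 judges, 7 individuals
w_example = kendall_w_from_s(332, k=4, n=7)

# Hypothetical check: all 4 judges give identical ranks 1..7
perfect_sums = [4 * rank for rank in range(1, 8)]
w_perfect, s_perfect = kendall_w(perfect_sums, k=4)

print(f"W (example) = {w_example:.3f}")
print(f"W (perfect agreement) = {w_perfect:.1f}, s = {s_perfect}")
```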

Limitations of tests of significance
• Testing of a hypothesis is not decision making in itself, but it helps in decision making.
• Tests do not explain why a difference exists; they tell us nothing about the cause of the difference.
• Tests are based on probabilities and as such cannot be expressed with full certainty.
• Statistical inferences based on significance tests cannot be said to be entirely correct evidence concerning the truth of the hypothesis.

REFERENCES
• Rao KV. Biostatistics, A manual of statistical method for use in Health, Nutrition and
Anthropology. Jaypee Publications
• Mahajan BK. Methods in biostatistics. 7th edition. Jaypee Publications
• Text book of Research methodology by C.R.Kothari.
• K. Park. Park’s text book of Preventive and Social Medicine. 19th edition.
• Essentials of preventive and community dentistry, 3rd edition by Soben Peter.
