Research Methodology Module-05

Kishor Ade, Assistant Professor, Army Institute of Technology, Pune; Lecturer at VIIT College, Pune

Preliminary data analysis
1) TESTING OF HYPOTHESIS: CONCEPTS AND TESTING
HYPOTHESIS TESTING
DEFINITION
Hypothesis tests are procedures for making rational decisions about the reality of effects.
Rational Decisions
Most decisions require that an individual select a single alternative from a number of
possible alternatives. The decision is made without knowing whether or not it is correct;
that is, it is based on incomplete information. For example, a person either takes or does
not take an umbrella to school based upon both the weather report and observation of
outside conditions. If it is not currently raining, this decision must be made with
incomplete information.
A rational decision is characterized by the use of a procedure which ensures that the
likelihood of success is incorporated into the decision-making process. The
procedure must be stated in such a fashion that another individual, using the same
information, would make the same decision.
One is reminded of a STAR TREK episode. Captain Kirk, for one reason or another, is
stranded on a planet without his communicator and is unable to get back to the
Enterprise. Spock has assumed command and is being attacked by Klingons (who else).
Spock asks for and receives information about the location of the enemy, but is unable to
act because he does not have complete information. Captain Kirk arrives at the last
moment and saves the day because he can act on incomplete information.
This story goes against the concept of rational man. Spock, being the ultimate rational
man, would not be immobilized by indecision. Instead, he would have selected the
alternative which realized the greatest expected benefit given the information available. If
complete information were required to make decisions, few decisions would be made by
rational men and women. This is obviously not the case. The script writer misunderstood
Spock and rational man.
Effects
When a change in one thing is associated with a change in another, we have an effect.
The changes may be either quantitative or qualitative, with the hypothesis testing
procedure selected based upon the type of change observed. For example, if changes in
salt intake in a diet are associated with activity level in children, we say an effect
occurred. In another case, if the distribution of political party preference (Republicans,
Democrats, or Independents) differs for sex (Male or Female), then an effect is present.
Much of behavioral science is directed toward discovering and understanding effects.
The effects discussed in the remainder of this text appear as various statistics including:
differences between means, contingency tables, and correlation coefficients.
GENERAL PRINCIPLES
All hypothesis tests conform to similar principles and proceed with the same sequence of
events.
 A model of the world is created in which there are no effects. The experiment is
then repeated an infinite number of times.
 The results of the experiment are compared with the model of step one. If, given
the model, the results are unlikely, then the model is rejected and the effects are
accepted as real. If the results could be explained by the model, the model must be
retained. In the latter case no decision can be made about the reality of effects.
Hypothesis testing parallels the method of indirect proof (proof by contradiction) used in
geometry. That is, if one wishes to prove that A (the hypothesis) is true, one first assumes that it isn't true.
If it is shown that this assumption is logically impossible, then the original hypothesis is
proven. In the case of hypothesis testing the hypothesis may never be proven; rather, it is
decided that the model of no effects is unlikely enough that the opposite hypothesis, that
of real effects, must be true.
An analogous situation exists with respect to hypothesis testing in statistics. In hypothesis
testing one wishes to show real effects of an experiment. By showing that the
experimental results were unlikely, given that there were no effects, one may decide that
the effects are, in fact, real. The hypothesis that there were no effects is called the NULL
HYPOTHESIS. The symbol H0 is used to abbreviate the Null Hypothesis in statistics.
Note that, unlike geometry, we cannot prove the effects are real, rather we may decide the
effects are real.
For example, suppose the following probability model (distribution) described the state of
the world.

[Figure omitted in the source: a probability distribution marking a likely event A and an unlikely event B.]

Event A might be considered fairly likely, given the above model was correct. As a result
the model would be retained, along with the NULL HYPOTHESIS; the decision would be
that there were no effects. Event B, on the other hand, is unlikely given the model. Here
the model would be rejected, along with the NULL HYPOTHESIS.
The Model
The SAMPLING DISTRIBUTION is a distribution of a sample statistic. It is used as a
model of what would happen if
1.) The null hypothesis were true (there really were no effects), and
2.) The experiment was repeated an infinite number of times.
Because of its importance in hypothesis testing, the sampling distribution will be
discussed in a separate chapter.
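The idea of a sampling distribution as "what would happen if H0 were true and the experiment were repeated endlessly" can be sketched by simulation. The population here (normal, mean 100, SD 15) and the sample size n=25 are illustrative assumptions, not from the text:

```python
# Sketch: approximating a sampling distribution of the mean under H0.
# Assumed (illustrative) population: normal, mean 100, SD 15; samples of n=25.
import random
import statistics

random.seed(42)

def sample_mean(n=25, mu=100, sigma=15):
    # one "experiment": draw a sample and record its mean
    return statistics.mean(random.gauss(mu, sigma) for _ in range(n))

# Stand in for the "infinite number of times" with 10,000 repetitions.
means = [sample_mean() for _ in range(10_000)]

# The simulated sampling distribution centers on mu, with SD near sigma/sqrt(n) = 3.
print(round(statistics.mean(means), 1))
print(round(statistics.stdev(means), 1))
```

An observed sample mean far out in the tails of this simulated distribution would be "unlikely, given the model", which is exactly the logic of rejecting H0.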
Probability
Probability is a theory of uncertainty. It is a necessary concept because the world
according to the scientist is unknowable in its entirety. However, prediction and decisions
are obviously possible. As such, probability theory is a rational means of dealing with an
uncertain world.
Probabilities are numbers associated with events that range from zero to one (0-1). A
probability of zero means that the event is impossible. For example, if I were to flip a
coin, the probability of a leg is zero, due to the fact that a coin may have a head or tail,
but not a leg. Given a probability of one, however, the event is certain. For example, if I
flip a coin the probability of heads, tails, or an edge is one, because the coin must take
one of these possibilities.
In real life, most events have probabilities between these two extremes. For instance, the
probability of rain tonight is .40; tomorrow night the probability is .10. Thus it can be
said that rain is more likely tonight than tomorrow.
The meaning of the term probability depends upon one's philosophical orientation. In the
CLASSICAL approach, probabilities refer to the relative frequency of an event, given the
experiment was repeated an infinite number of times. For example, the .40 probability of
rain tonight means that if the exact conditions of this evening were repeated an infinite
number of times, it would rain 40% of the time.
In the Subjective approach, however, the term probability refers to a "degree of belief."
That is, the individual assigning the number .40 to the probability of rain tonight believes
that, on a scale from 0 to 1, the likelihood of rain is .40. This leads to a branch of
statistics called "BAYESIAN STATISTICS." While many statisticians take this
approach, it is not usually taught at the introductory level. At this point in time all the
introductory student needs to know is that a person calling themselves a "Bayesian
Statistician" is not ignorant of statistics. Most likely, he or she is simply involved in the
theory of statistics.
No matter what theoretical position is taken, all probabilities must conform to certain
rules. Some of the rules are concerned with how probabilities combine with one another
to form new probabilities. For example, when events are independent, that is, when one
does not affect the other, the probabilities may be multiplied together to find the
probability of the joint event. The probability of rain today AND a head when
flipping a coin is the product of the two individual probabilities.
A deck of cards illustrates other principles of probability theory. In bridge, poker,
rummy, etc., the probability of a heart can be found by dividing thirteen, the number of
hearts, by fifty-two, the number of cards, assuming each card is equally likely to be
drawn. The probability of a queen is four (the number of queens) divided by the number
of cards. The probability of a queen OR a heart is sixteen divided by fifty-two. This
figure is computed by adding the probability of hearts to the probability of a queen, and
then subtracting the probability of a queen AND a heart which equals 1/52.
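The multiplication and addition rules just described can be checked with exact fractions; a minimal sketch:

```python
# The AND rule (independent events) and the OR rule, using exact fractions.
from fractions import Fraction

p_heart = Fraction(13, 52)            # 13 hearts in 52 cards
p_queen = Fraction(4, 52)             # 4 queens
p_queen_and_heart = Fraction(1, 52)   # the queen of hearts

# OR rule: add the probabilities, then subtract the overlap so the
# queen of hearts is not counted twice.
p_queen_or_heart = p_heart + p_queen - p_queen_and_heart
print(p_queen_or_heart)               # 16/52, i.e. 4/13

# AND rule for independent events: rain today (.40) and a head on a coin flip (.50).
p_rain_and_head = Fraction(2, 5) * Fraction(1, 2)
print(p_rain_and_head)                # 1/5
```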
An introductory mathematical probability and statistics course usually begins with the
principles of probability and proceeds to the applications of these principles. One
problem a student might encounter concerns unsorted socks in a sock drawer. Suppose
one has twenty-five pairs of unsorted socks in a sock drawer. What is the probability of
drawing out two socks at random and getting a pair? What is the probability of getting a
match to one of the first two when drawing out a third sock? How many socks on the
average would need to be drawn before one could expect to find a pair? This problem is
rather difficult and will not be solved here, but is used to illustrate the type of problem
found in mathematical statistics.
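The text leaves the sock problem unsolved; a Monte Carlo simulation (an approach assumed here, not given in the original) can at least estimate the first question, the probability that two randomly drawn socks form a pair:

```python
# Monte Carlo sketch of the sock-drawer problem's first question.
# Assumption: 25 distinct pairs, so 50 individual socks in the drawer.
import random

random.seed(1)

def draw_is_pair(n_pairs=25):
    socks = list(range(n_pairs)) * 2    # two socks per pair id
    a, b = random.sample(socks, 2)      # draw two socks without replacement
    return a == b

trials = 100_000
estimate = sum(draw_is_pair() for _ in range(trials)) / trials
# Exact answer for comparison: after the first sock is drawn, exactly 1 of the
# remaining 49 socks matches it, so the probability is 1/49 (about 0.0204).
print(round(estimate, 3))
```

The same simulation scaffolding extends naturally to the harder questions (a third draw, expected draws until a pair).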
Hypothesis Testing
Hypothesis testing is the use of statistics to determine the probability that a given
hypothesis is true. The usual process of hypothesis testing consists of four steps.
1. Formulate the null hypothesis (commonly, that the observations are the result of pure
chance) and the alternative hypothesis (commonly, that the observations show a real
effect combined with a component of chance variation).
2. Identify a test statistic that can be used to assess the truth of the null hypothesis.
3. Compute the P-value, which is the probability that a test statistic at least as extreme
as the one observed would be obtained assuming that the null hypothesis were true. The
smaller the P-value, the stronger the evidence against the null hypothesis.
4. Compare the P-value to an acceptable significance level α (sometimes called an alpha
value). If P ≤ α, the observed effect is statistically significant, the null hypothesis is
ruled out, and the alternative hypothesis is accepted.
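The four steps can be sketched end-to-end for the simplest possible case, testing whether a coin is fair. The scenario, counts, and alpha below are illustrative assumptions, not from the text:

```python
# The four hypothesis-testing steps for an assumed coin experiment:
# 100 flips, 61 heads observed, alpha = 0.05.
from math import comb

n, heads, alpha = 100, 61, 0.05

# Step 1: H0: p = 0.5 (pure chance); H1: p != 0.5 (a real effect).
# Step 2: test statistic = number of heads, binomial under H0.
def binom_pmf(k, n, p=0.5):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Step 3: two-sided P-value = probability of a count at least as far from
# n/2 as the one observed, assuming H0 is true.
extreme = abs(heads - n / 2)
p_value = sum(binom_pmf(k, n) for k in range(n + 1)
              if abs(k - n / 2) >= extreme)

# Step 4: compare the P-value to alpha.
print(round(p_value, 4), p_value <= alpha)
```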
2) ANALYSIS OF VARIANCE TECHNIQUES
An important technique for analyzing the effect of categorical factors on a response is to
perform an Analysis of Variance. An ANOVA decomposes the variability in the response
variable amongst the different factors. Depending upon the type of analysis, it may be
important to determine: (a) which factors have a significant effect on the response, and/or
(b) how much of the variability in the response variable is attributable to each factor.
STATGRAPHICS Centurion provides several procedures for performing an analysis of
variance:
1. One-Way ANOVA - used when there is only a single categorical factor. This is
equivalent to comparing multiple groups of data.
2. Multifactor ANOVA - used when there is more than one categorical factor, arranged in
a crossed pattern. When factors are crossed, the levels of one factor appear at more than
one level of the other factors.
3. Variance Components Analysis - used when there are multiple factors, arranged in a
hierarchical manner. In such a design, each factor is nested in the factor above it.
4. General Linear Models - used whenever there are both crossed and nested factors,
when some factors are fixed and some are random, and when both categorical and
quantitative factors are present.
One-Way ANOVA
A one-way analysis of variance is used when the data are divided into groups according
to only one factor. The questions of interest are usually: (a) Is there a significant
difference between the groups?, and (b) If so, which groups are significantly different
from which others? Statistical tests are provided to compare group means, group
medians, and group standard deviations. When comparing means, multiple range tests are
used, the most popular of which is Tukey's HSD procedure. For equal size samples,
significant group differences can be determined by examining the means plot and
identifying those intervals that do not overlap.
Multifactor ANOVA
When more than one factor is present and the factors are crossed, a multifactor ANOVA
is appropriate. Both main effects and interactions between the factors may be estimated.
The output includes an ANOVA table and a new graphical ANOVA from the latest
edition of Statistics for Experimenters by Box, Hunter and Hunter (Wiley, 2005). In a
graphical ANOVA, the points are scaled so that any levels that differ by more than the
variability exhibited in the distribution of the residuals are significantly different.
Variance Components Analysis
A Variance Components Analysis is most commonly used to determine the level at which
variability is being introduced into a product. A typical experiment might select several
batches, several samples from each batch, and then run replicate tests on each sample.
The goal is to determine the relative percentages of the overall process variability that is
being introduced at each level.
General Linear Model
The General Linear Models procedure is used whenever the above procedures are not
appropriate. It can be used for models with both crossed and nested factors, models in
which one or more of the variables is random rather than fixed, and when quantitative
factors are to be combined with categorical ones. Designs that can be analyzed with the
GLM procedure include partially nested designs, repeated measures experiments, split
plots, and many others. For example, pages 536-540 of the book Design and Analysis of
Experiments (sixth edition) by Douglas Montgomery (Wiley, 2005) contain an example
of an experimental design with both crossed and nested factors. For that data, the GLM
procedure produces several important tables, including estimates of the variance
components for the random factors.
Analysis of Variance (ANOVA)
Purpose
The reason for doing an ANOVA is to see if there is any difference between groups on
some variable.
For example, you might have data on student performance in non-assessed tutorial
exercises as well as their final grading. You are interested in seeing if tutorial
performance is related to final grade. ANOVA allows you to break up the group
according to the grade and then see if performance is different across these grades.
ANOVA is available for both parametric (score data) and non-parametric
(ranking/ordering) data.
Types of ANOVA
One-way between groups
The example given above is called a one-way between groups model.
You are looking at the differences between the groups.
There is only one grouping (final grade) which you are using to define the groups.
This is the simplest version of ANOVA.
This type of ANOVA can also be used to compare variables between different groups -
tutorial performance from different intakes.
One-way repeated measures
A one way repeated measures ANOVA is used when you have a single group on which
you have measured something a few times.
For example, you may have a test of understanding of Classes. You give this test at the
beginning of the topic, at the end of the topic and then at the end of the subject.
You would use a one-way repeated measures ANOVA to see if student performance on
the test changed over time.
Two-way between groups
A two-way between groups ANOVA is used to look at complex groupings.
For example, the grades by tutorial analysis could be extended to see if overseas students
performed differently from local students. What you would have from this form of ANOVA
is:
The effect of final grade
The effect of overseas versus local
The interaction between final grade and overseas/local
Each of the main effects is a one-way test. The interaction effect simply asks: "is
there any significant difference in performance when you take final grade and
overseas/local acting together?"
Two-way repeated measures
This version of ANOVA simply uses the repeated measures structure and includes an
interaction effect.
In the example given for one-way between groups, you could add Gender and see if there
was any joint effect of gender and time of testing - i.e. do males and females differ in the
amount they remember/absorb over time.
Non-parametric and Parametric
ANOVA is available for score or interval data as parametric ANOVA. This is the type
of ANOVA you do from the standard menu options in a statistical package.
The non-parametric version is usually found under the heading "Nonparametric test". It
is used when you have rank or ordered data.
You cannot use parametric ANOVA when your data are below interval measurement.
Where you have categorical data you do not have an ANOVA method - you would have
to use Chi-square, which is about association rather than about differences between
groups.
How it’s done
What ANOVA looks at is the way groups differ internally versus what the difference is
between them. To take the above example:
1. ANOVA calculates the mean for each of the final grading groups (HD, D, Cr, P,
N) on the tutorial exercise figure - the Group Means.
2. It calculates the mean for all the groups combined - the Overall Mean.
3. Then it calculates, within each group, the total deviation of each individual's score
from the Group Mean - Within Group Variation.
4. Next, it calculates the deviation of each Group Mean from the Overall Mean -
Between Group Variation.
5. Finally, ANOVA produces the F statistic, which is the ratio of the Between Group
Variation to the Within Group Variation.
If the Between Group Variation is significantly greater than the Within Group
Variation, then it is likely that there is a statistically significant difference between the
groups.
The statistical package will tell you if the F ratio is significant or not.
All versions of ANOVA follow these basic principles but the sources of Variation get
more complex as the number of groups and the interaction effects increase.
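The five-step computation above can be sketched directly. The grade groups and tutorial scores below are made-up illustrative data, not from the text:

```python
# One-way ANOVA by hand, following the five steps in the text.
# Illustrative (assumed) tutorial scores for three grade groups.
import statistics

groups = {
    "HD": [82, 88, 85, 90],
    "Cr": [70, 75, 72, 78],
    "P":  [60, 65, 63, 68],
}

scores = [x for g in groups.values() for x in g]
overall_mean = statistics.mean(scores)                      # step 2: Overall Mean

# step 3: Within Group Variation (sum of squared deviations from group means)
ss_within = sum((x - statistics.mean(g)) ** 2
                for g in groups.values() for x in g)

# step 4: Between Group Variation (group means vs the overall mean)
ss_between = sum(len(g) * (statistics.mean(g) - overall_mean) ** 2
                 for g in groups.values())

# step 5: F = mean square between / mean square within
k, n = len(groups), len(scores)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 1))   # a large F suggests real group differences
```

A statistical package would report the same F along with its significance level; here the Between Group Variation clearly dominates, so the groups differ.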
3) INTRODUCTION TO NON PARAMETRIC TESTS
Introduction to Nonparametric Testing
This module will describe some popular nonparametric tests for continuous outcomes.
Interested readers should see Conover [3] for more comprehensive coverage of
nonparametric tests.
Key Concept: Parametric tests are generally more powerful and can test a wider range
of alternative hypotheses. It is worth repeating that if data are approximately normally
distributed then parametric tests (as in the modules on hypothesis testing) are more
appropriate. However, there are situations in which assumptions for a parametric test
are violated and a nonparametric test is more appropriate.
The techniques described here apply to outcomes that are ordinal, ranked, or continuous
outcome variables that are not normally distributed. Recall that continuous outcomes are
quantitative measures based on a specific measurement scale (e.g., weight in pounds,
height in inches). Some investigators make the distinction between continuous, interval
and ordinal scaled data. Interval data are like continuous data in that they are measured
on a constant scale (i.e., there exists the same difference between adjacent scale scores
across the entire spectrum of scores). Differences between interval scores are
interpretable, but ratios are not. Temperature in Celsius or Fahrenheit is an example of an
interval scale outcome. The difference between 30º and 40º is the same as the difference
between 70º and 80º, yet 80º is not twice as warm as 40º. Ordinal outcomes can be less
specific as the ordered categories need not be equally spaced. Symptom severity is an
example of an ordinal outcome and it is not clear whether the difference between much
worse and slightly worse is the same as the difference between no change and slightly
improved. Some studies use visual scales to assess participants' self-reported signs and
symptoms. Pain is often measured in this way, from 0 to 10 with 0 representing no pain
and 10 representing agonizing pain. Participants are sometimes shown a visual scale such
as that shown in the upper portion of the figure below and asked to choose the number
that best represents their pain state. Sometimes pain scales use visual anchors as shown in
the lower portion of the figure below.
Visual Pain Scale
In the upper portion of the figure, certainly 10 is worse than 9, which is worse than 8;
however, the difference between adjacent scores may not necessarily be the same. It is
important to understand how outcomes are measured to make appropriate inferences
based on statistical analysis and, in particular, not to overstate precision.
Assigning Ranks
The nonparametric procedures that we describe here follow the same general procedure.
The outcome variable (ordinal, interval or continuous) is ranked from lowest to highest
and the analysis focuses on the ranks as opposed to the measured or raw values. For
example, suppose we measure self-reported pain using a visual analog scale with anchors
at 0 (no pain) and 10 (agonizing pain) and record the following in a sample of n=6
participants:
7 5 9 3 0 2
The ranks, which are used to perform a nonparametric test, are assigned as follows: First,
the data are ordered from smallest to largest. The lowest value is then assigned a rank of
1, the next lowest a rank of 2 and so on. The largest value is assigned a rank of n (in this
example, n=6). The observed data and corresponding ranks are shown below:
Ordered Observed Data: 0 2 3 5 7 9
Ranks: 1 2 3 4 5 6
A complicating issue that arises when assigning ranks occurs when there are ties in the
sample (i.e., the same values are measured in two or more participants). For example,
suppose that the following data are observed in our sample of n=6:
Observed Data: 7 7 9 3 0 2
The 4th and 5th ordered values are both equal to 7. When assigning ranks, the
recommended procedure is to assign the mean rank of 4.5 to each (i.e., the mean of 4 and
5), as follows:
Ordered Observed Data: 0 2 3 7 7 9
Ranks: 1 2 3 4.5 4.5 6
Suppose that there are three values of 7. In this case, we assign a rank of 5 (the mean of
4, 5 and 6) to the 4th, 5th and 6th values, as follows:
Ordered Observed Data: 0 2 3 7 7 7
Ranks: 1 2 3 5 5 5
Using this approach of assigning the mean rank when there are ties ensures that the sum
of the ranks is the same in each case (for example, 1+2+3+4+5+6=21,
1+2+3+4.5+4.5+6=21 and 1+2+3+5+5+5=21). Using this approach, the sum of the ranks
will always equal n(n+1)/2. When conducting nonparametric tests, it is useful to check
the sum of the ranks before proceeding with the analysis.
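The ranking rules above, including mean ranks for ties, can be sketched as a small helper function that reproduces the worked examples:

```python
# Rank assignment with mean ranks for ties, matching the examples in the text.
def assign_ranks(values):
    """Return ranks aligned with the sorted data; tied values get the mean
    of the positions they occupy."""
    ordered = sorted(values)
    ranks = {}
    for v in set(ordered):
        positions = [i + 1 for i, x in enumerate(ordered) if x == v]
        ranks[v] = sum(positions) / len(positions)   # mean rank for ties
    return [ranks[v] for v in ordered]

print(assign_ranks([7, 5, 9, 3, 0, 2]))   # ranks 1..6 (as floats), no ties
print(assign_ranks([7, 7, 9, 3, 0, 2]))   # the two 7s share rank 4.5
print(assign_ranks([7, 7, 7, 3, 0, 2]))   # three 7s share rank 5

# Sanity check from the text: rank sums always equal n(n+1)/2.
assert sum(assign_ranks([7, 7, 9, 3, 0, 2])) == 6 * 7 / 2
```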
To conduct nonparametric tests, we again follow the five-step approach outlined in the
modules on hypothesis testing.
1. Set up hypotheses and select the level of significance α. Analogous to parametric
testing, the research hypothesis can be one- or two- sided (one- or two-tailed),
depending on the research question of interest.
2. Select the appropriate test statistic. The test statistic is a single number that
summarizes the sample information. In nonparametric tests, the observed data are
converted into ranks and then the ranks are summarized into a test statistic.
3. Set up decision rule. The decision rule is a statement that tells under what
circumstances to reject the null hypothesis. Note that in some nonparametric tests
we reject H0 if the test statistic is large, while in others we reject H0 if the test
statistic is small. We make the distinction as we describe the different tests.
4. Compute the test statistic. Here we compute the test statistic by summarizing the
ranks into the test statistic identified in Step 2.
5. Conclusion. The final conclusion is made by comparing the test statistic (which is
a summary of the information observed in the sample) to the decision rule. The
final conclusion is either to reject the null hypothesis (because it is very unlikely to
observe the sample data if the null hypothesis is true) or not to reject the null
hypothesis (because the sample data are not very unlikely if the null hypothesis is
true).
Tests with Two Independent Samples
The modules on hypothesis testing presented techniques for testing the equality of means
in two independent samples. An underlying assumption for appropriate use of the tests
described was that the continuous outcome was approximately normally distributed or
that the samples were sufficiently large (usually n1> 30 and n2> 30) to justify their use
based on the Central Limit Theorem. When the outcome is not normally distributed and
the samples are small, a nonparametric test is appropriate.
Mann Whitney U Test (Wilcoxon Rank Sum Test)
A popular nonparametric test to compare outcomes between two independent groups is
the Mann Whitney U test. The Mann Whitney U test, sometimes called the Mann
Whitney Wilcoxon Test or the Wilcoxon Rank Sum Test, is used to test whether two
samples are likely to derive from the same population (i.e., that the two populations have
the same shape). Some investigators interpret this test as comparing the medians between
the two populations. Recall that the parametric test compares the means (H0: μ1=μ2)
between independent groups.
In contrast, the null and two-sided research hypotheses for the nonparametric test are
stated as follows:
H0: The two populations are equal versus
H1: The two populations are not equal.
This test is often performed as a two-sided test and, thus, the research hypothesis
indicates that the populations are not equal as opposed to specifying directionality. A
one-sided research hypothesis is used if interest lies in detecting a positive or negative
shift in one population as compared to the other. The procedure for the test involves
pooling the observations from the two samples into one combined sample, keeping track
of which sample each observation comes from, and then ranking from lowest to highest,
from 1 to n1+n2.
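The pooling-and-ranking procedure can be sketched as follows. The formulas U1 = n1*n2 + n1(n1+1)/2 - R1 and U2 = n1*n2 + n2(n2+1)/2 - R2 are the standard Mann Whitney U computations (stated here as an assumption, since the text's formula was omitted), and the sample data are illustrative:

```python
# Sketch of the Mann Whitney U statistic: pool, rank, sum ranks per group,
# then take the smaller U. Sample data are illustrative.
def mann_whitney_u(sample1, sample2):
    pooled = sorted(sample1 + sample2)
    # mean rank for ties, as in the ranking section
    rank = {v: sum(i + 1 for i, x in enumerate(pooled) if x == v) /
               pooled.count(v) for v in set(pooled)}
    r1 = sum(rank[v] for v in sample1)       # rank sum for group 1
    r2 = sum(rank[v] for v in sample2)       # rank sum for group 2
    n1, n2 = len(sample1), len(sample2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return min(u1, u2)   # compare against a table of critical values

print(mann_whitney_u([7, 5, 9, 3], [0, 2, 4, 1]))
```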
Tests with Matched Samples
This section describes nonparametric tests to compare two groups with respect to a
continuous outcome when the data are collected on matched or paired samples. The
parametric procedure for doing this was presented in the modules on hypothesis testing
for the situation in which the continuous outcome was normally distributed. This section
describes procedures that should be used when the outcome cannot be assumed to follow
a normal distribution. There are two popular nonparametric tests to compare outcomes
between two matched or paired groups. The first is called the Sign Test and the second
the Wilcoxon Signed Rank Test.
Recall that when data are matched or paired, we compute difference scores for each
individual and analyze difference scores. The same approach is followed in
nonparametric tests. In parametric tests, the null hypothesis is that the mean difference
(μd) is zero. In nonparametric tests, the null hypothesis is that the median difference is
zero.
The Sign Test
The Sign Test is the simplest nonparametric test for matched or paired data. The
approach is to analyze only the signs of the difference scores.
Test Statistic for the Sign Test
The test statistic for the Sign Test is the number of positive signs or number of negative
signs, whichever is smaller. In this example, we observe 2 negative and 6 positive signs.
Is this evidence of significant improvement or simply due to chance?
Determining whether the observed test statistic supports the null or research hypothesis is
done following the same approach used in parametric testing. Specifically, we determine
a critical value such that if the smaller of the number of positive or negative signs is less
than or equal to that critical value, then we reject H0 in favor of H1 and if the smaller of
the number of positive or negative signs is greater than the critical value, then we do not
reject H0. Notice that this is a one-sided decision rule corresponding to our one-sided
research hypothesis (the two-sided situation is discussed in the next example).
Computing P-values for the Sign Test
With the Sign test we can readily compute a p-value based on our observed test statistic.
The test statistic for the Sign Test is the smaller of the number of positive or negative
signs and it follows a binomial distribution with n = the number of subjects in the study
and p=0.5 (See the module on Probability for details on the binomial distribution). In the
example above, n=8 and p=0.5 (the probability of success under H0).
By using the binomial distribution formula,
P(x successes) = [n! / (x!(n - x)!)] p^x (1 - p)^(n - x),
we can compute the probability of observing different numbers of successes during 8
trials.
One-Sided versus Two-Sided Test
In the example looking for differences in repetitive behaviors in autistic children, we used
a one-sided test (i.e., we hypothesize improvement after taking the drug). A two sided test
can be used if we hypothesize a difference in repetitive behavior after taking the drug as
compared to before. From the table of critical values for the Sign Test, we can determine
a two-sided critical value and again reject H0 if the smaller of the number of positive or
negative signs is less than or equal to that two-sided critical value. Alternatively, we can
compute a two-sided p-value. With a two-sided test, the p-value is the probability of
observing many or few positive or negative signs. If the research hypothesis is a two-
sided alternative (i.e., H1: The median difference is not zero), then the p-value is
computed as: p-value = 2*P(x ≤ 2). Notice that this is equivalent to p-value = P(x ≤ 2) +
P(x ≥ 6), representing the situation of few or many successes. Recall that in two-sided tests
we reject the null hypothesis if the test statistic is extreme in either direction. Thus, in the
Sign Test, a two-sided p-value is the probability of observing few or many positive or
negative signs. Here we observe 2 negative signs (and thus 6 positive signs). The
opposite situation would be 6 negative signs (and thus 2 positive signs, as n=8). The two-
sided p-value is the probability of observing a test statistic as or more extreme in either
direction (i.e.,
P(x ≤ 2) + P(x ≥ 6) = 0.0039 + 0.0313 + 0.1094 + 0.1094 + 0.0313 + 0.0039 = 2(0.1446)
= 0.2892).
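The computation above can be reproduced with the binomial formula; the tiny discrepancy from 0.2892 comes from summing rounded probabilities in the text:

```python
# Two-sided Sign Test p-value: n=8 pairs, p=0.5 under H0, 2 negative signs.
from math import comb

def binom_pmf(k, n, p=0.5):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, observed = 8, 2
# few-or-many: P(x <= 2) + P(x >= 6); equal tails since p = 0.5
p_two_sided = sum(binom_pmf(k, n) for k in range(0, observed + 1)) + \
              sum(binom_pmf(k, n) for k in range(n - observed, n + 1))
print(round(p_two_sided, 4))   # ~0.2891, matching the text up to rounding
```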
When Difference Scores are Zero
There is a special circumstance that needs attention when implementing the Sign Test
which arises when one or more participants have difference scores of zero (i.e., their
paired measurements are identical). If there is just one difference score of zero, some
investigators drop that observation and reduce the sample size by 1 (i.e., the sample size
for the binomial distribution would be n-1). This is a reasonable approach if there is just
one zero. However, if there are two or more zeros, an alternative approach is preferred.
 If there is an even number of zeros, we randomly assign them positive or negative
signs.
 If there is an odd number of zeros, we randomly drop one and reduce the sample
size by 1, and then randomly assign the remaining observations positive or
negative signs. The following example illustrates the approach.
Wilcoxon Signed Rank Test
Another popular nonparametric test for matched or paired data is called the Wilcoxon
Signed Rank Test. Like the Sign Test, it is based on difference scores, but in addition to
analyzing the signs of the differences, it also takes into account the magnitude of the
observed differences.
Tests with More than Two Independent Samples
In the modules on hypothesis testing we presented techniques for testing the equality of
means in more than two independent samples using analysis of variance (ANOVA). An
underlying assumption for appropriate use of ANOVA was that the continuous outcome
was approximately normally distributed or that the samples were sufficiently large
(usually nj> 30, where j=1, 2, ..., k and k denotes the number of independent comparison
groups). An additional assumption for appropriate use of ANOVA is equality of
variances in the k comparison groups. ANOVA is generally robust when the sample sizes
are small but equal. When the outcome is not normally distributed and the samples are
small, a nonparametric test is appropriate.
The Kruskal-Wallis Test
A popular nonparametric test to compare outcomes among more than two independent
groups is the Kruskal-Wallis test. The Kruskal-Wallis test is used to compare medians
among k comparison groups (k > 2) and is sometimes described as an ANOVA with the
data replaced by their ranks. The null and research hypotheses for the Kruskal-Wallis
nonparametric test are stated as follows:
H0: The k population medians are equal versus
H1: The k population medians are not all equal
The procedure for the test involves pooling the observations from the k samples into one
combined sample, keeping track of which sample each observation comes from, and then
ranking the observations from lowest to highest, from 1 to N, where N = n1 + n2 + ... + nk.
Summary
This module presents hypothesis testing techniques for situations with small sample sizes
and outcomes that are ordinal, ranked or continuous and cannot be assumed to be
normally distributed. Nonparametric tests are based on ranks which are assigned to the
ordered data. The tests involve the same five steps as parametric tests: specifying the null
and alternative or research hypothesis, selecting and computing an appropriate test
statistic, setting up a decision rule, and drawing a conclusion. The tests are summarized
below.
Mann Whitney U Test
Use: To compare a continuous outcome in two independent samples.
Null Hypothesis: H0: Two populations are equal
Test Statistic: The test statistic is U, the smaller of
U1 = n1n2 + n1(n1 + 1)/2 - R1 and U2 = n1n2 + n2(n2 + 1)/2 - R2,
where R1 and R2 are the sums of the ranks in groups 1 and 2, respectively.
Decision Rule: Reject H0 if U < critical value from table
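The U statistic can be computed directly from the pooled ranks. A minimal Python sketch (illustrative only, with mid-ranks for ties; in practice a tested routine such as scipy.stats.mannwhitneyu would be used):

```python
def mann_whitney_u(x, y):
    """Smaller of U1 and U2, computed from the rank sums R1 and R2."""
    pooled = sorted(x + y)

    def rank(v):
        # mid-rank: average of the 1-based positions where v occurs
        positions = [i + 1 for i, w in enumerate(pooled) if w == v]
        return sum(positions) / len(positions)

    n1, n2 = len(x), len(y)
    r1 = sum(rank(v) for v in x)          # R1: rank sum of group 1
    r2 = sum(rank(v) for v in y)          # R2: rank sum of group 2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return min(u1, u2)

# Hypothetical data for two groups of 5
print(mann_whitney_u([7, 5, 6, 4, 12], [3, 6, 4, 2, 1]))  # 3.0
```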
Sign Test
Use: To compare a continuous outcome in two matched or paired samples.
Null Hypothesis: H0: Median difference is zero
Test Statistic: The test statistic is the smaller of the number of positive or negative signs.
Decision Rule: Reject H0 if the smaller of the number of positive or negative signs < critical value
from table.
Wilcoxon Signed Rank Test
Use: To compare a continuous outcome in two matched or paired samples.
Null Hypothesis: H0: Median difference is zero
Test Statistic: The test statistic is W, defined as the smaller of W+ and W- which are the sums of the
positive and negative ranks of the difference scores, respectively.
Decision Rule: Reject H0 if W < critical value from table.
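Similarly, W can be obtained by ranking the absolute non-zero difference scores and summing the ranks by sign. A minimal sketch (illustrative only; scipy.stats.wilcoxon is the tested alternative):

```python
def wilcoxon_w(before, after):
    """Smaller of W+ and W-: the sums of the positive and negative ranks
    of the non-zero difference scores, with mid-ranks for tied magnitudes."""
    diffs = [a - b for a, b in zip(after, before) if a != b]
    magnitudes = sorted(abs(d) for d in diffs)

    def rank(v):
        positions = [i + 1 for i, w in enumerate(magnitudes) if w == v]
        return sum(positions) / len(positions)

    w_plus = sum(rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(rank(abs(d)) for d in diffs if d < 0)
    return min(w_plus, w_minus)

# Hypothetical paired scores: differences are +4, -1, +3, so W- = 1
print(wilcoxon_w([1, 2, 3], [5, 1, 6]))  # 1.0
```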
Kruskal-Wallis Test
Use: To compare a continuous outcome in more than two independent samples.
Null Hypothesis: H0: k population medians are equal
Test Statistic: The test statistic is H,
H = [12 / (N(N + 1))] × Σ (Rj² / nj) - 3(N + 1), summing over the k groups,
where k = the number of comparison groups, N = the total sample size, nj is the sample size in the jth
group and Rj is the sum of the ranks in the jth group.
Decision Rule: Reject H0 if H > critical value
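The H statistic can likewise be computed from the pooled ranks. A minimal sketch with no tie correction (illustrative only; scipy.stats.kruskal is the tested alternative):

```python
def kruskal_wallis_h(*groups):
    """H = (12 / (N(N + 1))) * sum(Rj**2 / nj) - 3(N + 1),
    where Rj is the rank sum of group j in the pooled ranking
    (mid-ranks for ties; no tie correction applied)."""
    pooled = sorted(v for g in groups for v in g)

    def rank(v):
        positions = [i + 1 for i, w in enumerate(pooled) if w == v]
        return sum(positions) / len(positions)

    n = len(pooled)
    total = sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical data: three clearly separated groups give a large H
print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 1))  # 7.2
```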
It is important to note that nonparametric tests are subject to the same errors as
parametric tests. A Type I error occurs when a test incorrectly rejects the null hypothesis.
A Type II error occurs when a test fails to reject H0 when it is false. Power is the
probability that a test correctly rejects H0. Nonparametric tests can suffer from low
power, mainly due to small sample sizes. Therefore, it is important to consider the
possibility of a Type II error when a nonparametric test fails to reject H0: there may be a
true effect or difference that the test is underpowered to detect. For more
details, interested readers should see Conover, and Siegel and Castellan.
4) VALIDITY AND RELIABILITY
Reliability is the degree to which an assessment tool produces stable and consistent
results.
Types of Reliability
1. Test-retest reliability is a measure of reliability obtained by administering the
same test twice over a period of time to a group of individuals. The scores from
Time 1 and Time 2 can then be correlated in order to evaluate the test for stability
over time.
Example: A test designed to assess student learning in psychology could be given to a
group of students twice, with the second administration perhaps coming a week after
the first. The obtained correlation coefficient would indicate the stability of the scores.
2. Parallel forms reliability is a measure of reliability obtained by administering
different versions of an assessment tool (both versions must contain items that
probe the same construct, skill, knowledge base, etc.) to the same group of
individuals. The scores from the two versions can then be correlated in order to
evaluate the consistency of results across alternate versions.
Example: If you wanted to evaluate the reliability of a critical thinking assessment,
you might create a large set of items that all pertain to critical thinking and then
randomly split the questions up into two sets, which would represent the parallel
forms.
3. Inter-rater reliability is a measure of reliability used to assess the degree to
which different judges or raters agree in their assessment decisions. Inter-rater
reliability is useful because human observers will not necessarily interpret answers
the same way; raters may disagree as to how well certain responses or material
demonstrate knowledge of the construct or skill being assessed.
Example: Inter-rater reliability might be employed when different judges are
evaluating the degree to which art portfolios meet certain standards. Inter-rater
reliability is especially useful when judgments can be considered relatively subjective.
Thus, the use of this type of reliability would probably be more likely when
evaluating artwork as opposed to math problems.
4. Internal consistency reliability is a measure of reliability used to evaluate the
degree to which different test items that probe the same construct produce similar
results.
A. Average inter-item correlation is a subtype of internal consistency
reliability. It is obtained by taking all of the items on a test that probe the
same construct (e.g., reading comprehension), determining the correlation
coefficient for each pair of items, and finally taking the average of all of
these correlation coefficients. This final step yields the average inter-item
correlation.
B. Split-half reliability is another subtype of internal consistency reliability.
The process of obtaining split-half reliability is begun by “splitting in half”
all items of a test that are intended to probe the same area of knowledge
(e.g., World War II) in order to form two “sets” of items. The entire test is
administered to a group of individuals, the total score for each “set” is
computed, and finally the split-half reliability is obtained by determining
the correlation between the two total “set” scores.
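Split-half reliability, as described above, is simply a correlation between two half-test totals. A minimal Python sketch (the odd/even item split and the function name are our own choices for illustration; the Spearman-Brown correction is not applied):

```python
def split_half_reliability(item_scores):
    """Pearson correlation between the total scores of the odd-position
    and even-position items, one row of item scores per respondent."""
    odd = [sum(row[0::2]) for row in item_scores]   # 1st, 3rd, 5th... items
    even = [sum(row[1::2]) for row in item_scores]  # 2nd, 4th, 6th... items
    n = len(odd)
    mean_o, mean_e = sum(odd) / n, sum(even) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(odd, even))
    var_o = sum((o - mean_o) ** 2 for o in odd)
    var_e = sum((e - mean_e) ** 2 for e in even)
    return cov / (var_o * var_e) ** 0.5

# Hypothetical 4-item test, 3 respondents whose half-test totals agree perfectly
print(split_half_reliability([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]))  # 1.0
```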
Validity refers to how well a test measures what it is purported to measure.
Why is it necessary?
While reliability is necessary, it alone is not sufficient. For a test to be valid, it must
also be reliable; a reliable test, however, is not necessarily valid. For example, if your
scale is off by 5 lbs, it reads your weight every day with an excess of 5 lbs. The scale is
reliable because it consistently reports the same weight every day, but it is not valid
because it adds 5 lbs to your true weight; it is not a valid measure of your weight.
Types of Validity
1. Face Validity ascertains that the measure appears to be assessing the intended construct
under study. The stakeholders can easily assess face validity. Although this is not a very
“scientific” type of validity, it may be an essential component in enlisting motivation of
stakeholders. If the stakeholders do not believe the measure is an accurate assessment of
the ability, they may become disengaged from the task.
Example: If a measure of art appreciation is created, all of the items should be related to
the different components and types of art. If the questions are regarding historical time
periods, with no reference to any artistic movement, stakeholders may not be motivated
to give their best effort or invest in this measure because they do not believe it is a true
assessment of art appreciation.
2. Construct Validity is used to ensure that the measure actually measures what it is
intended to measure (i.e., the construct), and not other variables. Using a panel of
“experts” familiar with the construct is a way in which this type of validity can be
assessed. The experts can examine the items and decide what that specific item is
intended to measure. Students can be involved in this process to obtain their feedback.
Example: A women’s studies program may design a cumulative assessment of learning
throughout the major. The questions are written with complicated wording and phrasing.
This can cause the test to inadvertently become a test of reading comprehension, rather
than a test of women’s studies. It is important that the measure is actually assessing the
intended construct, rather than an extraneous factor.
3. Criterion-Related Validity is used to predict future or current performance - it
correlates test results with another criterion of interest.
Example: If a physics program designed a measure to assess cumulative student learning
throughout the major, the new measure could be correlated with a standardized measure
of ability in this discipline, such as an ETS field test or the GRE subject test. The higher
the correlation between the established measure and new measure, the more faith
stakeholders can have in the new assessment tool.
4. Formative Validity, when applied to outcomes assessment, is used to assess how well
a measure is able to provide information to help improve the program under study.
Example: When designing a rubric for history, one could assess students' knowledge
across the discipline. If the measure can provide information that students are lacking
knowledge in a certain area, for instance the Civil Rights Movement, then that
assessment tool is providing meaningful information that can be used to improve the
course or program requirements.
5. Sampling Validity (similar to content validity) ensures that the measure covers the
broad range of areas within the concept under study. Not everything can be covered, so
items need to be sampled from all of the domains. This may need to be completed using a
panel of “experts” to ensure that the content area is adequately sampled. Additionally, a
panel can help limit “expert” bias (i.e. a test reflecting what an individual personally feels
are the most important or relevant areas).
Example: When designing an assessment of learning in the theatre department, it would
not be sufficient to only cover issues related to acting. Other areas of theatre such as
lighting, sound, functions of stage managers should all be included. The assessment
should reflect the content area in its entirety.
What are some ways to improve validity?
1. Make sure your goals and objectives are clearly defined and operationalized.
Expectations of students should be written down.
2. Match your assessment measure to your goals and objectives. Additionally, have
the test reviewed by faculty at other schools to obtain feedback from an outside
party who is less invested in the instrument.
3. Get students involved; have the students look over the assessment for troublesome
wording, or other difficulties.
4. If possible, compare your measure with other measures, or data that may be
available.
5) APPROACHES TO QUALITATIVE AND QUANTITATIVE DATA
ANALYSIS
Qualitative analysis: Richness and Precision.
The aim of qualitative analysis is a complete, detailed description. No attempt is made to
assign frequencies to the linguistic features which are identified in the data, and rare
phenomena receive (or should receive) the same amount of attention as more frequent
phenomena. Qualitative analysis allows for fine distinctions to be drawn because it is not
necessary to shoehorn the data into a finite number of classifications. Ambiguities, which
are inherent in human language, can be recognised in the analysis. For example, the word
"red" could be used in a corpus to signify the colour red, or as a political categorisation
(e.g. socialism or communism). In a qualitative analysis both senses of red in the phrase
"the red flag" could be recognised.
The main disadvantage of qualitative approaches to corpus analysis is that their findings
cannot be extended to wider populations with the same degree of certainty that
quantitative analyses can. This is because the findings of the research are not tested to
discover whether they are statistically significant or due to chance.
Quantitative analysis: Statistically reliable and generalisable results.
In quantitative research we classify features, count them, and even construct more
complex statistical models in an attempt to explain what is observed. Findings can be
generalised to a larger population, and direct comparisons can be made between two
corpora, so long as valid sampling and significance techniques have been used. Thus,
quantitative analysis allows us to discover which phenomena are likely to be genuine
reflections of the behaviour of a language or variety, and which are merely chance
occurrences. The more basic task of just looking at a single language variety allows one to
get a precise picture of the frequency and rarity of particular phenomena, and thus their
relative normality or abnormality.
However, the picture of the data which emerges from quantitative analysis is less rich
than that obtained from qualitative analysis. For statistical purposes, classifications have
to be of the hard-and-fast (so-called "Aristotelian") type. An item either belongs to class x
or it doesn't. So in the above example about the phrase "the red flag" we would have to
decide whether to classify "red" as "politics" or "colour". As can be seen, many linguistic
terms and phenomena do not therefore belong to simple, single categories: rather they are
more consistent with the recent notion of "fuzzy sets", as in the red example. Quantitative
analysis is therefore an idealisation of the data in some cases. Also, quantitative analysis
tends to sideline rare occurrences. To ensure that certain statistical tests (such as chi-
squared) provide reliable results, it is essential that minimum frequencies are obtained,
meaning that categories may have to be collapsed into one another, resulting in a loss of
data richness.
Quantitative research focuses on numbers or quantities. Quantitative studies have results
that are based on numeric analysis and statistics. Often, these studies have many
participants. It is not unusual for there to be over a thousand people in a quantitative
research study. It is ideal to have a large number of participants because this gives
the analysis more statistical power.
Qualitative research studies are focused on differences in quality, rather than differences
in quantity. Results are in words or pictures rather than numbers. Qualitative studies
usually have fewer participants than quantitative studies because the depth of the data
collection does not allow for large numbers of participants.
Quantitative and qualitative studies both have strengths and weaknesses. A particular
strength of quantitative research is that statistical analysis allows for generalization (to
some extent) to others. A goal of quantitative research is to choose a sample that closely
resembles the population. Qualitative research does not seek to choose samples that are
representative of populations.
However, qualitative data does provide a depth and richness of data not possible with
quantitative data. Although there are fewer participants, the researchers generally know
more details about each participant. Quantitative researchers collect data on more
participants, so it is not possible to have the depth and breadth of knowledge about each.
Quantitative analysis allows researchers to test specific hypotheses. Depending on
research findings, hypotheses are either supported or not supported. Qualitative analysis
is usually for more exploratory purposes. Researchers are typically open to allowing the
data to take them in different directions. Because qualitative research is more open to
different interpretations, qualitative researchers may be more prone to accusations of bias
and personal subjectivity.
An example of qualitative research: Joe wants to study the coming out processes of gays
and lesbians in rural settings. He doesn't feel that the process can be well-represented by
having participants fill out questionnaires with closed-ended (multiple choice) questions.
He knows it's a complex process, and he'd like to get information from not only gays and
lesbians but from their families and friends. He doesn't have the time or money to explore
the lives of hundreds of participants, so he chooses five gays and lesbians who he thinks
have interesting stories. He conducts a series of interviews with each participant. He then
asks them all to identify three family members or friends, and Joe interviews them as
well.
An example of quantitative: Stephanie is interested in the types of birth control that
college students use most frequently at her university. She sends an email-based survey to
a randomly selected group of 500 students. About 400 respond to the survey. They go to
a website to fill out the survey, which takes about 5-10 minutes. The data is compiled in a
database. Stephanie runs statistical analysis to determine the most popular types of birth
control.

Pitchbook Repowerlab.pdfPitchbook Repowerlab.pdf
Pitchbook Repowerlab.pdf
VictoriaGaleano5 visualizações
Design_Discover_Develop_Campaign.pptx por ShivanshSeth6
Design_Discover_Develop_Campaign.pptxDesign_Discover_Develop_Campaign.pptx
Design_Discover_Develop_Campaign.pptx
ShivanshSeth637 visualizações
Proposal Presentation.pptx por keytonallamon
Proposal Presentation.pptxProposal Presentation.pptx
Proposal Presentation.pptx
keytonallamon52 visualizações
sam_software_eng_cv.pdf por sammyigbinovia
sam_software_eng_cv.pdfsam_software_eng_cv.pdf
sam_software_eng_cv.pdf
sammyigbinovia8 visualizações
Web Dev Session 1.pptx por VedVekhande
Web Dev Session 1.pptxWeb Dev Session 1.pptx
Web Dev Session 1.pptx
VedVekhande11 visualizações
SPICE PARK DEC2023 (6,625 SPICE Models) por Tsuyoshi Horigome
SPICE PARK DEC2023 (6,625 SPICE Models) SPICE PARK DEC2023 (6,625 SPICE Models)
SPICE PARK DEC2023 (6,625 SPICE Models)
Tsuyoshi Horigome33 visualizações
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx por lwang78
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
lwang78109 visualizações
DESIGN OF SPRINGS-UNIT4.pptx por gopinathcreddy
DESIGN OF SPRINGS-UNIT4.pptxDESIGN OF SPRINGS-UNIT4.pptx
DESIGN OF SPRINGS-UNIT4.pptx
gopinathcreddy19 visualizações
SUMIT SQL PROJECT SUPERSTORE 1.pptx por Sumit Jadhav
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptx
Sumit Jadhav 18 visualizações
Searching in Data Structure por raghavbirla63
Searching in Data StructureSearching in Data Structure
Searching in Data Structure
raghavbirla6314 visualizações

Research Methodology Module-05

  • 1. Module 5 RM
Preliminary data analysis
1) TESTING OF HYPOTHESIS: CONCEPTS AND TESTING
HYPOTHESIS TESTING: DEFINITION
Hypothesis tests are procedures for making rational decisions about the reality of effects.
Rational Decisions
Most decisions require that an individual select a single alternative from a number of possible alternatives. The decision is made without knowing whether or not it is correct; that is, it is based on incomplete information. For example, a person either takes or does not take an umbrella to school based upon both the weather report and observation of outside conditions. If it is not currently raining, this decision must be made with incomplete information.
A rational decision is characterized by the use of a procedure which ensures that the likelihood or probability of success is incorporated into the decision-making process. The procedure must be stated in such a fashion that another individual, using the same information, would make the same decision.
One is reminded of a STAR TREK episode. Captain Kirk, for one reason or another, is stranded on a planet without his communicator and is unable to get back to the Enterprise. Spock has assumed command and is being attacked by Klingons (who else). Spock asks for and receives information about the location of the enemy, but is unable to act because he does not have complete information. Captain Kirk arrives at the last moment and saves the day because he can act on incomplete information.
This story goes against the concept of rational man. Spock, being the ultimate rational man, would not be immobilized by indecision. Instead, he would have selected the alternative which realized the greatest expected benefit given the information available. If complete information were required to make decisions, few decisions would be made by rational men and women. This is obviously not the case. The script writer misunderstood Spock and rational man.
Effects
When a change in one thing is associated with a change in another, we have an effect. The changes may be either quantitative or qualitative, with the hypothesis testing procedure selected based upon the type of change observed. For example, if changes in salt intake in a diet are associated with activity level in children, we say an effect occurred. In another case, if the distribution of political party preference (Republicans, Democrats, or Independents) differs by sex (Male or Female), then an effect is present. Much of behavioral science is directed toward discovering and understanding effects. The effects discussed in the remainder of this text appear as various statistics including: differences between means, contingency tables, and correlation coefficients.
  • 2. GENERAL PRINCIPLES
All hypothesis tests conform to similar principles and proceed with the same sequence of events.
- A model of the world is created in which there are no effects. The experiment is then repeated an infinite number of times.
- The results of the experiment are compared with the model of step one. If, given the model, the results are unlikely, then the model is rejected and the effects are accepted as real. If the results could be explained by the model, the model must be retained. In the latter case no decision can be made about the reality of effects.
Hypothesis testing is equivalent to the geometrical concept of hypothesis negation. That is, if one wishes to prove that A (the hypothesis) is true, one first assumes that it isn't true. If it is shown that this assumption is logically impossible, then the original hypothesis is proven. In the case of hypothesis testing, the hypothesis may never be proven; rather, it is decided that the model of no effects is unlikely enough that the opposite hypothesis, that of real effects, must be true.
An analogous situation exists with respect to hypothesis testing in statistics. In hypothesis testing one wishes to show real effects of an experiment. By showing that the experimental results were unlikely, given that there were no effects, one may decide that the effects are, in fact, real. The hypothesis that there were no effects is called the NULL HYPOTHESIS. The symbol H0 is used to abbreviate the Null Hypothesis in statistics. Note that, unlike geometry, we cannot prove the effects are real; rather, we may decide the effects are real.
For example, suppose the following probability model (distribution) described the state of the world. In this case the decision would be that there were no effects; the null hypothesis is true. Event A might be considered fairly likely, given the above model was correct. As a result the model would be retained, along with the NULL HYPOTHESIS.
Event B, on the other hand, is unlikely, given the model. Here the model would be rejected, along with the NULL HYPOTHESIS.
The Model
The SAMPLING DISTRIBUTION is a distribution of a sample statistic. It is used as a model of what would happen if 1) the null hypothesis were true (there really were no effects), and 2) the experiment was repeated an infinite number of times. Because of its importance in hypothesis testing, the sampling distribution will be discussed in a separate chapter.
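The idea of a sampling distribution can be approximated by simulation. The sketch below is illustrative only (the die-roll population, sample size, and repetition count are assumptions, not from the original text): it repeatedly draws samples under a "no effects" model and collects each sample mean.

```python
import random
import statistics

# Illustrative sketch: approximate the sampling distribution of the
# sample mean by drawing many samples from a fixed population.
random.seed(42)

def sampling_distribution_of_mean(population, n, repetitions):
    """Draw `repetitions` samples of size n; return the list of sample means."""
    return [
        statistics.mean(random.choices(population, k=n))
        for _ in range(repetitions)
    ]

population = list(range(1, 7))   # faces of a fair die (assumed example)
means = sampling_distribution_of_mean(population, n=10, repetitions=5000)

# The simulated distribution centers near the population mean (3.5)
# and is narrower than the population itself.
print(round(statistics.mean(means), 2))
print(statistics.stdev(means) < statistics.pstdev(population))
```

A test statistic observed in a real experiment can then be compared against this simulated distribution to judge how unlikely it is under the null hypothesis.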
  • 3. Probability
Probability is a theory of uncertainty. It is a necessary concept because the world according to the scientist is unknowable in its entirety. However, prediction and decisions are obviously possible. As such, probability theory is a rational means of dealing with an uncertain world.
Probabilities are numbers associated with events that range from zero to one (0-1). A probability of zero means that the event is impossible. For example, if I were to flip a coin, the probability of a leg is zero, because a coin may have a head or tail, but not a leg. Given a probability of one, however, the event is certain. For example, if I flip a coin the probability of heads, tails, or an edge is one, because the coin must take one of these possibilities. In real life, most events have probabilities between these two extremes. For instance, the probability of rain tonight is .40; tomorrow night the probability is .10. Thus it can be said that rain is more likely tonight than tomorrow.
The meaning of the term probability depends upon one's philosophical orientation. In the CLASSICAL approach, probabilities refer to the relative frequency of an event, given the experiment was repeated an infinite number of times. For example, the .40 probability of rain tonight means that if the exact conditions of this evening were repeated an infinite number of times, it would rain 40% of the time. In the SUBJECTIVE approach, however, the term probability refers to a "degree of belief." That is, the individual assigning the number .40 to the probability of rain tonight believes that, on a scale from 0 to 1, the likelihood of rain is .40. This leads to a branch of statistics called "BAYESIAN STATISTICS." While many statisticians take this approach, it is not usually taught at the introductory level. At this point all the introductory student needs to know is that a person calling themselves a "Bayesian Statistician" is not ignorant of statistics.
Most likely, he or she is simply involved in the theory of statistics.
No matter what theoretical position is taken, all probabilities must conform to certain rules. Some of the rules are concerned with how probabilities combine with one another to form new probabilities. For example, when events are independent, that is, one doesn't affect the other, the probabilities may be multiplied together to find the probability of the joint event. The probability of rain today AND the probability of getting a head when flipping a coin is the product of the two individual probabilities.
A deck of cards illustrates other principles of probability theory. In bridge, poker, rummy, etc., the probability of a heart can be found by dividing thirteen, the number of hearts, by fifty-two, the number of cards, assuming each card is equally likely to be drawn. The probability of a queen is four (the number of queens) divided by the number of cards. The probability of a queen OR a heart is sixteen divided by fifty-two. This figure is computed by adding the probability of hearts to the probability of a queen, and then subtracting the probability of a queen AND a heart, which equals 1/52.
An introductory mathematical probability and statistics course usually begins with the principles of probability and proceeds to the applications of these principles. One
  • 4. problem a student might encounter concerns unsorted socks in a sock drawer. Suppose one has twenty-five pairs of unsorted socks in a sock drawer. What is the probability of drawing out two socks at random and getting a pair? What is the probability of getting a match to one of the first two when drawing out a third sock? How many socks on average would need to be drawn before one could expect to find a pair? This problem is rather difficult and will not be solved here, but is used to illustrate the type of problem found in mathematical statistics.
Hypothesis Testing
Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true. The usual process of hypothesis testing consists of four steps.
1. Formulate the null hypothesis (commonly, that the observations are the result of pure chance) and the alternative hypothesis (commonly, that the observations show a real effect combined with a component of chance variation).
2. Identify a test statistic that can be used to assess the truth of the null hypothesis.
3. Compute the p-value, which is the probability that a test statistic at least as significant as the one observed would be obtained assuming that the null hypothesis were true. The smaller the p-value, the stronger the evidence against the null hypothesis.
4. Compare the p-value to an acceptable significance value α (sometimes called an alpha value). If p ≤ α, the observed effect is statistically significant, the null hypothesis is ruled out, and the alternative hypothesis is considered valid.
2) ANALYSIS OF VARIANCE TECHNIQUES
An important technique for analyzing the effect of categorical factors on a response is to perform an Analysis of Variance. An ANOVA decomposes the variability in the response variable amongst the different factors.
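The four-step procedure above can be sketched with a simple exact binomial test. The scenario (a coin flipped 10 times, 9 heads observed, α = 0.05) is an illustrative assumption, not from the original text.

```python
from math import comb

# Step 1: H0: the coin is fair (p = 0.5); H1: the coin favors heads.
# Step 2: the test statistic is the number of heads in n flips.
n, observed_heads, alpha = 10, 9, 0.05   # assumed example values

# Step 3: one-sided p-value = P(at least observed_heads heads | H0),
# summing exact binomial probabilities for k = 9 and k = 10.
p_value = sum(comb(n, k) for k in range(observed_heads, n + 1)) / 2**n

# Step 4: compare the p-value to alpha.
reject_null = p_value <= alpha
print(round(p_value, 4), reject_null)   # prints: 0.0107 True
```

Because 0.0107 is below α = 0.05, nine heads would be unlikely under the no-effect model, so the null hypothesis is rejected.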
Depending upon the type of analysis, it may be important to determine: (a) which factors have a significant effect on the response, and/or (b) how much of the variability in the response variable is attributable to each factor. STATGRAPHICS Centurion provides several procedures for performing an analysis of variance:
1. One-Way ANOVA - used when there is only a single categorical factor. This is equivalent to comparing multiple groups of data.
2. Multifactor ANOVA - used when there is more than one categorical factor, arranged in a crossed pattern. When factors are crossed, the levels of one factor appear at more than one level of the other factors.
3. Variance Components Analysis - used when there are multiple factors, arranged in a hierarchical manner. In such a design, each factor is nested in the factor above it.
4. General Linear Models - used whenever there are both crossed and nested factors, when some factors are fixed and some are random, and when both categorical and quantitative factors are present.
  • 5. One-Way ANOVA
A one-way analysis of variance is used when the data are divided into groups according to only one factor. The questions of interest are usually: (a) Is there a significant difference between the groups?, and (b) If so, which groups are significantly different from which others? Statistical tests are provided to compare group means, group medians, and group standard deviations. When comparing means, multiple range tests are used, the most popular of which is Tukey's HSD procedure. For equal size samples, significant group differences can be determined by examining the means plot and identifying those intervals that do not overlap.
Multifactor ANOVA
When more than one factor is present and the factors are crossed, a multifactor ANOVA is appropriate. Both main effects and interactions between the factors may be estimated. The output includes an ANOVA table and a new graphical ANOVA from the latest edition of Statistics for Experimenters by Box, Hunter and Hunter (Wiley, 2005). In a graphical ANOVA, the points are scaled so that any levels that differ by more than the scatter exhibited in the distribution of the residuals are significantly different.
  • 6. Variance Components Analysis
A Variance Components Analysis is most commonly used to determine the level at which variability is being introduced into a product. A typical experiment might select several batches, several samples from each batch, and then run replicate tests on each sample. The goal is to determine the relative percentages of the overall process variability that are being introduced at each level.
  • 7. General Linear Model
The General Linear Models procedure is used whenever the above procedures are not appropriate. It can be used for models with both crossed and nested factors, models in which one or more of the variables is random rather than fixed, and when quantitative factors are to be combined with categorical ones. Designs that can be analyzed with the GLM procedure include partially nested designs, repeated measures experiments, split plots, and many others. For example, pages 536-540 of the book Design and Analysis of Experiments (sixth edition) by Douglas Montgomery (Wiley, 2005) contain an example of an experimental design with both crossed and nested factors. For those data, the GLM procedure produces several important tables, including estimates of the variance components for the random factors.
Analysis of Variance (ANOVA)
Purpose
The reason for doing an ANOVA is to see if there is any difference between groups on some variable. For example, you might have data on student performance in non-assessed tutorial exercises as well as their final grading. You are interested in seeing if tutorial performance is related to final grade. ANOVA allows you to break up the group according to the grade and then see if performance is different across these grades. ANOVA is available for both parametric (score data) and non-parametric (ranking/ordering) data.
Types of ANOVA
One-way between groups
The example given above is called a one-way between groups model. You are looking at the differences between the groups. There is only one grouping (final grade) which you are using to define the groups. This is the simplest version of ANOVA. This type of ANOVA can also be used to compare variables between different groups - tutorial performance from different intakes.
One-way repeated measures
A one-way repeated measures ANOVA is used when you have a single group on which you have measured something a few times.
For example, you may have a test of understanding of classes. You give this test at the beginning of the topic, at the end of the topic and then at the end of the subject. You would use a one-way repeated measures ANOVA to see if student performance on the test changed over time.
Two-way between groups
A two-way between groups ANOVA is used to look at complex groupings.
  • 8. For example, the grades by tutorial analysis could be extended to see if overseas students performed differently from local students. What you would have from this form of ANOVA is:
- The effect of final grade
- The effect of overseas versus local
- The interaction between final grade and overseas/local
Each of the main effects is a one-way test. The interaction effect is simply asking "is there any significant difference in performance when you take final grade and overseas/local acting together?"
Two-way repeated measures
This version of ANOVA simply uses the repeated measures structure and includes an interaction effect. In the example given for one-way between groups, you could add Gender and see if there was any joint effect of gender and time of testing - i.e. do males and females differ in the amount they remember/absorb over time.
Non-parametric and Parametric
ANOVA is available for score or interval data as parametric ANOVA. This is the type of ANOVA you do from the standard menu options in a statistical package. The non-parametric version is usually found under the heading "Nonparametric test". It is used when you have rank or ordered data. You cannot use parametric ANOVA when your data are below interval measurement. Where you have categorical data you do not have an ANOVA method - you would have to use Chi-square, which is about interaction rather than about differences between groups.
How it's done
What ANOVA looks at is the way groups differ internally versus what the difference is between them. To take the above example:
1. ANOVA calculates the mean for each of the final grading groups (HD, D, Cr, P, N) on the tutorial exercise figure - the Group Means.
2. It calculates the mean for all the groups combined - the Overall Mean.
3. Then it calculates, within each group, the total deviation of each individual's score from the Group Mean - the Within Group Variation.
4.
Next, it calculates the deviation of each Group Mean from the Overall Mean - the Between Group Variation.
5. Finally, ANOVA produces the F statistic, which is the ratio of the Between Group Variation to the Within Group Variation.
If the Between Group Variation is significantly greater than the Within Group Variation, then it is likely that there is a statistically significant difference between the groups. The statistical package will tell you if the F ratio is significant or not. All versions of ANOVA follow these basic principles, but the sources of variation get more complex as the number of groups and the interaction effects increase.
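The five-step calculation above can be sketched directly. The three groups of scores below are made-up illustrative data, not from the original text; the function computes the F statistic by hand exactly as the steps describe.

```python
import statistics

def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic from a list of score groups."""
    all_scores = [x for g in groups for x in g]
    group_means = [statistics.mean(g) for g in groups]   # step 1
    overall_mean = statistics.mean(all_scores)           # step 2

    # Step 3: within-group variation (squared deviations of each
    # score from its own group mean).
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    # Step 4: between-group variation (group means vs. overall mean,
    # weighted by group size).
    ss_between = sum(len(g) * (m - overall_mean) ** 2
                     for g, m in zip(groups, group_means))

    # Step 5: F is the ratio of the two variations, each divided by
    # its degrees of freedom.
    k, n = len(groups), len(all_scores)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

groups = [[85, 90, 88], [70, 75, 72], [60, 65, 64]]   # assumed example data
print(round(one_way_anova_f(groups), 2))
```

Here the between-group variation dwarfs the within-group variation, so the F ratio is large; a statistical package would then report whether that F is significant for the given degrees of freedom.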
  • 9. 3) INTRODUCTION TO NON-PARAMETRIC TESTS
Introduction to Nonparametric Testing
This module will describe some popular nonparametric tests for continuous outcomes. Interested readers should see Conover for a more comprehensive coverage of nonparametric tests.
Key Concept: Parametric tests are generally more powerful and can test a wider range of alternative hypotheses. It is worth repeating that if data are approximately normally distributed then parametric tests (as in the modules on hypothesis testing) are more appropriate. However, there are situations in which assumptions for a parametric test are violated and a nonparametric test is more appropriate.
The techniques described here apply to outcomes that are ordinal, ranked, or continuous outcome variables that are not normally distributed. Recall that continuous outcomes are quantitative measures based on a specific measurement scale (e.g., weight in pounds, height in inches). Some investigators make the distinction between continuous, interval and ordinal scaled data. Interval data are like continuous data in that they are measured on a constant scale (i.e., there exists the same difference between adjacent scale scores across the entire spectrum of scores). Differences between interval scores are interpretable, but ratios are not. Temperature in Celsius or Fahrenheit is an example of an interval scale outcome. The difference between 30º and 40º is the same as the difference between 70º and 80º, yet 80º is not twice as warm as 40º. Ordinal outcomes can be less specific, as the ordered categories need not be equally spaced. Symptom severity is an example of an ordinal outcome, and it is not clear whether the difference between much worse and slightly worse is the same as the difference between no change and slightly improved. Some studies use visual scales to assess participants' self-reported signs and symptoms.
Pain is often measured in this way, from 0 to 10 with 0 representing no pain and 10 representing agonizing pain. Participants are sometimes shown a visual scale such as that shown in the upper portion of the figure below and asked to choose the number
  • 10. that best represents their pain state. Sometimes pain scales use visual anchors as shown in the lower portion of the figure below.
[Figure: Visual Pain Scale]
In the upper portion of the figure, certainly 10 is worse than 9, which is worse than 8; however, the difference between adjacent scores may not necessarily be the same. It is important to understand how outcomes are measured to make appropriate inferences based on statistical analysis and, in particular, not to overstate precision.
Assigning Ranks
The nonparametric procedures that we describe here follow the same general procedure. The outcome variable (ordinal, interval or continuous) is ranked from lowest to highest and the analysis focuses on the ranks as opposed to the measured or raw values. For example, suppose we measure self-reported pain using a visual analog scale with anchors at 0 (no pain) and 10 (agonizing pain) and record the following in a sample of n=6 participants: 7 5 9 3 0 2
The ranks, which are used to perform a nonparametric test, are assigned as follows: First, the data are ordered from smallest to largest. The lowest value is then assigned a rank of 1, the next lowest a rank of 2, and so on. The largest value is assigned a rank of n (in this example, n=6). The observed data and corresponding ranks are shown below:
Ordered Observed Data: 0 2 3 5 7 9
Ranks: 1 2 3 4 5 6
A complicating issue that arises when assigning ranks occurs when there are ties in the sample (i.e., the same values are measured in two or more participants). For example, suppose that the following data are observed in our sample of n=6:
  • 11. Observed Data: 7 7 9 3 0 2
The 4th and 5th ordered values are both equal to 7. When assigning ranks, the recommended procedure is to assign the mean rank of 4.5 to each (i.e., the mean of 4 and 5), as follows:
Ordered Observed Data: 0 2 3 7 7 9
Ranks: 1 2 3 4.5 4.5 6
Suppose that there are three values of 7. In this case, we assign a rank of 5 (the mean of 4, 5 and 6) to the 4th, 5th and 6th values, as follows:
Ordered Observed Data: 0 2 3 7 7 7
Ranks: 1 2 3 5 5 5
Using this approach of assigning the mean rank when there are ties ensures that the sum of the ranks is the same in each sample (for example, 1+2+3+4+5+6=21, 1+2+3+4.5+4.5+6=21 and 1+2+3+5+5+5=21). Using this approach, the sum of the ranks will always equal n(n+1)/2. When conducting nonparametric tests, it is useful to check the sum of the ranks before proceeding with the analysis.
To conduct nonparametric tests, we again follow the five-step approach outlined in the modules on hypothesis testing.
1. Set up hypotheses and select the level of significance α. Analogous to parametric testing, the research hypothesis can be one- or two-sided (one- or two-tailed), depending on the research question of interest.
2. Select the appropriate test statistic. The test statistic is a single number that summarizes the sample information. In nonparametric tests, the observed data are converted into ranks and then the ranks are summarized into a test statistic.
3. Set up the decision rule. The decision rule is a statement that tells under what circumstances to reject the null hypothesis. Note that in some nonparametric tests we reject H0 if the test statistic is large, while in others we reject H0 if the test statistic is small. We make the distinction as we describe the different tests.
4. Compute the test statistic. Here we compute the test statistic by summarizing the ranks into the test statistic identified in Step 2.
5. Conclusion.
The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion is either to reject the null hypothesis (because it is very unlikely to observe the sample data if the null hypothesis is true) or not to reject the null
  • 12. hypothesis (because the sample data are not very unlikely if the null hypothesis is true).
Tests with Two Independent Samples
The modules on hypothesis testing presented techniques for testing the equality of means in two independent samples. An underlying assumption for appropriate use of the tests described was that the continuous outcome was approximately normally distributed or that the samples were sufficiently large (usually n1 > 30 and n2 > 30) to justify their use based on the Central Limit Theorem. When the outcome is not normally distributed and the samples are small, a nonparametric test is appropriate.
Mann Whitney U Test (Wilcoxon Rank Sum Test)
A popular nonparametric test to compare outcomes between two independent groups is the Mann Whitney U test. The Mann Whitney U test, sometimes called the Mann Whitney Wilcoxon Test or the Wilcoxon Rank Sum Test, is used to test whether two samples are likely to derive from the same population (i.e., that the two populations have the same shape). Some investigators interpret this test as comparing the medians between the two populations. Recall that the parametric test compares the means (H0: μ1=μ2) between independent groups. In contrast, the null and two-sided research hypotheses for the nonparametric test are stated as follows: H0: The two populations are equal versus H1: The two populations are not equal.
This test is often performed as a two-sided test and, thus, the research hypothesis indicates that the populations are not equal as opposed to specifying directionality. A one-sided research hypothesis is used if interest lies in detecting a positive or negative shift in one population as compared to the other.
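The tie-aware ranking that these tests rely on can be sketched as follows. This is a minimal illustration of the procedure described above (order the values, then give tied values the mean of the ranks they would otherwise occupy), reusing the pain-score example data.

```python
def assign_ranks(values):
    """Return ranks for the sorted values, giving ties their mean rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Find the run of values tied with ordered[i].
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 would get 1-based ranks i+1..j, so each
        # tied value receives the mean of those ranks.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [ranks[v] for v in ordered]

print(assign_ranks([7, 7, 9, 3, 0, 2]))   # two 7s share rank 4.5
print(assign_ranks([7, 7, 7, 3, 0, 2]))   # three 7s share rank 5

# Useful check from the text: the rank sum always equals n(n+1)/2.
for data in ([7, 7, 9, 3, 0, 2], [7, 7, 7, 3, 0, 2]):
    n = len(data)
    assert sum(assign_ranks(data)) == n * (n + 1) / 2
```

For the Mann Whitney U test, the same routine would be applied to the pooled sample of n1+n2 observations before the rank sums are computed per group.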
The procedure for the test involves pooling the observations from the two samples into one combined sample, keeping track of which sample each observation comes from, and then ranking the pooled observations from lowest to highest, from 1 to n1+n2.
Tests with Matched Samples
This section describes nonparametric tests to compare two groups with respect to a continuous outcome when the data are collected on matched or paired samples. The parametric procedure for doing this was presented in the modules on hypothesis testing for the situation in which the continuous outcome was normally distributed. This section describes procedures that should be used when the outcome cannot be assumed to follow a normal distribution. There are two popular nonparametric tests to compare outcomes
  • 13. between two matched or paired groups. The first is called the Sign Test and the second the Wilcoxon Signed Rank Test. Recall that when data are matched or paired, we compute difference scores for each individual and analyze difference scores. The same approach is followed in nonparametric tests. In parametric tests, the null hypothesis is that the mean difference (μd) is zero. In nonparametric tests, the null hypothesis is that the median difference is zero.
The Sign Test
The Sign Test is the simplest nonparametric test for matched or paired data. The approach is to analyze only the signs of the difference scores.
Test Statistic for the Sign Test
The test statistic for the Sign Test is the number of positive signs or number of negative signs, whichever is smaller. In this example, we observe 2 negative and 6 positive signs. Is this evidence of significant improvement or simply due to chance? Determining whether the observed test statistic supports the null or research hypothesis is done following the same approach used in parametric testing. Specifically, we determine a critical value such that if the smaller of the number of positive or negative signs is less than or equal to that critical value, then we reject H0 in favor of H1, and if the smaller of the number of positive or negative signs is greater than the critical value, then we do not reject H0. Notice that this is a one-sided decision rule corresponding to our one-sided research hypothesis (the two-sided situation is discussed in the next example).
Computing P-values for the Sign Test
With the Sign Test we can readily compute a p-value based on our observed test statistic. The test statistic for the Sign Test is the smaller of the number of positive or negative signs, and it follows a binomial distribution with n = the number of subjects in the study and p = 0.5 (see the module on Probability for details on the binomial distribution).
In the example above, n = 8 and p = 0.5 (the probability of success under H0). Using the binomial distribution formula, P(x successes) = n! / (x!(n−x)!) · p^x · (1−p)^(n−x), we can compute the probability of observing different numbers of successes during 8 trials.

One-Sided versus Two-Sided Tests

In the example looking for differences in repetitive behaviors in autistic children, we used a one-sided test (i.e., we hypothesized improvement after taking the drug). A two-sided test
can be used if we hypothesize a difference in repetitive behavior after taking the drug as compared to before. From the table of critical values for the Sign Test, we can determine a two-sided critical value and again reject H0 if the smaller of the number of positive or negative signs is less than or equal to that two-sided critical value. Alternatively, we can compute a two-sided p-value. With a two-sided test, the p-value is the probability of observing many or few positive or negative signs. If the research hypothesis is a two-sided alternative (i.e., H1: The median difference is not zero), then the p-value is computed as p-value = 2·P(x ≤ 2). Notice that this is equivalent to p-value = P(x ≤ 2) + P(x ≥ 6), representing the situation of few or many successes. Recall that in two-sided tests, we reject the null hypothesis if the test statistic is extreme in either direction. Thus, in the Sign Test, a two-sided p-value is the probability of observing few or many positive or negative signs. Here we observe 2 negative signs (and thus 6 positive signs). The opposite situation would be 6 negative signs (and thus 2 positive signs, as n = 8). The two-sided p-value is the probability of observing a test statistic as or more extreme in either direction (i.e., P(x ≤ 2) + P(x ≥ 6) = 0.0039 + 0.0313 + 0.1094 + 0.1094 + 0.0313 + 0.0039 = 2(0.1446) = 0.2892).

When Difference Scores are Zero

There is a special circumstance that needs attention when implementing the Sign Test, which arises when one or more participants have difference scores of zero (i.e., their paired measurements are identical). If there is just one difference score of zero, some investigators drop that observation and reduce the sample size by 1 (i.e., the sample size for the binomial distribution would be n−1). This is a reasonable approach if there is just one zero. However, if there are two or more zeros, an alternative approach is preferred.
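The one- and two-sided Sign Test p-values above come directly from the binomial distribution, so they can be verified with a short calculation (n = 8, p = 0.5, observed smaller count = 2):

```python
from math import comb

# Sign Test p-values via the binomial distribution, for n = 8 pairs with
# 2 negative signs (the smaller count), p = 0.5 under H0.

def binom_pmf(n, x, p=0.5):
    """P(exactly x successes in n trials)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, observed = 8, 2
one_sided = sum(binom_pmf(n, x) for x in range(observed + 1))  # P(X <= 2)
two_sided = 2 * one_sided                                      # P(X <= 2) + P(X >= 6)

# one_sided = 37/256 ~ 0.1445; two_sided ~ 0.289 (the text's 0.2892 comes
# from summing terms that were first rounded to four decimal places).
print(round(one_sided, 4), round(two_sided, 4))
```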
• If there is an even number of zeros, we randomly assign them positive or negative signs.
• If there is an odd number of zeros, we randomly drop one and reduce the sample size by 1, and then randomly assign the remaining zeros positive or negative signs.

The following example illustrates the approach.

Wilcoxon Signed Rank Test

Another popular nonparametric test for matched or paired data is called the Wilcoxon Signed Rank Test. Like the Sign Test, it is based on difference scores, but in addition to analyzing the signs of the differences, it also takes into account the magnitude of the observed differences.
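The Wilcoxon Signed Rank computation can be sketched as follows. The before/after scores are hypothetical; the statistic W is the smaller of the sums of the positive and negative ranks of the difference scores, with zeros dropped and tied absolute differences given averaged ranks:

```python
# Sketch of the Wilcoxon Signed Rank statistic W on hypothetical
# before/after scores for 8 matched pairs.

def wilcoxon_w(before, after):
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero diffs
    # rank the absolute differences, averaging ranks for ties
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    rank = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            rank[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, rank) if d > 0)   # sum of positive ranks
    w_minus = sum(r for d, r in zip(diffs, rank) if d < 0)  # sum of negative ranks
    return min(w_plus, w_minus)                             # W = smaller of W+ and W-

before = [85, 70, 40, 65, 80, 75, 55, 20]
after  = [75, 50, 50, 40, 20, 65, 40, 25]
print(wilcoxon_w(before, after))
```

The observed W is then compared against the critical value from the Wilcoxon Signed Rank table for the (zero-free) sample size.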
Tests with More than Two Independent Samples

In the modules on hypothesis testing we presented techniques for testing the equality of means in more than two independent samples using analysis of variance (ANOVA). An underlying assumption for appropriate use of ANOVA was that the continuous outcome was approximately normally distributed or that the samples were sufficiently large (usually nj > 30, where j = 1, 2, ..., k and k denotes the number of independent comparison groups). An additional assumption for appropriate use of ANOVA is equality of variances in the k comparison groups. ANOVA is generally robust when the sample sizes are small but equal. When the outcome is not normally distributed and the samples are small, a nonparametric test is appropriate.

The Kruskal-Wallis Test

A popular nonparametric test to compare outcomes among more than two independent groups is the Kruskal-Wallis test. The Kruskal-Wallis test is used to compare medians among k comparison groups (k > 2) and is sometimes described as an ANOVA with the data replaced by their ranks. The null and research hypotheses for the Kruskal-Wallis nonparametric test are stated as follows: H0: The k population medians are equal, versus H1: The k population medians are not all equal. The procedure for the test involves pooling the observations from the k samples into one combined sample, keeping track of which sample each observation comes from, and then ranking lowest to highest from 1 to N, where N = n1 + n2 + ... + nk. To illustrate the procedure, consider the following example.

Summary

This module presents hypothesis testing techniques for situations with small sample sizes and outcomes that are ordinal, ranked or continuous and cannot be assumed to be normally distributed. Nonparametric tests are based on ranks which are assigned to the ordered data.
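As an illustration of the Kruskal-Wallis procedure described above, here is a minimal sketch using made-up data and the standard H statistic, H = (12 / (N(N+1))) · Σ (Rj² / nj) − 3(N+1). For simplicity the example has no tied values (ties would call for a correction factor):

```python
# Sketch of the Kruskal-Wallis H statistic for k independent groups:
# pool all observations, rank 1..N, sum the ranks per group, compute H.
# The group values below are hypothetical and contain no ties.

def kruskal_wallis_h(*groups):
    pooled = [(v, g) for g, grp in enumerate(groups) for v in grp]
    pooled.sort()                                  # rank lowest to highest
    N = len(pooled)
    rank_sums = [0.0] * len(groups)                # Rj per group
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    term = sum(R**2 / len(grp) for R, grp in zip(rank_sums, groups))
    return 12 / (N * (N + 1)) * term - 3 * (N + 1)

# Clearly separated groups give a large H; similar groups give H near 0.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

H is then compared against the appropriate critical value; large H leads to rejecting H0.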
The tests involve the same five steps as parametric tests: specifying the null and alternative (research) hypotheses, selecting a level of significance, selecting and computing an appropriate test statistic, setting up a decision rule, and drawing a conclusion. The tests are summarized below.
Mann Whitney U Test
Use: To compare a continuous outcome in two independent samples.
Null Hypothesis: H0: The two populations are equal.
Test Statistic: The test statistic is U, the smaller of U1 = n1n2 + n1(n1+1)/2 − R1 and U2 = n1n2 + n2(n2+1)/2 − R2, where R1 and R2 are the sums of the ranks in groups 1 and 2, respectively.
Decision Rule: Reject H0 if U ≤ critical value from table.

Sign Test
Use: To compare a continuous outcome in two matched or paired samples.
Null Hypothesis: H0: The median difference is zero.
Test Statistic: The test statistic is the smaller of the number of positive or negative signs.
Decision Rule: Reject H0 if the smaller of the number of positive or negative signs ≤ critical value from table.

Wilcoxon Signed Rank Test
Use: To compare a continuous outcome in two matched or paired samples.
Null Hypothesis: H0: The median difference is zero.
Test Statistic: The test statistic is W, defined as the smaller of W+ and W−, which are the sums of the positive and negative ranks of the difference scores, respectively.
Decision Rule: Reject H0 if W ≤ critical value from table.

Kruskal-Wallis Test
Use: To compare a continuous outcome in more than two independent samples.
Null Hypothesis: H0: The k population medians are equal.
Test Statistic: The test statistic is H = (12 / (N(N+1))) Σ (Rj² / nj) − 3(N+1), where k = the number of comparison groups, N = the total sample size, nj is the sample size in the jth group, and Rj is the sum of the ranks in the jth group.
Decision Rule: Reject H0 if H > critical value.

It is important to note that nonparametric tests are subject to the same errors as parametric tests. A Type I error occurs when a test incorrectly rejects the null hypothesis. A Type II error occurs when a test fails to reject H0 when it is false. Power is the probability that a test correctly rejects H0 when it is false. Nonparametric tests can be subject to low power, mainly due to small sample size. Therefore, it is important to consider the
possibility of a Type II error when a nonparametric test fails to reject H0. There may be a true effect or difference, yet the nonparametric test is underpowered to detect it. For more details, interested readers should see Conover, and Siegel and Castellan.

4) VALIDITY AND RELIABILITY

Reliability is the degree to which an assessment tool produces stable and consistent results.

Types of Reliability

1. Test-retest reliability is a measure of reliability obtained by administering the same test twice over a period of time to a group of individuals. The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time. Example: A test designed to assess student learning in psychology could be given to a group of students twice, with the second administration perhaps coming a week after the first. The obtained correlation coefficient would indicate the stability of the scores.

2. Parallel forms reliability is a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals. The scores from the two versions can then be correlated in order to evaluate the consistency of results across alternate versions. Example: If you wanted to evaluate the reliability of a critical thinking assessment, you might create a large set of items that all pertain to critical thinking and then randomly split the questions up into two sets, which would represent the parallel forms.

3. Inter-rater reliability is a measure of reliability used to assess the degree to which different judges or raters agree in their assessment decisions. Inter-rater reliability is useful because human observers will not necessarily interpret answers the same way; raters may disagree as to how well certain responses or material demonstrate knowledge of the construct or skill being assessed.
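Test-retest, parallel forms, and split-half reliability (discussed under internal consistency below) all reduce to correlating two sets of scores from the same individuals. A minimal sketch, using entirely hypothetical scores and a plain Pearson correlation; the Spearman-Brown step applied to the split-half correlation is a standard adjustment not described in the text:

```python
from math import sqrt

# Reliability as correlation: test-retest scores and split-half totals.
# All scores below are hypothetical, invented for illustration.

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) *
                  sqrt(sum((b - my) ** 2 for b in y)))

# Test-retest: same test, same students, one week apart.
time1 = [82, 75, 91, 60, 70, 88]
time2 = [80, 78, 89, 62, 73, 90]
r_tt = pearson_r(time1, time2)           # close to 1 => stable over time

# Split-half: total the odd- and even-numbered items separately and
# correlate the two half-test totals; the Spearman-Brown formula then
# adjusts the half-length correlation up to the full test length.
item_scores = [                          # item_scores[s][i] = student s, item i
    [4, 5, 3, 4, 5, 4],
    [2, 3, 2, 2, 3, 3],
    [5, 5, 4, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
    [3, 3, 4, 3, 3, 4],
]
half_a = [sum(row[0::2]) for row in item_scores]
half_b = [sum(row[1::2]) for row in item_scores]
r_half = pearson_r(half_a, half_b)
r_full = 2 * r_half / (1 + r_half)       # Spearman-Brown correction

print(round(r_tt, 3), round(r_half, 3), round(r_full, 3))
```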
Example: Inter-rater reliability might be employed when different judges are evaluating the degree to which art portfolios meet certain standards. Inter-rater reliability is especially useful when judgments can be considered relatively subjective. Thus, the use of this type of reliability would probably be more likely when evaluating artwork as opposed to math problems.

4. Internal consistency reliability is a measure of reliability used to evaluate the degree to which different test items that probe the same construct produce similar results. A. Average inter-item correlation is a subtype of internal consistency reliability. It is obtained by taking all of the items on a test that probe the same construct (e.g., reading comprehension), determining the correlation
coefficient for each pair of items, and finally taking the average of all of these correlation coefficients. This final step yields the average inter-item correlation. B. Split-half reliability is another subtype of internal consistency reliability. The process of obtaining split-half reliability is begun by “splitting in half” all items of a test that are intended to probe the same area of knowledge (e.g., World War II) in order to form two “sets” of items. The entire test is administered to a group of individuals, the total score for each “set” is computed, and finally the split-half reliability is obtained by determining the correlation between the two total “set” scores.

Validity refers to how well a test measures what it is purported to measure.

Why is it necessary? While reliability is necessary, it alone is not sufficient: a test can be reliable without being valid. For example, if your scale is off by 5 lbs, it reads your weight every day with an excess of 5 lbs. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 5 lbs to your true weight. It is not a valid measure of your weight.

Types of Validity

1. Face Validity ascertains that the measure appears to be assessing the intended construct under study. The stakeholders can easily assess face validity. Although this is not a very “scientific” type of validity, it may be an essential component in enlisting the motivation of stakeholders. If the stakeholders do not believe the measure is an accurate assessment of the ability, they may become disengaged with the task. Example: If a measure of art appreciation is created, all of the items should be related to the different components and types of art.
If the questions are regarding historical time periods, with no reference to any artistic movement, stakeholders may not be motivated to give their best effort or invest in this measure because they do not believe it is a true assessment of art appreciation.

2. Construct Validity is used to ensure that the measure is actually measuring what it is intended to measure (i.e., the construct), and not other variables. Using a panel of “experts” familiar with the construct is a way in which this type of validity can be assessed. The experts can examine the items and decide what each specific item is intended to measure. Students can be involved in this process to obtain their feedback. Example: A women’s studies program may design a cumulative assessment of learning throughout the major. If the questions are written with complicated wording and phrasing, the test can inadvertently become a test of reading comprehension rather than a test of women’s studies. It is important that the measure is actually assessing the intended construct, rather than an extraneous factor.
3. Criterion-Related Validity is used to predict future or current performance; it correlates test results with another criterion of interest. Example: Suppose a physics program designed a measure to assess cumulative student learning throughout the major. The new measure could be correlated with a standardized measure of ability in this discipline, such as an ETS field test or the GRE subject test. The higher the correlation between the established measure and the new measure, the more faith stakeholders can have in the new assessment tool.

4. Formative Validity, when applied to outcomes assessment, is used to assess how well a measure is able to provide information to help improve the program under study. Example: When designing a rubric for history, one could assess students’ knowledge across the discipline. If the measure can provide information that students are lacking knowledge in a certain area, for instance the Civil Rights Movement, then that assessment tool is providing meaningful information that can be used to improve the course or program requirements.

5. Sampling Validity (similar to content validity) ensures that the measure covers the broad range of areas within the concept under study. Not everything can be covered, so items need to be sampled from all of the domains. This may need to be completed using a panel of “experts” to ensure that the content area is adequately sampled. Additionally, a panel can help limit “expert” bias (i.e., a test reflecting what an individual personally feels are the most important or relevant areas). Example: When designing an assessment of learning in the theatre department, it would not be sufficient to only cover issues related to acting. Other areas of theatre, such as lighting, sound, and the functions of stage managers, should all be included. The assessment should reflect the content area in its entirety.

What are some ways to improve validity?

1.
Make sure your goals and objectives are clearly defined and operationalized. Expectations of students should be written down.
2. Match your assessment measure to your goals and objectives. Additionally, have the test reviewed by faculty at other schools to obtain feedback from an outside party who is less invested in the instrument.
3. Get students involved; have the students look over the assessment for troublesome wording or other difficulties.
4. If possible, compare your measure with other measures, or data that may be available.
5) APPROACHES TO QUALITATIVE AND QUANTITATIVE DATA ANALYSIS

Qualitative analysis: Richness and Precision.

The aim of qualitative analysis is a complete, detailed description. No attempt is made to assign frequencies to the linguistic features which are identified in the data, and rare phenomena receive (or should receive) the same amount of attention as more frequent phenomena. Qualitative analysis allows for fine distinctions to be drawn because it is not necessary to shoehorn the data into a finite number of classifications. Ambiguities, which are inherent in human language, can be recognised in the analysis. For example, the word "red" could be used in a corpus to signify the colour red, or as a political categorisation (e.g. socialism or communism). In a qualitative analysis, both senses of red in the phrase "the red flag" could be recognised. The main disadvantage of qualitative approaches to corpus analysis is that their findings cannot be extended to wider populations with the same degree of certainty that quantitative analyses can. This is because the findings of the research are not tested to discover whether they are statistically significant or due to chance.

Quantitative analysis: Statistically reliable and generalisable results.

In quantitative research we classify features, count them, and even construct more complex statistical models in an attempt to explain what is observed. Findings can be generalised to a larger population, and direct comparisons can be made between two corpora, so long as valid sampling and significance techniques have been used. Thus, quantitative analysis allows us to discover which phenomena are likely to be genuine reflections of the behaviour of a language or variety, and which are merely chance occurrences. The more basic task of just looking at a single language variety allows one to get a precise picture of the frequency and rarity of particular phenomena, and thus their relative normality or abnormality.
However, the picture of the data which emerges from quantitative analysis is less rich than that obtained from qualitative analysis. For statistical purposes, classifications have to be of the hard-and-fast (so-called "Aristotelian") type: an item either belongs to class x or it doesn't. So in the above example about the phrase "the red flag", we would have to decide whether to classify "red" as "politics" or "colour". As can be seen, many linguistic terms and phenomena do not belong to simple, single categories: rather, they are more consistent with the recent notion of "fuzzy sets", as in the red example. Quantitative analysis is therefore an idealisation of the data in some cases. Also, quantitative analysis tends to sideline rare occurrences. To ensure that certain statistical tests (such as chi-squared) provide reliable results, it is essential that minimum frequencies are obtained, meaning that categories may have to be collapsed into one another, resulting in a loss of data richness.

Quantitative research focuses on numbers or quantities. Quantitative studies have results that are based on numeric analysis and statistics. Often, these studies have many participants. It is not unusual for there to be over a thousand people in a quantitative
research study. It is ideal to have a large number of participants because this gives the analysis more statistical power. Qualitative research studies are focused on differences in quality, rather than differences in quantity. Results are in words or pictures rather than numbers. Qualitative studies usually have fewer participants than quantitative studies because the depth of the data collection does not allow for large numbers of participants.

Quantitative and qualitative studies both have strengths and weaknesses. A particular strength of quantitative research is that statistical analysis allows for generalization (to some extent) to others. A goal of quantitative research is to choose a sample that closely resembles the population. Qualitative research does not seek to choose samples that are representative of populations. However, qualitative data do provide a depth and richness not possible with quantitative data. Although there are fewer participants, the researchers generally know more details about each participant. Quantitative researchers collect data on more participants, so it is not possible to have the same depth and breadth of knowledge about each.

Quantitative analysis allows researchers to test specific hypotheses. Depending on the research findings, hypotheses are either supported or not supported. Qualitative analysis is usually for more exploratory purposes. Researchers are typically open to allowing the data to take them in different directions. Because qualitative research is more open to different interpretations, qualitative researchers may be more prone to accusations of bias and personal subjectivity.

An example of qualitative research: Joe wants to study the coming out processes of gays and lesbians in rural settings. He doesn't feel that the process can be well-represented by having participants fill out questionnaires with closed-ended (multiple choice) questions.
He knows it's a complex process, and he'd like to get information not only from gays and lesbians but from their families and friends. He doesn't have the time or money to explore the lives of hundreds of participants, so he chooses five gays and lesbians who he thinks have interesting stories. He conducts a series of interviews with each participant. He then asks them all to identify three family members or friends, and Joe interviews them as well.

An example of quantitative research: Stephanie is interested in the types of birth control that college students use most frequently at her university. She sends an email-based survey to a randomly selected group of 500 students. About 400 respond to the survey. They go to a website to fill out the survey, which takes about 5-10 minutes. The data are compiled in a database. Stephanie runs statistical analyses to determine the most popular types of birth control.