Session 9 intro_of_topics_in_hypothesis_testing

Training on Teaching
Basic Statistics for
Tertiary Level Teachers
Summer 2008
Note: Most of the Slides were prepared by Prof.
Josefina Almeda of the School of Statistics, UP
Diliman
Introduction to Topics
in Hypothesis Testing

TEACHING BASIC STATISTICS ….
Session 7.2
Two Areas of Inferential Statistics
 Estimation
Point Estimation
Interval Estimation
 Hypothesis Testing

Session 7.3
Research Problem: How effective is
Minoxidil in treating male pattern baldness?
Specific Objectives:
1. To estimate the population proportion of patients
who will show new hair growth after being
treated with Minoxidil.
2. To determine whether treatment using Minoxidil
is better than the existing treatment that is
known to stimulate hair growth among 40% of
patients with male pattern baldness.
Question: How do we achieve these objectives
using inferential statistics?
This can be answered
by ESTIMATION
This can be answered by
HYPOTHESIS TESTING

Session 7.4
What is Hypothesis Testing?
 Hypothesis testing is an area of
statistical inference in which one
evaluates a conjecture about some
characteristic of the parent population
based upon the information contained
in the random sample.
 Usually the conjecture concerns one
of the unknown parameters of the
population.

Session 7.5
What is a Hypothesis?
 A hypothesis is a
claim or statement
about the population
parameter
 Examples of parameters
are population mean
or proportion
 The parameter must
be identified before analysis
This drug is guaranteed
to change cholesterol
levels (on the average)
by more than 30%!

Session 7.6
Example of Hypothesis
 The mean body temperature for patients
admitted to elective surgery is not equal to
37.0°C.
Note: The parameter of interest here is μ,
which is the mean body
temperature for patients admitted to
elective surgery.

Session 7.7
Example of a Hypothesis
 The proportion of registered voters in
Quezon City favoring Candidate A
exceeds 0.60.
Note: The parameter of interest here is p
which is the proportion of registered
voters in Quezon City favoring Candidate
A.

Session 7.8
Things to keep in mind
 Analyze a sample in an attempt to
distinguish between results that can
easily occur and results that are
highly unlikely
We can explain the occurrence of highly
unlikely results by saying either that a rare
event has indeed occurred or that things
aren’t as they are assumed to be.

Session 7.9
Example to illustrate the basic
approach in testing hypothesis
A product called “Gender Choice”
claims that couples “increase their
chances of having a boy up to 85%, a
girl up to 80%.” Suppose you conduct
an experiment that includes 100
couples who want to have baby girls,
and they all follow the Gender Choice
method for having a baby girl.

Session 7.10
Con’t of Example
Using common sense and no real
formal statistical methods, what
should you conclude about the
effectiveness of Gender Choice if
the 100 babies include
a. 52 girls
b. 97 girls

Session 7.11
Solution to Example:
a. We normally expect around 50 girls
in 100 births. The result of 52 girls is
close to 50, so we should not
conclude that Gender Choice is
effective. If the 100 couples used no
special methods of gender selection,
the result of 52 girls could easily occur
by chance.

Session 7.12
Solution to Example
b. The result of 97 girls in 100 births is
extremely unlikely to occur by chance. It
could be explained in one of two ways:
Either an extremely rare event has
occurred by chance, or Gender Choice is
effective. Because of the extremely low
probability of getting 97 girls by chance,
the more likely explanation is that the
product is effective.

Session 7.13
Explanation of Example
 We should conclude that the product is
effective only if we get significantly more
girls than we would expect under normal
circumstances.
 Although the outcomes of 52 girls and 97
girls are both “above average,” the result of
52 girls is not significant, whereas 97 girls
does constitute a significant result.
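
The two outcomes can be compared numerically. The sketch below is an illustration only: it assumes each birth is independently a girl with probability 0.5 when no selection method is used, and the function name prob_at_least is made up for this example.

```python
from math import comb

def prob_at_least(k, n=100, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more girls in n births."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# Assumption: with no gender-selection method, each birth is a girl with p = 0.5.
p_52 = prob_at_least(52)   # 52 or more girls out of 100
p_97 = prob_at_least(97)   # 97 or more girls out of 100

print(f"P(at least 52 girls) = {p_52:.3f}")
print(f"P(at least 97 girls) = {p_97:.2e}")
```

Under this assumption, 52 or more girls occurs by chance more than a third of the time, while 97 or more girls has a probability on the order of 10^-25, which is why only the second result counts as significant.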

Session 7.14
Components of a Formal
Hypothesis Test

Session 7.15
Null Hypothesis
 denoted by Ho
 the statement being tested
 it represents what the experimenter doubts to
be true
 must contain the condition of equality and
must be written with the symbol =, ≤, or ≥
When actually conducting the
test, we operate under the
assumption that the
parameter equals some
specific value.

Session 7.16
For the mean, the null hypothesis will be
stated in one of these three possible forms:
 Ho: μ = some value
 Ho: μ ≤ some value
 Ho: μ ≥ some value
Note: the value of μ can be obtained
from previous studies or from
knowledge of the population

Session 7.17
Example of Null Hypothesis
 The null hypothesis corresponding to the
common belief that the mean body
temperature is 37°C is expressed as
Ho: μ = 37°C
 We test the null hypothesis directly in
the sense that we assume it is true and
reach a conclusion to either reject Ho or
fail to reject Ho.

Session 7.18
Alternative Hypothesis
 denoted by Ha
 Is the statement that must be true if the null
hypothesis is false
 the operational statement of the theory that
the experimenter believes to be true and
wishes to prove
 Is sometimes referred to as the research
hypothesis

Session 7.19
For the mean, the alternative hypothesis will be
stated in only one of three possible forms:
 Ha: μ ≠ some value
 Ha: μ > some value
 Ha: μ < some value
Note: Ha is the opposite of Ho. For example,
if Ho is given as μ = 37.0, then it follows
that the alternative hypothesis is given
by Ha: μ ≠ 37.0.

Session 7.20
Note About Using ≤ or ≥ in Ho:
 Even though we sometimes express Ho
with the symbol ≤ or ≥ as in Ho: μ ≤ 37.0
or Ho: μ ≥ 37.0, we conduct the test by
assuming that μ = 37.0 is true.
 We must have a single fixed value for
μ so that we can work with a single
distribution having a specific mean.

Session 7.21
Note About Stating Your Own
Hypotheses:
 If you are conducting a research study
and you want to use a hypothesis test
to support your claim, the claim must
be stated in such a way that it
becomes the alternative hypothesis, so
it cannot contain the condition of
equality.

Session 7.22
Example in Stating your Hypothesis
If you believe that your brand of
refrigerator lasts longer than the mean
of 14 years for other brands, state the
claim that μ > 14, where μ is the
mean life of your refrigerators.
Ho: μ = 14 vs. Ha: μ > 14

Session 7.23
Some Notes:
 In this context of trying to support the goal of
the research, the alternative hypothesis is
sometimes referred to as the research
hypothesis.
 Also in this context, the null hypothesis is
assumed true for the purpose of conducting
the hypothesis test, but it is hoped that the
conclusion will be rejection of the null
hypothesis so that the research hypothesis is
supported.

Session 7.24
Example of a Research Problem
Suppose that the government is deciding
whether or not to approve the manufacturing of a
new drug. A drug is to be tested to find out if it
can dissolve cholesterol deposits in the heart’s
arteries. A major cause of heart diseases is the
hardening of the arteries caused by the
accumulation of cholesterol. The Bureau of
Food and Drug (BFaD) will not allow the
marketing of the drug unless there is strong
evidence that it is effective.

Session 7.25
Con’t of Research Problem
A random sample of 98 middle-aged men has been
selected for the experiment. Each man is given a
standard daily dosage of the drug for 2 consecutive
weeks. Their cholesterol levels are measured at the
beginning and at the end of the test. The interest is to
determine if the intake of the drug leads to reduced
cholesterol levels; that is, a hypothesis test will have to
be performed to determine if the drug is effective or not.
Based on the results of the experiment, the director of
BFaD will decide whether to release the drug to the
public or postpone its release and request more
research.

Session 7.26
To perform a statistical hypothesis test, we must
first identify the parameter of interest, and have
some educated guess about the true value of the
parameter. In the case of the BFaD example, the
possible states of the drug’s effectiveness are
referred to as hypotheses. Because the director
wants only to know whether it is effective or not,
either of the following hypotheses applies.
The drug is ineffective.
The drug is effective.

Session 7.27
To measure the effectiveness of the drug for each
middle-aged man, we can look at the percent change
in his cholesterol level between the measurements
taken before and after he took the drug.
We summarize effectiveness in terms of the
population mean.
Let μ be the population mean of the percent change in
cholesterol levels. BFaD decides to classify the drug
as effective only if, on the average, it reduces the
cholesterol levels by more than 30% (μ > 30%).

Session 7.28
Stating the Null Hypothesis
The null hypothesis represents no
practical change in cholesterol levels
before and after the drug use. In terms of
μ, we say
Ho: μ ≤ 30%
This means that the cholesterol level is reduced
by 30% or less.

Session 7.29
Stating the Alternative Hypothesis
Ha: μ > 30%
Whenever sample results fail to support the
null hypothesis, the conclusion we accept is
the alternative hypothesis. In our illustration,
if results from the sample of percentage
change in cholesterol levels fail to support Ho,
then the director concludes Ha and says that
the drug is effective.

Session 7.30
What is a Test of Significance?
• A test of significance is a problem of
deciding between the null and the alternative
hypotheses on the basis of the information
contained in a random sample.
• The goal will be to reject Ho in favor of Ha,
because the alternative is the hypothesis that
the researcher believes to be true. If we are
successful in rejecting Ho, we then declare
the results to be “significant”.

Session 7.31
Note About Testing the Validity of
Someone Else’s Claim
Sometimes we test the validity of someone else’s claim,
such as the claim of the Coca Cola Bottling Company that
“the mean amount of Coke in cans is at least 355 ml,”
which becomes the null hypothesis of Ho: μ ≥ 355.
In this context of testing the validity of someone
else’s claim, their original claim sometimes
becomes the null hypothesis (because it contains
equality), and it sometimes becomes the
alternative hypothesis (because it does not
contain the equality).

Session 7.32
Two Types of Errors
 Type I Error
 Type II Error

Session 7.33
Type I Error
 The mistake of rejecting the null hypothesis
when it is true.
 It is not a miscalculation or procedural
misstep; it is an actual error that can occur
when a rare event happens by chance.
 The probability of rejecting the null
hypothesis when it is true is called the
significance level (α).
 The value of α is typically predetermined,
and very common choices are α = 0.05
and α = 0.01.
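
A small simulation can make the meaning of the significance level concrete. This is a sketch under stated assumptions, not part of the original slides: it uses a two-tailed z-test of Ho: μ = 37.0 with a known standard deviation, and the values SIGMA = 0.5, N = 30 and TRIALS = 5000 are hypothetical choices made only for illustration.

```python
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

MU0 = 37.0      # value of the mean stated in Ho (from the body temperature example)
SIGMA = 0.5     # hypothetical known population standard deviation
N = 30          # hypothetical sample size
ALPHA = 0.05    # significance level
TRIALS = 5000   # number of simulated studies

# Two-tailed critical value of the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

def rejects_true_null():
    """Draw one sample from a population where Ho is TRUE and test Ho."""
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU0) / (SIGMA / N ** 0.5)
    return abs(z) > z_crit  # rejecting here is a Type I error

type_i_rate = sum(rejects_true_null() for _ in range(TRIALS)) / TRIALS
print(f"Proportion of Type I errors: {type_i_rate:.3f} (alpha = {ALPHA})")
```

Because Ho is true in every simulated study, every rejection is a Type I error, and the proportion of rejections should come out close to the chosen α.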

Session 7.34
Examples of Type I Error
1. The mistake of rejecting the null
hypothesis that the mean body
temperature is 37.0 when that mean
is really 37.0.
2. BFaD allows the release of an
ineffective medicine

Session 7.35
Type II Error
 The mistake of failing to reject the
null hypothesis when it is false.
 The symbol β (beta) is used to
represent the probability of a type II
error.

Session 7.36
Examples of Type II Errors
1. The mistake of failing to reject the
null hypothesis (μ = 37.0) when it
is actually false (that is, the mean
is not 37.0).
2. BFaD does not allow the release
of an effective drug.

Session 7.37
Summary of Possible Decisions in
Hypothesis Testing

                          True Situation
Decision                  The null hypothesis is true.    The null hypothesis is false.
We decide to reject       TYPE I error (rejecting a       CORRECT decision
the null hypothesis.      true null hypothesis)
We fail to reject         CORRECT decision                TYPE II error (failing to reject
the null hypothesis.                                      a false null hypothesis)

ANALOGY
TRIAL: The accused did not do it.

                TRUTH
VERDICT         INNOCENT        GUILTY
GUILTY          Type 1 Error    Correct
INNOCENT        Correct         Type 2 Error

Session 7.38
Controlling Type I and Type II
Errors
 The experimenter is free to determine α. If the
test leads to the rejection of Ho, the researcher
can then conclude that there is sufficient evidence
supporting Ha at the α level of significance.
 Usually, β is unknown because it is hard to
calculate. The common solution to this difficulty
is to “withhold judgment” if the test leads to the
failure to reject Ho.
 α and β are inversely related. For a fixed
sample size n, as α decreases, β increases, and
vice versa.

Session 7.39
Controlling Type I and Type II Errors
 In almost all statistical tests, both α and β
can be reduced by increasing the sample
size.
 Because of the inverse relationship of
α and β, setting a very small α should also
be avoided if the researcher cannot afford
a very large risk of committing a Type II
error.
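
The trade-off between the two error probabilities can be sketched numerically. The example below is an assumption-laden illustration: it uses a right-tailed z-test of Ho: μ ≤ 30 (as in the cholesterol example), and the true mean of 32 and standard deviation of 8 are hypothetical values chosen only so that the Type II error probability can be computed. The function name beta is also made up here.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def beta(alpha, n, mu0=30.0, mu_true=32.0, sigma=8.0):
    """Probability of a Type II error for a right-tailed z-test of Ho: mu <= mu0.

    mu_true and sigma are hypothetical: the true mean under Ha and a known
    population standard deviation.
    """
    z_crit = Z.inv_cdf(1 - alpha)               # reject Ho when z > z_crit
    shift = (mu_true - mu0) * n ** 0.5 / sigma  # how far the true mean moves z
    return Z.cdf(z_crit - shift)                # chance z still falls below z_crit

# Fixed n: a smaller alpha gives a larger beta.
for alpha in (0.10, 0.05, 0.01):
    print(f"n = 98, alpha = {alpha:.2f}: beta = {beta(alpha, 98):.3f}")

# Fixed alpha: a larger n gives a smaller beta.
for n in (50, 98, 200):
    print(f"alpha = 0.05, n = {n}: beta = {beta(0.05, n):.3f}")
```

With the sample size held fixed, lowering α raises β; with α held fixed, raising n lowers β.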

Session 7.40
Controlling Type I and Type II Errors
 The choice of α usually depends on the
consequences associated with making a
Type I error.

Common Choices of α     Consequences of Type I error
0.01 or smaller         very serious
0.05                    moderately serious
0.10                    not too serious

Session 7.41
Controlling Type I and Type II Errors
 The usual practice in research and industry
is to determine in advance the values of
α and n, so the value of β is determined.
 Depending on the seriousness of a type I
error, try to use the largest α that you can
tolerate.
 For type I errors with more serious
consequences, select smaller values of α.
Then choose a sample size n as large as is
reasonable, based on considerations of
time, cost, and other such relevant factors.

Session 7.42
Example to illustrate Type I and
Type II Errors
 Consider M&Ms (produced by Mars,
Inc.) and Bufferin brand aspirin tablets
(produced by Bristol-Myers Products).
 The M&M package contains 1498
candies. The mean weight of the
individual candies should be at least
0.9085 g., because the M&M package
is labeled as containing 1361 g.
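
The figure of 0.9085 g follows directly from the two numbers on the slide. A minimal check (the variable names are made up for this example):

```python
# Values taken from the slide: labeled package weight and number of candies.
package_weight_g = 1361
candies = 1498

# Smallest mean weight per candy that still meets the labeled package weight.
min_mean_weight = package_weight_g / candies
print(f"Required mean weight per candy: {min_mean_weight:.4f} g")
```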

Session 7.43
 The Bufferin package is labeled as holding 30
tablets, each of which contains 325 mg of
aspirin.
 Because M&Ms are candies used for
enjoyment whereas Bufferin tablets are drugs
used for treatment of health problems, we are
dealing with two very different levels of
seriousness.

Session 7.44
If the M&Ms don’t have a population
mean weight of 0.9085 g, the
consequences are not very serious,
but if the Bufferin tablets don’t have a
mean of 325 mg of aspirin, the
consequences could be very serious.

Session 7.45
If the M&Ms have a mean that is too
large, Mars will lose some money but
consumers will not complain.
In contrast, if the Bufferin tablets have
too much aspirin, Bristol-Myers could
be faced with consumer lawsuits.

Session 7.46
Consequently, in testing the claim that
μ = 0.9085 g for M&Ms, we might choose
α = 0.05 and a sample size of n = 100.
In testing the claim of μ = 325 mg for
Bufferin tablets, we might choose α = 0.01
and a sample size of n = 500.
The smaller significance level α and larger
sample size n are chosen because of the more
serious consequences associated with the
commercial drug.

Session 7.47
The Test Statistic - a statistic computed from the
sample data that is especially sensitive to the differences
between Ho and Ha
The test statistic should tend to take on certain
values when Ho is true and different values when
Ha is true.
The decision to reject Ho depends on the value of
the test statistic.
• A decision rule based on the value of the test
statistic:
Reject Ho if the computed value of the
test statistic falls in the region of rejection.

Session 7.48
Factors that Determine the Region of
Rejection
 the behavior of the test statistic if the null hypothesis
were true
 the alternative hypothesis: the location of the region
of rejection depends on the form of Ha
 level of significance (α): the smaller α is,
the smaller the region of rejection
Region of Rejection or Critical Region - the set
of all values of the test statistic which will lead to the
rejection of Ho

Session 7.49
Critical Value/s
 the value or values that separate the
critical region from the values of the
test statistic that would not lead to
rejection of the null hypothesis.
 It depends on the nature of the null
hypothesis, the relevant sampling
distribution, and the level of
significance.

Session 7.50
Types of Tests
 Two-tailed Test. If we are primarily concerned with
deciding whether the true value of a population
parameter is different from a specified value, then
the test should be two-tailed. For the case of the
mean, we say Ha: μ ≠ μ0.
 Left-tailed Test. If we are primarily concerned with
deciding whether the true value of a parameter is
less than a specified value, then the test should be
left-tailed. For the case of the proportion, we say
Ha: P < P0.
 Right-tailed Test. If we are primarily concerned
with deciding whether the true value of a parameter
is greater than a specified value, then we should
use the right-tailed test. For the case of the standard
deviation, we say Ha: σ > σ0.

Session 7.51
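Level of Significance
and the Rejection Region
[Figure: three sampling distributions, each with its rejection region(s) shaded and bounded by the critical value(s)]
Left-tailed: Ho: μ ≥ 30, Ha: μ < 30 (rejection region of area α in the left tail)
Right-tailed: Ho: μ ≤ 30, Ha: μ > 30 (rejection region of area α in the right tail)
Two-tailed: Ho: μ = 30, Ha: μ ≠ 30 (rejection regions of area α/2 in each tail)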
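
The link between the level of significance, the form of Ha, and the critical value(s) can be sketched in code. This is an illustration only: it assumes the test statistic follows the standard normal distribution when Ho is true (the slides do not fix a distribution), and the function name critical_values is made up for this example.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical value(s) for a test statistic that is standard normal under Ho.

    tail is "left" (Ha: <), "right" (Ha: >) or "two" (Ha: not equal).
    """
    z = NormalDist()
    if tail == "left":
        return (z.inv_cdf(alpha),)          # area alpha in the left tail
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)      # area alpha in the right tail
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)        # area alpha/2 in each tail
        return (-c, c)
    raise ValueError("tail must be 'left', 'right' or 'two'")

for tail in ("left", "right", "two"):
    values = ", ".join(f"{v:.3f}" for v in critical_values(0.05, tail))
    print(f"{tail}-tailed test, alpha = 0.05: critical value(s) {values}")
```

A smaller α pushes the critical value(s) further into the tail(s), which shrinks the region of rejection.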

Session 7.52
The p-value - the smallest level of
significance at which Ho will be rejected
based on the information contained in the
sample
An Alternative Form of Decision Rule
(based on the p-value)
Reject Ho if the p-value is less than or
equal to the level of significance (α).

Session 7.53
Example of Making Decisions
Using the p-value
If the level of significance α = 0.05,
p-value    Decision
0.01       Reject Ho.
0.05       Reject Ho.
0.10       Do not reject Ho.
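
The p-value decision rule is short enough to write out directly. A minimal sketch (the function name decide is made up for this example):

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value decision rule: reject Ho when p-value <= alpha."""
    return "Reject Ho" if p_value <= alpha else "Do not reject Ho"

# The three p-values from the example, tested at alpha = 0.05.
for p in (0.01, 0.05, 0.10):
    print(f"p-value = {p:.2f}: {decide(p)}")
```

Note that a p-value exactly equal to α leads to rejection, because the rule uses "less than or equal to".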

Session 7.54
Conclusions in Hypothesis Testing
1. Fail to reject the null hypothesis Ho.
2. Reject the null hypothesis Ho.
Notes:
 Some texts say “accept the null hypothesis”
instead of “fail to reject the null hypothesis.”
 Whether we use the term accept or fail to
reject, we should recognize that we are not
proving the null hypothesis; we are merely
saying that the sample evidence is not strong
enough to warrant rejection of the null
hypothesis.

Session 7.55
Wording of Final Conclusion
Start: Does the original claim contain the condition of equality?
No (Original claim does not contain equality and becomes Ha)
Do you reject Ho?
Yes (Reject Ho): “The sample data support the claim that … (original claim).”
(This is the only case in which the original claim is supported.)
No (Fail to Reject Ho): “There is not sufficient sample evidence to support the claim that … (original claim).”

Session 7.56
Wording of Final Conclusion
Start: Does the original claim contain the condition of equality?
Yes (Original claim contains equality and becomes Ho)
Do you reject Ho?
Yes (Reject Ho): “There is sufficient evidence to warrant rejection of the claim that … (original claim).”
(This is the only case in which the original claim is rejected.)
No (Fail to Reject Ho): “There is not sufficient sample evidence to warrant rejection of the claim that … (original claim).”

Session 7.57
Example in Making Final
Conclusion
 If you want to justify the claim that the
mean body temperature is different
from 37.0°C, then make the claim that
μ ≠ 37.0. This claim will be an
alternative hypothesis that will be
supported if you reject the null
hypothesis of Ho: μ = 37.0.

Session 7.58
Example in Making Final
Conclusion
 If, on the other hand, you claim that
the mean body temperature is
37.0°C, that is, μ = 37.0, you will
either reject or fail to reject the
claim; in either case, you will not
support the original claim.

Session 7.59
A Summary of the Steps in Hypothesis Testing
Determine the objectives of the experiment
(responsibility of the experimenter).
1. State the null and alternative hypotheses.
2. Decide on a level of significance, α.
Determine the testing procedure and methods of
analysis (responsibility of the statistician).
3. Decide on the type of data to be collected and
choose an appropriate test statistic and testing
procedure.

Session 7.60
Con’t of Steps in Hypothesis
Testing
4. State the decision rule.
5. Collect the data and compute the value of
the test statistic using the sample data.
6. a) If decision rule is based on region of
rejection: Check if the test statistic falls in the
region of rejection. If yes, reject Ho.
b) If decision rule is based on p-value:
Determine the p-value. If the p-value is less
than or equal to α, reject Ho.
7. Interpret results.
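
The steps can be walked through on the second Minoxidil objective stated earlier (is the new treatment better than the existing one, which works for 40% of patients?). This is a sketch under stated assumptions: the sample figures n = 200 and x = 96 are hypothetical numbers invented for illustration, not study results, and the large-sample z-test for a proportion is one possible choice of test statistic.

```python
from statistics import NormalDist

Z = NormalDist()

# Steps 1-2: Ho: p <= 0.40 vs. Ha: p > 0.40, with alpha = 0.05.
P0 = 0.40
ALPHA = 0.05

# Steps 3-4: right-tailed z-test for a proportion; reject Ho if z > z_crit.
z_crit = Z.inv_cdf(1 - ALPHA)

# Step 5: HYPOTHETICAL data, invented for this illustration only.
n = 200   # patients treated with Minoxidil
x = 96    # patients showing new hair growth
p_hat = x / n
z = (p_hat - P0) / (P0 * (1 - P0) / n) ** 0.5

# Step 6a: decision based on the region of rejection.
reject_by_region = z > z_crit

# Step 6b: decision based on the p-value (right-tail area beyond z).
p_value = 1 - Z.cdf(z)
reject_by_p_value = p_value <= ALPHA

# Step 7: interpret the result.
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
if reject_by_region:
    print("Reject Ho: the sample data support the claim that p > 0.40.")
else:
    print("Fail to reject Ho: not sufficient evidence that p > 0.40.")
```

Both forms of the decision rule lead to the same decision, since the p-value is at most α exactly when the test statistic falls in the region of rejection.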

Session 7.61
 End of Presentation
