2. What is biostatistics
Statistics is the science and art of collecting,
summarizing, and analyzing data that are
subject to random variation.
Biostatistics is the application of statistics and
mathematical methods to the design and
analysis of health, biomedical, and biological
studies.
Chap 9-2
3. Different Tests of Significance
1. One-Sample z-test or t-test
a. Compares one sample mean versus a population mean
2. Two-Sample t-test
a. Compares one sample mean versus another sample mean
   i. Independent t-tests (independent samples)
   ii. Dependent t-tests (dependent/paired samples)
3. One-way analysis of variance (ANOVA)
a. Comparing several sample means
4. How to properly use
Biostatistics
Develop an underlying question of interest
Generate a hypothesis
Design a study (Protocol)
Collect Data
Analyze Data
Descriptive statistics
Statistical Inference
6. Sampling Techniques
Population
Simple Random Sample: bias-free sample
Stratified Random Sample: bias-free sample
Systematic Sampling: biased sample
Cluster Sampling: bias-free sample
Convenience Sampling: biased sample
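Three of these sampling schemes can be sketched in a few lines of Python. This is only an illustration: the patient IDs, the two age strata, and the helper names are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, k, rng):
    # Every subset of size k is equally likely.
    return rng.sample(population, k)

def systematic_sample(population, k, rng):
    # Pick a random start, then take every step-th unit.
    step = len(population) // k
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(k)]

def stratified_sample(strata, fraction, rng):
    # Draw a simple random sample of the same fraction from each stratum.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)                 # fixed seed so the run is repeatable
patients = list(range(1, 101))         # hypothetical patient IDs 1..100
strata = {"under_50": patients[:60], "50_and_over": patients[60:]}

srs = simple_random_sample(patients, 10, rng)
systematic = systematic_sample(patients, 10, rng)
stratified = stratified_sample(strata, 0.1, rng)
print(srs, systematic, stratified, sep="\n")
```

With a 10% fraction the stratified draw takes 6 patients from the first stratum and 4 from the second, so each stratum is represented in proportion to its size.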
7. Example
How are my 10 patients doing after I put them
on antihypertensive medication?
Describe the results of your 10 patients.
8. Example
What is the in-hospital mortality rate after open heart surgery at SAL hospital so far this year?
Describe the mortality.
What is the in-hospital mortality after open heart surgery likely to be this year, given results from last year?
Estimate the probability of death for patients like those seen in the previous year.
9. Misuse of statistics
About 25% of biological research is flawed
because of incorrect conclusions drawn from
confounded experimental designs and misuse
of statistical methods
11. Hypothesis Testing Process
Identify the Population: assume the population mean age is 50 (H0: µ = 50).
Take a Sample: the sample mean is X̄ = 20.
Is X̄ = 20 likely if µ = 50? No, not likely!
REJECT the Null Hypothesis.
12. Reason for Rejecting H0
Sampling Distribution of X̄ (if H0 is true, centered at µ = 50; the observed sample mean is 20)
It is unlikely that we would get a sample mean of this value if in fact µ = 50 were the population mean.
Therefore, we reject the null hypothesis that µ = 50.
13. Components of Biostatistics
Biostatistics
  Descriptive
  Statistical Inference
    Estimation: Confidence Intervals
    Hypothesis Testing: P-values
14. Normal Distribution
A variable is said to be normally distributed or to have a
normal distribution if its distribution has the shape of a
normal curve.
15. Normal distribution
bell-shaped
symmetrical about the mean (No skewness)
total area under curve = 1
approximately 68% of distribution is within one
standard deviation of the mean
approximately 95% of distribution is within two
standard deviations of the mean
approximately 99.7% of distribution is within 3
standard deviations of the mean
Mean = Median = Mode
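The 68%, 95% and 99.7% figures can be checked directly from the standard normal distribution. A minimal sketch using Python's standard library (the helper name `area_within` is illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    # Area under the normal curve within k standard deviations of the mean.
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The printed areas are about 0.6827, 0.9545 and 0.9973, which is where the rounded percentages come from.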
16. Empirical Rule
About 68% of the area lies within 1 standard deviation of the mean
About 95% of the area lies within 2 standard deviations of the mean
About 99.7% of the area lies within 3 standard deviations of the mean
(Figure: normal curve with the axis marked −3σ −2σ −σ µ +σ +2σ +3σ)
18. Level of Significance, α
Is designated by α , (level of significance)
Typical values are .01, .05, .10
Is selected by the researcher at the beginning
Provides the critical value(s) of the test
19. The z-Test for Comparing
Population Means
Critical values for standard normal distribution
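Critical values for the standard normal distribution can also be computed rather than looked up. A minimal sketch with Python's standard library (the function name and the dictionary keys are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()

def critical_values(alpha):
    # Critical z values for each kind of test at significance level alpha.
    return {
        "left-tailed": std_normal.inv_cdf(alpha),
        "right-tailed": std_normal.inv_cdf(1 - alpha),
        "two-tailed": std_normal.inv_cdf(1 - alpha / 2),  # use +/- this value
    }

for alpha in (0.10, 0.05, 0.01):
    values = critical_values(alpha)
    print(alpha, {name: round(z, 3) for name, z in values.items()})
```

At α = 0.05 this gives roughly -1.645 and 1.645 for the one-tailed tests and ±1.960 for the two-tailed test.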
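20. Level of Significance and the Rejection Region
I claim that mean CVD in India is at least 3!
H0: µ ≥ 3, H1: µ < 3: rejection region of area α in the left tail
H0: µ ≤ 3, H1: µ > 3: rejection region of area α in the right tail
H0: µ = 3, H1: µ ≠ 3: rejection regions of area α/2 in each tail
The critical value(s) mark the boundary of the rejection region(s).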
21. Hypothesis Testing
1. State the research question.
2. State the statistical hypothesis.
3. Set decision rule.
4. Calculate the test statistic.
5. Decide if result is significant.
6. Interpret result as it relates to your research
question.
22. Rejection & Nonrejection Regions
I claim that mean CVD in India is at least 3!
                   Two-tailed test   Left-tailed test   Right-tailed test
Sign in Ha         ≠                 <                  >
Rejection region   Both sides        Left side          Right side
23. The Null Hypothesis, H0
States the assumption (numerical) to be tested
e.g.: The average number of CVD in India is at least three (H0: µ ≥ 3)
Is always about a population parameter (H0: µ ≥ 3), not about a sample statistic (H0: X̄ ≥ 3)
24. The Null Hypothesis, H0
(continued)
Begins with the assumption that the null
hypothesis is true
Similar to the notion of innocent until
proven guilty
25. The Alternative Hypothesis, H1
Is the opposite of the null hypothesis
e.g.: The average number of CVD in India is
less than 3 (H1: µ < 3)
Never contains the “=” sign
May or may not be accepted
26. General Steps in
Hypothesis Testing
e.g.: Test the assumption that the true mean number of CVD in India is at least three (σ known)
1. State the H0: H0: µ ≥ 3
2. State the H1: H1: µ < 3
3. Choose α: α = .05
4. Choose n: n = 100
5. Choose Test: Z test
27. General Steps in
Hypothesis Testing (continued)
6. Set up critical value(s): Z = -1.645; reject H0 if the test statistic falls below it (left-tail area α)
7. Collect data: 100 persons surveyed
8. Compute test statistic and p-value: computed test stat = -2, p-value = .0228
9. Make statistical decision: reject null hypothesis
10. Express conclusion: the true mean number of CVD is less than 3 in human population.
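The critical value, p-value and decision in these steps can be checked numerically. The sketch below uses only the values given on the slide (α = .05, test statistic = -2, left-tailed test); the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z_stat = -2.0                          # computed test statistic from the slide

std_normal = NormalDist()
critical_value = std_normal.inv_cdf(alpha)   # left-tail cutoff, about -1.645
p_value = std_normal.cdf(z_stat)             # P(Z <= -2) for a left-tailed test
reject_h0 = z_stat < critical_value

print(f"critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Both routes agree: the statistic lies below the critical value, and the p-value (about .0228) is below α.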
29. p-Value Approach to Testing
Convert Sample Statistic (e.g. X̄) to Test Statistic (e.g. Z, t or F-statistic)
Obtain the p-value from a table or computer
Compare the p-value with α
If p-value ≥ α, do not reject H0
If p-value < α, reject H0
30. Comparison of Critical-Value &
P-Value Approaches
Critical-Value Approach
Step 1 State the null and alternative hypothesis.
Step 2 Decide on the significance level, α.
Step 3 Compute the value of the test statistic.
Step 4 Determine the critical value(s).
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
Step 6 Interpret the result of the hypothesis test.

P-Value Approach
Step 1 State the null and alternative hypothesis.
Step 2 Decide on the significance level, α.
Step 3 Compute the value of the test statistic.
Step 4 Determine the P-value.
Step 5 If P < α, reject H0; otherwise do not reject H0.
Step 6 Interpret the result of the hypothesis test.
31. Result Probabilities
H0: Innocent

Jury Trial
Verdict     The Truth: Innocent    The Truth: Guilty
Innocent    Correct                Error
Guilty      Error                  Correct

Hypothesis Test
Decision            The Truth: H0 True    The Truth: H0 False
Do Not Reject H0    1 - α                 Type II Error (β)
Reject H0           Type I Error (α)      Power (1 - β)
32. Type I & II Errors Have an
Inverse Relationship
If you reduce the probability of one error, the other one increases,
provided everything else is unchanged
(as α goes down, β goes up, and vice versa).
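The trade-off can be made concrete by computing β for several values of α. The sketch below is for a right-tailed z test; the numbers (µ0 = 368, an assumed true mean of 372, σ = 15, n = 25) are hypothetical values chosen for illustration, and `type_ii_error` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, mu0, mu_true, sigma, n):
    # Beta for a right-tailed z test of H0: mu <= mu0 when the true mean is mu_true.
    se = sigma / sqrt(n)
    # H0 is rejected when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta is the chance the sample mean stays below the cutoff anyway.
    return NormalDist(mu_true, se).cdf(cutoff)

betas = {a: type_ii_error(a, mu0=368, mu_true=372, sigma=15, n=25)
         for a in (0.01, 0.05, 0.10)}
for a, b in betas.items():
    print(f"alpha = {a:.2f}  beta = {b:.3f}  power = {1 - b:.3f}")
```

With the sample size and σ held fixed, lowering α from 0.10 to 0.01 raises β in every step, which is the inverse relationship described above.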
33. Critical Values
Approach to Testing
Convert sample statistic (e.g.: X ) to test
statistic (e.g.: Z, t or F –statistic)
Obtain critical value(s) for a specified α
from a table or computer
If the test statistic falls in the critical region, reject
H0
Otherwise do not reject H0
34. One-tail Z Test for Mean
( σ Known)
Assumptions
Population is normally distributed
If not normal, requires large samples
Null hypothesis has ≤ or ≥ sign only
Z test statistic
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
35. Rejection Region
Left-tailed test: H0: µ ≥ µ0, H1: µ < µ0. Reject H0 in the left tail (area α). Z must be significantly below 0 to reject H0.
Right-tailed test: H0: µ ≤ µ0, H1: µ > µ0. Reject H0 in the right tail (area α). Small values of Z don’t contradict H0. Don’t reject H0!
36. Example: One Tail Test
Q. Does an average box of cereal contain more than 368 grams of cereal? A random sample of 25 boxes showed X̄ = 372.5. The company has specified σ to be 15 grams. Test at the α = 0.05 level.
H0: µ ≤ 368
H1: µ > 368
37. Finding Critical Value: One Tail
What is Z given α = 0.05?
Standardized Cumulative Normal Distribution Table (Portion)
Z     .04     .05     .06
1.6   .9495   .9505   .9515
1.7   .9591   .9599   .9608
1.8   .9671   .9678   .9686
1.9   .9738   .9744   .9750
With σZ = 1 and an upper-tail area of α = .05, the cumulative area is .95, which falls between Z = 1.64 and Z = 1.65.
Critical Value = 1.645
38. Example Solution: One Tail Test
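H0: µ ≤ 368
H1: µ > 368
α = 0.05
n = 25
Critical Value: 1.645
Test Statistic: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.50$
Decision: Do Not Reject at α = .05 (1.50 is below the critical value 1.645)
Conclusion: No evidence that true mean is more than 368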
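The one-tail cereal example can be reproduced in a few lines. This is a sketch with illustrative variable names; the inputs are the values stated in the example.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 372.5, 368, 15, 25, 0.05

z_stat = (x_bar - mu0) / (sigma / sqrt(n))     # (372.5 - 368) / 3
critical = NormalDist().inv_cdf(1 - alpha)     # right-tail cutoff
p_value = 1 - NormalDist().cdf(z_stat)         # P(Z >= z_stat)
reject_h0 = z_stat > critical

print(f"Z = {z_stat:.2f}, critical value = {critical:.3f}, p-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The statistic is 1.50, the cutoff is about 1.645, and the upper-tail p-value is about 0.0668, so H0 is not rejected.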
39. p -Value Solution
p-Value is P(Z ≥ 1.50) = 0.0668
Use the alternative hypothesis to find the direction of the rejection region.
From Z Table: Lookup 1.50 to Obtain .9332
P-Value = 1.0000 - .9332 = .0668
(Z value of sample statistic = 1.50)
40. p -Value Solution (continued)
(p-Value = 0.0668) ≥ (α = 0.05)
Do Not Reject.
Test Statistic 1.50 is in the Do Not Reject Region (below the critical value 1.645)
41. Example: Two-Tail Test
Q. Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed X̄ = 372.5. The company has specified σ to be 15 grams. Test at the α = 0.05 level.
H0: µ = 368
H1: µ ≠ 368
42. Example Solution: Two-Tail Test
H0: µ = 368
H1: µ ≠ 368
α = 0.05
n = 25
Critical Value: ±1.96 (area .025 in each tail)
Test Statistic: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.50$
Decision: Do Not Reject at α = .05
Conclusion: No Evidence that True Mean is Not 368
43. p-Value Solution
(p Value = 0.1336) ≥ (α = 0.05)
Do Not Reject.
p Value = 2 x 0.0668 = 0.1336
Test Statistic 1.50 is in the Do Not Reject Region (between -1.96 and 1.96)
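The two-tail version differs only in the critical value and in doubling the tail area. A minimal sketch with the same example values (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 372.5, 368, 15, 25, 0.05

z_stat = (x_bar - mu0) / (sigma / sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha / 2)          # alpha/2 in each tail
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))       # both tails
reject_h0 = abs(z_stat) > critical

print(f"Z = {z_stat:.2f}, critical values = +/-{critical:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The p-value is about 0.1336, twice the one-tail value, and the statistic lies between -1.96 and 1.96, so H0 is not rejected.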
44. Connection to
Confidence Intervals
For X̄ = 372.5, σ = 15 and n = 25,
the 95% confidence interval is:
$372.5 - (1.96)\,15/\sqrt{25} \le \mu \le 372.5 + (1.96)\,15/\sqrt{25}$
or
$366.62 \le \mu \le 378.38$
If this interval contains the hypothesized mean (368),
we do not reject the null hypothesis.
It does. Do not reject.
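The interval and the check against the hypothesized mean can be sketched as follows (variable names are illustrative; the inputs are the example's values):

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, confidence = 372.5, 15, 25, 0.95
mu0 = 368                                         # hypothesized mean

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{confidence:.0%} CI: {lower:.2f} <= mu <= {upper:.2f}")
print("do not reject H0" if lower <= mu0 <= upper else "reject H0")
```

Because 368 falls inside the interval, the result matches the two-tail test at α = 0.05.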
45. What is a t Test?
Commonly Used Definition: Comparing two means to see if they are significantly different from each other
Technical Definition: Any statistical test that uses the t family of distributions
46. Independent Samples t Test
Use this test when you want to compare the means of two independent samples on a given variable
• “Independent” means that the members of one sample do not include, and are not matched with, members of the other sample
(Figure: Independent Mean #1 and Independent Mean #2, compared using t test)
Example:
• Compare the average height of 50 randomly selected men to that of 50 randomly selected
47. Dependent Samples t Test
Used to compare the means of a single sample or of two matched or paired samples
Example:
• If a group of students took a math test in March and that same group of students took the same math test two months later in May, we could compare their average scores on the two test dates using a dependent samples t test
48. Comparing the Two t Tests
Independent Samples
Tests the equality of the means from two independent groups (diagram below)
Relies on the t distribution to produce the probabilities used to test statistical significance

Dependent Samples
Tests the equality of the means between related groups or of two variables within the same group (diagram below)
Relies on the t distribution to produce the probabilities used to test statistical significance

Diagram, independent samples: Person #1 (Treatment group) vs. Person #2 (Control group)
Diagram, dependent samples: Person #1 (Before treatment) vs. Person #1 (After treatment)
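The difference between the two tests shows up in how the test statistic is built. The sketch below computes both t statistics with Python's standard library only; the five March and May scores are made-up numbers, and the function names are illustrative. The independent version uses the pooled-variance (equal variances) form, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(a, b):
    # Pooled-variance two-sample t statistic and its degrees of freedom.
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def paired_t(before, after):
    # t statistic on the within-subject differences, with n - 1 degrees of freedom.
    diffs = [y - x for x, y in zip(before, after)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1

march = [70, 75, 80, 85, 90]   # hypothetical scores, same five students
may = [74, 78, 85, 88, 95]

t_ind, df_ind = independent_t(may, march)
t_dep, df_dep = paired_t(march, may)
print(f"independent: t = {t_ind:.3f}, df = {df_ind}")
print(f"dependent:   t = {t_dep:.3f}, df = {df_dep}")
```

On these made-up scores the paired statistic is far larger, because pairing removes the student-to-student variability. Treating paired data as independent would discard that information.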
49. Types
One sample
compare with population
Unpaired
compare with control
Paired
same subjects: pre-post
Z-test
large samples >30
50. Compare Means (or medians)
Example:
Compare blood pressures of two or more groups, or
compare BP of one group with a theoretical value.
1 Group:
1. One Sample t test
2. Wilcoxon signed-rank test
2 Groups:
1. Unpaired t test
2. Paired t test
3. Mann-Whitney test
4. Welch’s corrected t test
5. Wilcoxon matched pairs test
51. 3-26 Groups:
1. One-way ANOVA
2. Repeated measures ANOVA
3. Kruskal-Wallis test
4. Friedman test
(All with post tests)
Raw data
Average data: Mean, SD, & N
Average data: Mean, SEM, & N
52. Is there a difference?
between you…means,
who is meaner?
53. Statistical Analysis
control group mean vs. treatment group mean
Is there a difference?
Slide downloaded from the Internet
54. What does difference mean?
The mean difference is the same for all three cases:
medium variability
high variability
low variability
55. What does difference mean?
medium variability
high variability
low variability
Which one shows the greatest difference?
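One way to see why variability matters is to hold the mean difference fixed and change only the spread. In the sketch below the group centres (50 and 60), the spreads, and the helper names are all hypothetical; the statistic is the unpooled two-sample t.

```python
from math import sqrt
from statistics import mean, variance

def make_group(center, spread):
    # Five symmetric values around the centre; a larger spread means more variability.
    return [center - spread, center - spread / 2, center,
            center + spread / 2, center + spread]

def t_stat(control, treatment):
    # Difference in means divided by its standard error.
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    return (mean(treatment) - mean(control)) / se

results = {}
for label, spread in (("low", 2), ("medium", 5), ("high", 10)):
    control, treatment = make_group(50, spread), make_group(60, spread)
    results[label] = t_stat(control, treatment)
    print(f"{label:>6} variability: mean difference = 10, t = {results[label]:.1f}")
```

The mean difference is 10 in every case, yet the t statistic is largest for the low-variability groups: the same difference stands out more clearly when the groups overlap less.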
56. t Test: σ Unknown
Assumption
Population is normally distributed
If not normal, requires a large sample
t test statistic with n-1 degrees of freedom
$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$
57. Example: One-Tail t Test
Does an average box of cereal contain more than 368 grams of cereal? A random sample of 36 boxes showed X̄ = 372.5, and s = 15. Test at the α = 0.01 level.
σ is not given
H0: µ ≤ 368
H1: µ > 368
58. Example Solution: One-Tail
H0: µ ≤ 368
H1: µ > 368
α = 0.01
n = 36, df = 35
Critical Value: 2.4377
Test Statistic: $t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{36}} = 1.80$
Decision: Do Not Reject at α = .01
Conclusion: No evidence that true mean is more than 368
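The t statistic for this example can be reproduced with the standard library. Python's `statistics` module has no t distribution, so the sketch below takes the critical value 2.4377 from the slide's t table rather than computing it; variable names are illustrative.

```python
from math import sqrt

x_bar, mu0, s, n = 372.5, 368, 15, 36

t_stat = (x_bar - mu0) / (s / sqrt(n))   # (372.5 - 368) / 2.5
df = n - 1
critical_t = 2.4377                      # from the t table: alpha = .01, df = 35
reject_h0 = t_stat > critical_t

print(f"t = {t_stat:.2f} with df = {df}, critical value = {critical_t}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The statistic is 1.80, below the tabulated cutoff, so H0 is not rejected at α = .01.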
59. The t Table
Since the t distribution takes into account the changing shape of the distribution as n increases, there is a separate curve for each sample size (or degrees of freedom).
However, there is not enough space in the table to put all of the different probabilities corresponding to each possible t score.
The t table lists commonly used critical regions (at popular alpha levels).
62. Summary
We can use the z distribution for testing
hypotheses involving one or two
independent samples
To use z, the samples are independent and
normally distributed
The sample size must be greater than 30
Population parameters must be known