6. Example 1
Mendelian law of dominance
Cross: Aa x Aa
A -> Tall (dominant)
a -> Dwarf (recessive)
Aa is therefore Tall (the dominant allele masks the recessive one).
Offspring genotypes: AA, Aa, Aa, aa -> expected ratio 3 Tall : 1 Dwarf
Observed: 639 Tall and 281 Dwarf
Chi-square requires that you have numeric values (counts).
Chi-square should not be calculated if an expected value is less than 5.
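As a minimal sketch (my addition, not from the slides), this chi-square test can be run with scipy; the expected counts come from applying the 3:1 ratio to the 920 observed plants:

```python
from scipy.stats import chisquare

observed = [639, 281]                    # Tall, Dwarf counts from the cross
total = sum(observed)                    # 920 plants
expected = [total * 3/4, total * 1/4]    # 3:1 Mendelian ratio -> 690, 230

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)  # statistic ~ 15.08; with 1 df this exceeds 3.84, so p < 0.05
```

Note that both expected counts are well above 5, so the rule above is satisfied.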
7. Choosing a Test
First, check whether there is a hypothesis to test.
If yes, decide which test to use.
If no, then there is no statistical test to run.
What is there?
Parametric tests assume the data come from a standard probability distribution.
Non-parametric tests can be used for both normally and non-normally distributed data.
Question: Then why not always use them?
Parametric tests make more assumptions: if those assumptions hold, the results are more accurate.
10. Example
  Number of sixes (k):   0    1    2    3
  Number of rolls:      48   35   15    3

Binomial model, P(k out of n) = n! / (k!(n-k)!) * p^k * (1-p)^(n-k), here with n = 3 dice and p = 1/6:
p1 = P(roll 0 sixes) = P(X=0) = 0.58
p2 = P(roll 1 six)   = P(X=1) = 0.345
p3 = P(roll 2 sixes) = P(X=2) = 0.07
p4 = P(roll 3 sixes) = P(X=3) = 0.005
http://www.mathsisfun.com/data/binomial-distribution.html
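A short sketch (my addition) reproducing these probabilities and turning them into expected counts for the 101 observed rolls:

```python
from scipy.stats import binom

n, p = 3, 1/6                    # three dice, probability 1/6 of a six
rolls = [48, 35, 15, 3]          # observed counts of 0..3 sixes (101 rolls)

probs = [binom.pmf(k, n, p) for k in range(4)]   # 0.579, 0.347, 0.069, 0.005
expected = [sum(rolls) * q for q in probs]       # ~58.4, 35.1, 7.0, 0.47
print(expected)  # the k = 3 cell is below 5, so it should be pooled before chi-square
```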
11. Two samples - compare the mean value for some variable of interest
Parametric:
• t-test for independent samples
Nonparametric:
• Wald-Wolfowitz runs test
• Mann-Whitney U test
• Kolmogorov-Smirnov two-sample test
12. Compare two variables measured in the same sample
Parametric:
• t-test for dependent samples
Nonparametric:
• Sign test
• Wilcoxon's matched pairs test
If more than two variables are measured in the same sample:
Parametric:
• Repeated measures ANOVA
Nonparametric:
• Friedman's two-way analysis of variance
• Cochran Q
13. Null Hypothesis
Coined by the English statistician and geneticist Ronald Fisher in 1935.
At a given probability level it is judged to be either true or false.
Comparing populations/datasets: Population A and Population B.
Null hypothesis is true -> no significant difference between the populations.
Null hypothesis is false -> significant difference between the populations.
There are formulas to calculate a test statistic for a population comparison, and there are look-up tables with critical values.
If the calculated value exceeds the look-up value, the null hypothesis is rejected (false); otherwise it is retained (true).
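As an illustrative sketch (my addition), the look-up step can be replaced by scipy; the statistic here is the chi-square value from the earlier Mendel example:

```python
from scipy.stats import chi2

calculated = 15.08                 # chi-square statistic from the Mendel example
critical = chi2.ppf(0.95, df=1)    # look-up value at alpha = 0.05, 1 df -> 3.84

# Reject the null hypothesis when the calculated value exceeds the critical value
print("reject H0" if calculated > critical else "retain H0")
```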
14. Null Hypothesis Testing
You suspect that whenever it rains, your experiment fails.
NULL hypothesis is true: no significant association between your experiment failing and rain -> rain has nothing to do with your experiment failing.
NULL hypothesis is false: there is significant damage to your experiments when it rains -> rain ruins your experiment!
Record when your experiment fails and check whether it rained at the time. It may be that the failures happen by chance, or there may indeed be a relationship.
15. Lab study vs statistics research
http://www.youtube.com/watch?feature=player_embedded&v=PbODigCZqL8
16. T-test
The t-statistic was introduced in 1908 by William Sealy Gosset (publishing under the pen name "Student").
Used for normally distributed populations.
http://www.socialresearchmethods.net/kb/stat_t.php
22. ANOVA: F statistics
Analysis of variance
One-way and two-way
F compares two variances:

  s^2 between groups / s^2 within groups

The within-group variation serves as the baseline against which the between-group variation is judged.
23. So how big is F?
Since F is Mean Square Between / Mean Square Within = MSG / MSE,
a large value of F indicates relatively more difference between groups than within groups (evidence against H0).
To get the p-value, we compare to the F(I-1, n-I) distribution:
• I - 1 degrees of freedom in the numerator (number of groups - 1)
• n - I degrees of freedom in the denominator (the rest of the df)
24. Connections between SST, MST, and standard deviation
If we ignore the groups for a moment and just compute the standard deviation of the entire data set, we see

  s^2 = Σ (x_ij - x̄)^2 / (n - 1) = SST / DFT = MST

So SST = (n - 1) s^2, and MST = s^2. That is, SST and MST measure the TOTAL variation in the data set.
SST: total sum of squares
MST: total mean square
DFT: total degrees of freedom
25. Connections between SSE, MSE, and standard deviation
Remember:

  s_i^2 = Σ (x_ij - x̄_i)^2 / (n_i - 1) = SS[Within Group i] / df_i

So SS[Within Group i] = s_i^2 * df_i.
This means that we can compute SSE from the standard deviations and sizes (df) of each group:

  SSE = SS[Within] = Σ_i s_i^2 (n_i - 1) = Σ_i s_i^2 df_i
27. Computing the ANOVA F statistic

  data   group   group    WITHIN difference:        BETWEEN difference:
                 mean     data - group mean         group mean - overall mean
                          plain     squared         plain     squared
  5.3    1       6.00     -0.70     0.490           -0.44     0.194
  6.0    1       6.00      0.00     0.000           -0.44     0.194
  6.7    1       6.00      0.70     0.490           -0.44     0.194
  5.5    2       5.95     -0.45     0.203           -0.49     0.240
  6.2    2       5.95      0.25     0.063           -0.49     0.240
  6.4    2       5.95      0.45     0.203           -0.49     0.240
  5.7    2       5.95     -0.25     0.063           -0.49     0.240
  7.5    3       7.53     -0.03     0.001            1.09     1.188
  7.2    3       7.53     -0.33     0.109            1.09     1.188
  7.9    3       7.53      0.37     0.137            1.09     1.188

  overall mean: 6.44
  WITHIN total:  1.757;  divided by df = 10 - 3 = 7 gives MSE = 0.251
  BETWEEN total: 5.106;  divided by df = 3 - 1 = 2 gives MSG = 2.553

  F = MSG / MSE = 2.553 / 0.251 ≈ 10.2
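A hedged cross-check (my addition): scipy reproduces this F statistic directly from the raw data in the table above:

```python
from scipy.stats import f_oneway

g1 = [5.3, 6.0, 6.7]
g2 = [5.5, 6.2, 6.4, 5.7]
g3 = [7.5, 7.2, 7.9]

F, p = f_oneway(g1, g2, g3)
print(F, p)   # F ~ 10.22, matching the hand computation; p ~ 0.008
```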
30. In Summary

  SST = Σ_obs (x_ij - x̄)^2  = s^2 * DFT
  SSE = Σ_obs (x_ij - x̄_i)^2 = Σ_groups s_i^2 * df_i
  SSG = Σ_obs (x̄_i - x̄)^2  = Σ_groups n_i (x̄_i - x̄)^2
  SSE + SSG = SST
  MS = SS / DF
  F = MSG / MSE
31. R^2 Statistic
R^2 gives the percent of variance due to between-group variation:

  R^2 = SS[Between] / SS[Total] = SSG / SST

We will see R^2 again when we study regression.
32. Where's the Difference?
Once ANOVA indicates that the groups do not all appear to have the same means, what do we do?

  Analysis of Variance for days
  Source      DF      SS      MS      F      P
  treatment    2   34.74   17.37   6.45  0.006
  Error       22   59.26    2.69
  Total       24   94.00

  Level   N    Mean   StDev
  A       8   7.250   1.669
  B       8   8.875   1.458
  P       9  10.111   1.764
  Pooled StDev = 1.641

  Individual 95% CIs for mean, based on pooled StDev:
  A  (-------*-------)
  B        (-------*-------)
  P               (------*-------)
     ----+---------+---------+----
        7.5       9.0      10.5

Clearest difference: P is worse than A (the CIs don't overlap).
33. Multiple Comparisons
Once ANOVA indicates that the groups do not all have the same means, we can compare them two by two using the 2-sample t-test.
• We need to adjust our p-value threshold because we are doing multiple tests with the same data.
• There are several methods for doing this.
• If we really just want to test the difference between one pair of treatments, we should set the study up that way.
34. Tukey's Pairwise Comparisons

  Tukey's pairwise comparisons
  Family error rate = 0.0500
  Individual error rate = 0.0199
  95% confidence overall; use alpha = 0.0199 for each test.
  Critical value = 3.55

  Intervals for (column level mean) - (row level mean):

           A                  B
  B   (-3.685,  0.435)
  P   (-4.863, -0.859)   (-3.238,  0.766)

These give 98.01% CIs for each pairwise difference.
The 98% CI for A - P is (-4.86, -0.86).
Only P vs A is significant (both interval endpoints have the same sign).
35. Tukey's Method in R

  Tukey multiple comparisons of means
  95% family-wise confidence level
         diff       lwr      upr
  B-A  1.6250  -0.43650  3.6865
  P-A  2.8611   0.85769  4.8645
  P-B  1.2361  -0.76731  3.2395
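A comparable computation is available in Python via statsmodels. This is a hedged sketch only: the slide reports just the fitted intervals, so the raw group values below are hypothetical stand-ins for the A/B/P measurements:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical raw values for three treatment groups (illustrative only)
values = np.array([7.1, 7.4, 6.9, 8.8, 9.0, 8.6, 10.0, 10.3, 9.9])
groups = np.array(["A", "A", "A", "B", "B", "B", "P", "P", "P"])

result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result)   # table of pairwise differences with family-wise 95% CIs
```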
36. Independent sample t-test
Number of words recalled:
df = (n1 - 1) + (n2 - 1) = 18

  t = (x̄1 - x̄2) / s_(x̄1 - x̄2) = (19 - 26) / 1 = -7

t(0.05, 18) = 2.101
Since |t| = 7 > t(0.05, 18), reject H0.
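As a hedged check (my addition), scipy reproduces the critical value used here:

```python
from scipy.stats import t as t_dist

t_stat = (19 - 26) / 1                       # difference in means / standard error
critical = t_dist.ppf(1 - 0.05/2, df=18)     # two-tailed critical value = 2.101

print(abs(t_stat) > critical)                # True -> reject H0
```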
37. T test
One-sample t-test
Paired t-test: the same set of subjects measured over a period of time
Unpaired t-test: independent sets of subjects
http://www.youtube.com/watch?v=JlfLnx8sh-o
One-tailed and two-tailed t-tests (see the sketch below):
One-tailed: the average height of class A is greater than that of class B
Two-tailed: the average height of class A is different from that of class B
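A minimal sketch of the one- vs two-tailed distinction (my addition; the heights are simulated, and the `alternative` keyword assumes scipy >= 1.6):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
class_a = rng.normal(172, 6, 30)   # hypothetical heights (cm) for class A
class_b = rng.normal(168, 6, 30)   # hypothetical heights (cm) for class B

# Two-tailed: is A different from B?
print(ttest_ind(class_a, class_b, alternative="two-sided"))
# One-tailed: is A greater than B?
print(ttest_ind(class_a, class_b, alternative="greater"))
```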
38. Z-test statistics
Use when the sample size is large and the population variance is known.
If the sample size is small and the population variance is unknown, go for the t-test.
39. Calculation of z value

  Z = (X̄ - µ) / sqrt(variance / n)

Suppose that in a particular geographic region, the mean and standard deviation of scores on a reading test are 100 points and 12 points, respectively. Our interest is in the scores of 55 students in a particular school who received a mean score of 96. We can ask whether this mean score is significantly lower than the regional mean; that is, are the students in this school comparable to a simple random sample of 55 students from the region as a whole, or are their scores surprisingly low?
We begin by calculating the standard error of the mean: SE = 12 / sqrt(55) ≈ 1.62, so z = (96 - 100) / 1.62 ≈ -2.47.
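The same calculation as a short sketch (my addition):

```python
import math
from scipy.stats import norm

mu, sigma, n, xbar = 100, 12, 55, 96
se = sigma / math.sqrt(n)        # standard error ~ 1.62
z = (xbar - mu) / se             # ~ -2.47

p = norm.cdf(z)                  # one-tailed: P(a mean this low by chance) ~ 0.007
print(z, p)
```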
40. F-tests / Analysis of Variance (ANOVA)

  t = (obtained difference between sample means) / (difference expected by chance (error))

  F = (variance (differences) between sample means) / (variance (differences) expected by chance (error))

The difference between sample means is easy for 2 samples (e.g. X1 = 20, X2 = 30, difference = 10), but if X3 = 35 the concept of differences between sample means gets tricky.
41. F-tests / Analysis of Variance (ANOVA)
Simple ANOVA example: total variability splits into two parts.
Between-treatments variance measures differences due to:
1. Treatment effects
2. Chance
Within-treatments variance measures differences due to:
1. Chance
42. F-tests / Analysis of Variance (ANOVA)

  F = MSbetween / MSwithin

When the treatment has no effect, differences between groups/treatments are entirely due to chance. The numerator and denominator will be similar, so the F-ratio should have a value around 1.00.
When the treatment does have an effect, the between-treatment differences (numerator) should be larger than chance (denominator), so the F-ratio should be noticeably larger than 1.00.
43. F-tests / Analysis of Variance (ANOVA)
Simple independent-samples ANOVA example: F(3, 8) = 9.00, p < 0.05

          Placebo   Drug A   Drug B   Drug C
  Mean      1.0       1.0      4.0      6.0
  SD        1.73      1.0      1.0      1.73
  n         3         3        3        3

There is a difference somewhere; we have to use post-hoc tests (essentially t-tests corrected for multiple comparisons) to examine further.
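As a minimal sketch (my addition), the F statistic can be recovered from these summary statistics alone:

```python
import numpy as np

means = np.array([1.0, 1.0, 4.0, 6.0])   # Placebo, Drug A, Drug B, Drug C
sds   = np.array([1.73, 1.0, 1.0, 1.73])
ns    = np.array([3, 3, 3, 3])

grand = np.average(means, weights=ns)                          # grand mean = 3.0
ms_between = np.sum(ns * (means - grand) ** 2) / (len(means) - 1)
ms_within  = np.sum((ns - 1) * sds ** 2) / np.sum(ns - 1)

print(ms_between / ms_within)   # ~ 9.0, matching F(3, 8) = 9.00
```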
44. F-test ANOVA
http://www.youtube.com/watch?v=-yQb_ZJnFXw
45. Non-parametric tests
Non-parametric tests are used to overcome the underlying assumption of normality in parametric tests; only quite general assumptions regarding the population are made.
Read more: Mann-Whitney U-test / Mann-Whitney-Wilcoxon
It does not assume the variances to be equal!
46. Mann-Whitney-Wilcoxon (MWW)
(or Wilcoxon Rank-Sum Test)
Proposed by the German Gustav Deuchler in 1914 (with a missing term in the variance) and later independently by Frank Wilcoxon in 1945.
This test is based on the idea that the particular pattern exhibited when m X random variables and n Y random variables are arranged together in increasing order of magnitude provides information about the relationship between their parent populations.
Assumptions:
• The two samples are random and independent of each other
• Observations are numeric and ordinal (can be arranged in ranks)
It is a test that compares medians.
48. When to use this?
Test of normality:
Simple histogram method
Normal probability plot
49. How to construct a normal probability plot

  Data   rank i   (i - 0.5)/N (X)   Z theoretical value (Y)   observed value
  20     1
  15     2
  26     3
  32     4
  18     5
  28     6
  35     7
  14     8
  26     9
  22     10
  17     11

  Mean = 23.0, SD ≈ 6.96 (computed from the data above)
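A minimal sketch (my addition) that builds this plot from the table's data; scipy's probplot sorts the values, computes the theoretical normal quantile for each rank, and plots observed against theoretical values:

```python
from scipy import stats
import matplotlib.pyplot as plt

data = [20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17]

# Observed values vs theoretical normal quantiles; points near a straight
# line indicate approximate normality
stats.probplot(data, dist="norm", plot=plt)
plt.show()
```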
52. Step by step
Rank the values (both groups pooled together)
Add the ranks for each group
Select the larger of the two rank totals
Calculate N1, N2, Nx and Tx (Nx = number of subjects in the group with the larger rank total, Tx = the larger rank total)
Calculate U:

  U = N1 * N2 + Nx * (Nx + 1)/2 - Tx
55. Calculating the U value
From the rank totals R1 and R2 of the two datasets:
U1 = R1 - n1(n1 + 1)/2
U2 = R2 - n2(n2 + 1)/2
The test statistic U is the smaller of U1 and U2.
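A short sketch (my addition; the scores are hypothetical) running the whole Mann-Whitney procedure with scipy:

```python
from scipy.stats import mannwhitneyu

# Hypothetical scores for two independent groups (illustrative values only)
group1 = [12, 15, 9, 20, 17, 11]
group2 = [14, 22, 25, 16, 21, 19]

U, p = mannwhitneyu(group1, group2, alternative="two-sided")
print(U, p)   # U statistic and two-tailed p-value
```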
56. Kruskal-Wallis test (H test)
Non-parametric test, equivalent to ANOVA (the F test) in parametric statistics.
Does not require the distributions to be normal, but the samples must be independent.
Used more often when variances are unequal; the data are ordinal.
Assumes the distributions have the same shape:
1. If one distribution is skewed to the left and the other to the right (unequal variance), this test will give inaccurate results.
58. Kruskal-Wallis Test
Define the null and alternative hypotheses
State the probability (significance level)
Calculate the degrees of freedom
Find the critical value
Calculate the test statistic
State the result:
H0 - accept the NULL hypothesis: there is no difference between the samples
H1 - reject the NULL hypothesis: there is a difference between the samples
If the statistic > critical value, reject the null hypothesis
59. Kruskal-Wallis: worked example (three groups, n = 6 each, N = 18)

Pooled values with their ranks (1 = smallest of all 18 values):

  Group A: 27 (14),  2 (1),   4 (3),  18 (9),   7 (5),   9 (7)   -> R_A = 39
  Group B: 20 (10),  8 (6),  14 (8),  36 (18), 21 (11), 22 (12)  -> R_B = 65
  Group C: 34 (17), 31 (16),  3 (2),  23 (13), 30 (15),  6 (4)   -> R_C = 67

  H = 12 / (N(N+1)) * Σ (R_i^2 / n_i) - 3(N+1)
    = 12/(18*19) * (39^2/6 + 65^2/6 + 67^2/6) - 3(18+1)
    = 2.854

Critical value (chi-square, df = 2, alpha = 0.05) = 5.99.
Since H = 2.854 < 5.99, we fail to reject the NULL hypothesis.
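A hedged cross-check (my addition) with scipy, using the group data reconstructed above:

```python
from scipy.stats import kruskal

group_a = [27, 2, 4, 18, 7, 9]
group_b = [20, 8, 14, 36, 21, 22]
group_c = [34, 31, 3, 23, 30, 6]

H, p = kruskal(group_a, group_b, group_c)
print(H, p)   # H ~ 2.854; p ~ 0.24 > 0.05, so fail to reject H0
```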
60. Kolmogorov-Smirnov test (KS)
Non-parametric; the distribution may be unknown.
One-sample and two-sample versions:
One-sample - checks goodness of fit
Two-sample - compares two distributions
Goodness of fit for a hypothesis (e.g., Mendel's law of dominance):
H0: F(x) = F*(x) for all x
H1: F(x) ≠ F*(x) for at least one value of x
61. K-S test
The K-S statistic Dn is defined as:

  Dn = max | Fn(x) - F(x) |

where
Dn is known as the K-S distance
n = total number of data points
F(x) = distribution function of the fitted distribution
Fn(x) = i/n, where i = the cumulative rank of the data point
62. Kolmogorov-Smirnov test (KS)

                        Group 1   Group 2
  Not confident            20         4
  Slightly confident       30        27
  Somewhat confident       13        28
  Confident                20        18
  Very confident           41        47

1. Take the total
2. Find the frequencies (proportions)
3. Calculate the cumulative frequencies
4. Find the differences
5. Get the largest difference, D
6. Find the critical value: 1.36 * sqrt((n1 + n2) / (n1 * n2))
7. Test goodness of fit
e.g. if our D > critical D (distributions are unequal) -> reject the NULL hypothesis
63. Kolmogorov-Smirnov calculation

                        Group 1                   Group 2
                  Count  Freq   Cum.        Count  Freq   Cum.       D
  Not confident     20  0.161  0.161          4   0.032  0.032    0.129
  Slightly conf.    30  0.242  0.403         27   0.218  0.250    0.153
  Somewhat conf.    13  0.105  0.508         28   0.226  0.476    0.032
  Confident         20  0.161  0.669         18   0.145  0.621    0.048
  Very confident    41  0.331  1.000         47   0.379  1.000    0.000

  Largest difference: D = 0.153
  Critical D = 1.36 * sqrt((n1 + n2) / (n1 * n2)) = 1.36 * sqrt(248 / (124 * 124)) ≈ 0.173
  Test the NULL hypothesis: since D = 0.153 < 0.173, we fail to reject it.
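A short sketch (my addition) reproducing this manual D computation with numpy:

```python
import numpy as np

g1 = np.array([20, 30, 13, 20, 41])   # Group 1 counts per category
g2 = np.array([4, 27, 28, 18, 47])    # Group 2 counts per category

cdf1 = np.cumsum(g1) / g1.sum()       # cumulative frequencies
cdf2 = np.cumsum(g2) / g2.sum()
D = np.max(np.abs(cdf1 - cdf2))       # ~ 0.153

crit = 1.36 * np.sqrt((g1.sum() + g2.sum()) / (g1.sum() * g2.sum()))
print(D, crit, D > crit)              # 0.153, ~0.173, False -> fail to reject H0
```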
64. Kolmogorov-Smirnov test (KS)
Same data, now tested for goodness of fit against a hypothesized 1:2:3:2:1 distribution across the five categories:

                        Group 1   Group 2
  Not confident            20         4
  Slightly confident       30        27
  Somewhat confident       13        28
  Confident                20        18
  Very confident           41        47

1. Take the total
2. Find the frequencies
3. Calculate the cumulative frequencies
4. Find the differences against the hypothesized cumulative distribution
5. Get the largest difference, D
6. Find the critical value (1.36 / sqrt(sample size))
7. Test goodness of fit
e.g. if our D > critical D (distributions are unequal) -> reject the NULL hypothesis
65. Methods of Estimation
Method of moments
Maximum likelihood
Bayesian estimators
Markov chain Monte Carlo ...
Why?
The population is too large to measure directly, so we test hypotheses on a set of samples.
66. Probability density function (pdf) -> for continuous variables
Probability mass function (pmf) -> for discrete variables
Parameter space: the set of all possible parameter values
Family of pdfs/pmfs: the distributions indexed by those parameters
An estimator T is unbiased if its expected value equals the population parameter.
69. Method of maximum likelihood
The maximum likelihood estimates of a distribution type are the values of its parameters that produce the maximum joint probability density or mass for the observed data X, given the chosen probability model.
Maximum likelihood is more general: it can be applied to any probability distribution.
70. The MLE
The best parameters are obtained by maximizing the probability of the observed samples.
Has good convergence properties as sample sizes increase: the estimate approaches the true value for large N.
Applications are many, from speech recognition to natural language processing to computational biology.
71. Simple MLE: coin tossing
Toss a coin: Head or Tail.
Flip the coin n = 10 times: H, H, H, T, H, T, T, T, H, T => 1, 1, 1, 0, 1, 0, 0, 0, 1, 0
An appropriate model for getting a head in a single flip is the Bernoulli pmf:

  f(x_i; P) = P^(x_i) * (1 - P)^(1 - x_i),  with x_i = 0 or x_i = 1

e.g. if P = 0.6, then f(1) = 0.6 and f(0) = 0.4.
72. The maximum likelihood
Example: we want to estimate the probability, p, that individuals are infected with a certain kind of parasite.

  Ind.   Infected   Probability of observation
  1         1          p
  2         0          1 - p
  3         1          p
  4         1          p
  5         0          1 - p
  6         1          p
  7         1          p
  8         0          1 - p
  9         0          1 - p
  10        1          p

The maximum likelihood method (discrete distribution):
1. Write down the probability of each observation by using the model parameters
2. Write down the probability of all the data:

  Pr(Data | p) = p^6 (1 - p)^4

3. Find the parameter value(s) that maximize this probability
73. The maximum likelihood
Example (continued): we want to estimate the probability, p, that individuals are infected with a certain kind of parasite, using the same data as the previous slide.

Likelihood function: Pr(Data | p) = p^6 (1 - p)^4

Find the parameter value(s) that maximize this probability.

[Plot: the likelihood L(p) against p from 0.0 to 1.0; the curve rises to a maximum of about 0.0012 at p = 0.6 and falls back toward 0 at both ends.]
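A minimal sketch (my addition) reproducing the plotted maximization numerically:

```python
import numpy as np

p = np.linspace(0, 1, 1001)
L = p**6 * (1 - p)**4          # likelihood of 6 infected out of 10

best = p[np.argmax(L)]
print(best, L.max())           # 0.6 and ~0.00119, the peak of the plotted curve
```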
75. Likelihood Function

  L(P | X1, ..., Xn) = Π_{i=1..n} f(Xi | P)
                     = P^(x1) (1-P)^(1-x1) * P^(x2) (1-P)^(1-x2) * ... * P^(xn) (1-P)^(1-xn)
                     = P^(x1) P^(x2) ... P^(xn) * (1-P)^(1-x1) (1-P)^(1-x2) ... (1-P)^(1-xn)
                     = P^(x1+x2+...+xn) * (1-P)^(n - (x1+x2+...+xn))
                     = P^(Σ Xi) * (1-P)^(n - Σ Xi)
76. Analytically, the maximum likelihood can also be found...
...by taking the derivative of the (log-)likelihood with respect to P and finding where the slope is 0.
http://www.ics.uci.edu/~smyth/courses/cs274/papers/MLtutorial.pdf
77. Recap…
Get the population (distribution) type and set up the equation.
Write the log-likelihood function.
Differentiate.
Set the derivative to 0.
Solve the equation to estimate the parameter.
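These steps can be carried out symbolically; a short sketch (my addition) for the parasite example's likelihood p^6 (1-p)^4:

```python
from sympy import symbols, log, diff, solve

p = symbols("p", positive=True)
logL = 6*log(p) + 4*log(1 - p)      # log-likelihood of p^6 (1-p)^4

estimate = solve(diff(logL, p), p)  # set the derivative to 0 and solve
print(estimate)                     # [3/5] -> p-hat = 0.6
```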
78. Method of moments
The oldest method.
Distribution dependent: geometric, Poisson, Bernoulli, ...
Depends upon the pdf.
79. Method of Moments
Population moments can be estimated by sample moments, and this can be robust:
the sample mean estimates the population mean, and the sample variance estimates the population variance (see the sketch below).
Does not work well when the distribution is exponential.
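A minimal sketch (my addition; the gamma distribution and its parameters are illustrative assumptions, not from the slides) of matching sample moments to parameters:

```python
import numpy as np

# Hypothetical sample drawn from a gamma distribution (illustrative only)
rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=3.0, size=10_000)

# Method of moments for gamma(shape k, scale theta):
# mean = k * theta, variance = k * theta^2 -> solve for k and theta
mean, var = sample.mean(), sample.var()
theta_hat = var / mean            # ~ 3.0
k_hat = mean / theta_hat          # ~ 2.0
print(k_hat, theta_hat)
```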