Sucheta Tripathy, IICB, November – December 2013
Chi square test
Sucheta Tripathy, Biostatistics course work, IICB, Nov 2013
Definitions
•Model or Hypothesis
•Null Hypothesis
•There is no significant difference between the 2. It can be TRUE or FALSE.
•Goodness of fit
What you need
• A probability value
• Degrees of freedom
• A contingency table
Determine if the deviation is due to chance: if the deviation falls inside the acceptance region (e.g. at the 10% level) we accept the null hypothesis, otherwise we reject it.
Chi Square Test
Example 1
• Mendelian law of dominance
• Cross: Aa X Aa, gametes A and a
• A -> Tall (dominant); a -> dwarf (recessive); Aa is …….
• Offspring genotypes: AA, Aa, Aa, aa
• Observed: 639 Tall and 281 dwarf

Chi square requires that you have numeric values.
Chi square should not be calculated if the expected value is less than 5.
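The Example 1 computation can be sketched as follows, assuming SciPy is available. Under the 3:1 Mendelian expectation, the 920 observed plants split into expected counts of 690 tall and 230 dwarf:

```python
# Chi-square goodness of fit for the 639 tall : 281 dwarf cross (sketch).
from scipy.stats import chisquare

observed = [639, 281]
total = sum(observed)                       # 920 plants
expected = [total * 3 / 4, total * 1 / 4]   # 690 tall, 230 dwarf under 3:1

chi2, p = chisquare(observed, f_exp=expected)
print(chi2, p)  # chi2 ≈ 15.08, p < 0.05 -> deviation is not due to chance
```

Note both expected counts (690 and 230) are well above 5, so the test is applicable here.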
Choosing a Test
• First check if there is a hypothesis to test
• If yes, then decide which one
• If no, then there is NO statistical test for that

What is there:
Parametric tests have data that comes in a standard probability distribution.
Non-parametric tests can be used for both normally and non-normally distributed data.
Question: then why not use them always?
Parametric tests make a lot of assumptions: if the assumptions are correct, the results are more accurate.
Example
9:3:3:1
Example
Number of sixes (per set of rolls): 0, 1, 2, 3
Number of rolls observed: 48, 35, 15, 03

p1 = P(roll 0 sixes) = P(X=0) = 0.58
p2 = P(roll 1 six)   = P(X=1) = 0.345
p3 = P(roll 2 sixes) = P(X=2) = 0.07
p4 = P(roll 3 sixes) = P(X=3) = 0.005

Binomial: P(k out of n) = [n! / (k!(n-k)!)] p^k (1-p)^(n-k)

http://www.mathsisfun.com/data/binomial-distribution.html
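The binomial probabilities p1..p4 above can be reproduced with the formula for n = 3 rolls and p = 1/6 per roll (a sketch, assuming SciPy is available):

```python
# Binomial probabilities for the number of sixes in n = 3 die rolls.
from scipy.stats import binom

n, p = 3, 1 / 6
probs = [binom.pmf(k, n, p) for k in range(4)]  # P(X = 0..3)
print([round(q, 3) for q in probs])  # [0.579, 0.347, 0.069, 0.005]
```

These match the slide's values up to rounding, and can serve as expected proportions in a chi-square goodness-of-fit test against the observed roll counts.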
Parametric vs Nonparametric

Two samples – compare mean value for some variable of interest:
  Parametric:    t-test for independent samples
  Nonparametric: Wald-Wolfowitz runs test; Mann-Whitney U test; Kolmogorov-Smirnov two-sample test

Compare two variables measured in the same sample:
  Parametric:    t-test for dependent samples
  Nonparametric: Sign test; Wilcoxon's matched pairs test

If more than two variables are measured in the same sample:
  Parametric:    Repeated measures ANOVA
  Nonparametric: Friedman's two-way analysis of variance; Cochran Q
Null Hypothesis
• Term coined by the English geneticist Ronald Fisher in 1935.
• At a given probability it can either be true or false.

Comparing populations/datasets, Population A and Population B:
Null hypothesis is true -> no significant difference between the populations.
Null hypothesis is false -> significant difference between the populations.

There are formulas to calculate the test statistic for a population comparison, and there are look-up tables with critical values. If the calculated value exceeds the look-up value, the null hypothesis is rejected; otherwise it is retained.
Null Hypothesis Testing
You suspect that whenever it rains your experiment fails?!!!

NULL hypothesis is true: no significant difference (between your experiment failing and raining) -> rain has nothing to do with your experiment failing.

NULL hypothesis is false: there is significant damage to your experiments when it rains -> rain ruins your experiment!!

Record when your experiment fails and check whether it rains during that time. It may be that this happens by chance, or it may be that there indeed is a relationship.
Lab study vs statistics research
• http://www.youtube.com/watch?feature=player_embedded&v=PbODigCZqL8
T-test
• The t-statistic was introduced in 1908 by William Sealy Gosset.
• Used in a normally distributed population.

http://www.socialresearchmethods.net/kb/stat_t.php
T-test (paired)
The standard deviation of the difference scores D:

s_D = sqrt( (ΣD² − (ΣD)²/n) / (n − 1) )

and the paired t statistic is t = D̄ / (s_D / √n).
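The computational formula above equals the ordinary sample standard deviation of the differences. A minimal sketch with hypothetical difference scores, assuming NumPy is available:

```python
# Paired t statistic from difference scores (hypothetical data).
import numpy as np

D = np.array([2.0, -1.0, 3.0, 0.0, 1.0])   # hypothetical paired differences
n = len(D)

# Computational formula from the slide ...
s_D = np.sqrt((np.sum(D**2) - np.sum(D)**2 / n) / (n - 1))
# ... is the same quantity as the sample standard deviation of D:
assert np.isclose(s_D, np.std(D, ddof=1))

t = D.mean() / (s_D / np.sqrt(n))           # paired t statistic
print(round(s_D, 4), round(t, 4))
```

For these numbers s_D ≈ 1.5811 and t ≈ 1.4142; the data are illustrative only.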
Why Standardize??
ANOVA: F statistics
• Analysis of variance: one-way and two-way
• Compares between-groups variation with within-groups variation

So how big is F?
Since F is Mean Square Between / Mean Square Within = MSG / MSE,
a large value of F indicates relatively more difference between groups than within groups (evidence against H0).

To get the P-value, we compare to the F(I-1, n-I) distribution:
• I - 1 degrees of freedom in the numerator (# groups - 1)
• n - I degrees of freedom in the denominator (rest of the df)
Connections between SST, MST, and standard deviation
If we ignore the groups for a moment and just compute the standard deviation of the entire data set, we see

s² = Σ (x_ij − x̄)² / (n − 1) = SST / DFT = MST

So SST = (n − 1) s², and MST = s². That is, SST and MST measure the TOTAL variation in the data set.

SST: total sum of squares
MST: total mean square
DFT: total degrees of freedom (n − 1)
Connections between SSE, MSE, and standard deviation
Remember:

s_i² = Σ (x_ij − x̄_i)² / (n_i − 1) = SS[Within Group i] / df_i

So SS[Within Group i] = (s_i²)(df_i).

This means that we can compute SSE from the standard deviations and sizes (df) of each group:

SSE = Σ SS[Within Group i] = Σ s_i² (n_i − 1) = Σ s_i² (df_i)
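The identity SSE = Σ s_i²(n_i − 1) can be checked directly on the three groups used in the worked example that follows (a sketch, assuming NumPy is available):

```python
# SSE from per-group sample variances: SSE = sum of s_i^2 * (n_i - 1).
import numpy as np

groups = [
    np.array([5.3, 6.0, 6.7]),
    np.array([5.5, 6.2, 6.4, 5.7]),
    np.array([7.5, 7.2, 7.9]),
]
# np.var with ddof=1 is the sample variance s_i^2
sse = sum(np.var(g, ddof=1) * (len(g) - 1) for g in groups)
print(round(sse, 3))  # 1.757
```

This reproduces the WITHIN sum of squares of the worked example without tabulating individual deviations.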
Computing the ANOVA F statistic

                          WITHIN difference:      BETWEEN difference:
                          data − group mean       group mean − overall mean
data   group  group mean  plain    squared        plain    squared
5.3    1      6.00        -0.70    0.490          -0.44    0.194
6.0    1      6.00         0.00    0.000          -0.44    0.194
6.7    1      6.00         0.70    0.490          -0.44    0.194
5.5    2      5.95        -0.45    0.203          -0.49    0.240
6.2    2      5.95         0.25    0.063          -0.49    0.240
6.4    2      5.95         0.45    0.203          -0.49    0.240
5.7    2      5.95        -0.25    0.063          -0.49    0.240
7.5    3      7.53        -0.03    0.001           1.09    1.188
7.2    3      7.53        -0.33    0.109           1.09    1.188
7.9    3      7.53         0.37    0.137           1.09    1.188

TOTAL                              1.757                    5.106
TOTAL/df                           0.2510 (df = 7)          2.5528 (df = 2)

overall mean: 6.44

F = MSG / MSE = 2.5528 / 0.2510 ≈ 10.2 (carrying full precision in the group means, F = 10.216)
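The hand computation above can be cross-checked with SciPy's one-way ANOVA (a sketch, assuming SciPy is available):

```python
# One-way ANOVA on the three groups from the worked example.
from scipy.stats import f_oneway

g1 = [5.3, 6.0, 6.7]
g2 = [5.5, 6.2, 6.4, 5.7]
g3 = [7.5, 7.2, 7.9]

F, p = f_oneway(g1, g2, g3)
print(round(F, 3))  # 10.216
```

The p-value comes from the F(2, 7) distribution (3 groups − 1 numerator df; 10 observations − 3 groups denominator df).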
Validation
• The larger the F value -> more between-group variation -> reject the null hypothesis.

Exercise: for the three groups below, fill in the deviation (X − x̄) and squared-deviation columns for each group, then compute MST (between) and MSE (within) with df1 and df2.

        A     B     C
1      62    72    42
2      81    49    52
3      75    63    31
4      58    68    80
5      67    39    22
6      48    79    71
7      26    40    68
8      36    76
9      45
Mean
In Summary

SST = Σ_obs (x_ij − x̄)² = s² (DFT)

SSE = Σ_obs (x_ij − x̄_i)² = Σ_groups s_i² (df_i)

SSG = Σ_obs (x̄_i − x̄)² = Σ_groups n_i (x̄_i − x̄)²

SSE + SSG = SST;   MS = SS / DF;   F = MSG / MSE
R² Statistic

R² gives the percent of variance due to between-group variation:

R² = SS[Between] / SS[Total] = SSG / SST

We will see R² again when we study regression.
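R² = SSG/SST can be computed for the worked ANOVA example from earlier (a sketch, assuming NumPy is available):

```python
# Share of total variation explained by group membership: R^2 = SSG / SST.
import numpy as np

groups = [np.array([5.3, 6.0, 6.7]),
          np.array([5.5, 6.2, 6.4, 5.7]),
          np.array([7.5, 7.2, 7.9])]

allx = np.concatenate(groups)
sst = np.sum((allx - allx.mean()) ** 2)                          # total SS
ssg = sum(len(g) * (g.mean() - allx.mean()) ** 2 for g in groups)  # between SS

r2 = ssg / sst
print(round(r2, 3))  # 0.745 -> about 74.5% of the variance is between groups
```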
Where's the Difference?
Once ANOVA indicates that the groups do not all appear to have the same means, what do we do?

Analysis of Variance for days
Source     DF     SS      MS      F      P
treatment   2    34.74   17.37   6.45   0.006
Error      22    59.26    2.69
Total      24    94.00

Level   N    Mean     StDev
A       8    7.250    1.669
B       8    8.875    1.458
P       9   10.111    1.764

Pooled StDev = 1.641

[Plot: individual 95% CIs for each mean, based on the pooled StDev, on a scale from about 7.5 to 10.5; the intervals for A, B, and P are staggered upward.]

Clearest difference: P is worse than A (CIs don't overlap)
Multiple Comparisons
Once ANOVA indicates that the groups do not all have the same means, we can compare them two by two using the 2-sample t test.
• We need to adjust our p-value threshold because we are doing multiple tests with the same data.
• There are several methods for doing this.
• If we really just want to test the difference between one pair of treatments, we should set the study up that way.
Tukey's Pairwise Comparisons

Tukey's pairwise comparisons

Family error rate     = 0.0500
Individual error rate = 0.0199

Critical value = 3.55

Intervals for (column level mean) - (row level mean):

          A          B
B     -3.685
       0.435

P     -4.863     -3.238
      -0.859      0.766

These give 98.01% CIs for each pairwise difference (use alpha = 0.0199 for each test).
The 98% CI for A − P is (−4.86, −0.86).
Only P vs A is significant (both interval endpoints have the same sign).
Tukey's Method in R

Tukey multiple comparisons of means
95% family-wise confidence level

       diff       lwr      upr
B-A  1.6250  -0.43650  3.6865
P-A  2.8611   0.85769  4.8645
P-B  1.2361  -0.76731  3.2395
Independent sample t-test
Number of words recalled; df = (n1 − 1) + (n2 − 1) = 18

t = (x̄1 − x̄2) / s_{x̄1−x̄2} = (19 − 26) / 1 = −7

|t| = 7 > t(0.05, 18) = 2.101 -> Reject H0
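The decision step above amounts to looking up the two-tailed critical value t(0.05, 18) and comparing it with |t| = 7 (a sketch, assuming SciPy is available):

```python
# Critical-value decision for the word-recall t-test.
from scipy.stats import t as t_dist

t_stat = (19 - 26) / 1              # (x̄1 − x̄2) / s_{x̄1−x̄2} = −7
crit = t_dist.ppf(0.975, df=18)     # two-tailed 5% critical value
reject_h0 = abs(t_stat) > crit

print(round(crit, 3), reject_h0)    # 2.101 True
```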
T test
• One sample t test
• Unpaired and paired t test
  – Paired: same set of subjects over a period of time
  – Unpaired: independent sets of subjects

http://www.youtube.com/watch?v=JlfLnx8sh-o

One-tailed and two-tailed t-test:
One-tailed: average height of class A is greater than that of class B.
Two-tailed: average height of class A is different from that of class B.
Z-test statistics
• Use when the sample size is large and the population variance is known.
• If the sample size is small and the population variance is unknown, go for the t-test.

Calculation of the z value:
Z = (X̄ − µ) / sqrt(variance / n)

Suppose that in a particular geographic region, the mean and standard deviation of scores on a reading test are 100 points and 12 points, respectively. Our interest is in the scores of 55 students in a particular school who received a mean score of 96. We can ask whether this mean score is significantly lower than the regional mean — that is, are the students in this school comparable to a simple random sample of 55 students from the region as a whole, or are their scores surprisingly low?

We begin by calculating the standard error of the mean: SE = 12/√55 ≈ 1.62, so z = (96 − 100)/1.62 ≈ −2.47.
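The reading-test example works out as follows (a sketch, assuming SciPy is available):

```python
# z-test for the school mean of 96 against the regional mean of 100.
import math
from scipy.stats import norm

mu, sigma, n, xbar = 100, 12, 55, 96

se = sigma / math.sqrt(n)      # standard error of the mean, ≈ 1.62
z = (xbar - mu) / se           # ≈ −2.47
p_one_tailed = norm.cdf(z)     # probability of a mean this low by chance

print(round(z, 2), round(p_one_tailed, 4))
```

The one-tailed p-value is well below 0.05, so the school's mean is significantly lower than the regional mean.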
F-tests / Analysis of Variance (ANOVA)

t = (obtained difference between sample means) / (difference expected by chance (error))

F = (variance (differences) between sample means) / (variance (differences) expected by chance (error))

Difference between sample means is easy for 2 samples (e.g. X1 = 20, X2 = 30, difference = 10), but if X3 = 35 the concept of differences between sample means gets tricky.
F-tests / Analysis of Variance (ANOVA)
Simple ANOVA example

Total variability splits into:

Between-treatments variance, which measures differences due to:
1. Treatment effects
2. Chance

Within-treatments variance, which measures differences due to:
1. Chance
F-tests / Analysis of Variance (ANOVA)

F = MSbetween / MSwithin

When the treatment has no effect, differences between groups/treatments are entirely due to chance. The numerator and denominator will be similar, so the F-ratio should have a value around 1.00.

When the treatment does have an effect, the between-treatment differences (numerator) should be larger than chance (denominator), so the F-ratio should be noticeably larger than 1.00.
F-tests / Analysis of Variance (ANOVA)
Simple independent samples ANOVA example

F(3, 8) = 9.00, p < 0.05

        Placebo  Drug A  Drug B  Drug C
Mean      1.0     1.0     4.0     6.0
SD        1.73    1.0     1.0     1.73
n         3       3       3       3

There is a difference somewhere - we have to use post-hoc tests (essentially t-tests corrected for multiple comparisons) to examine further.
F Test Anova
 http://www.youtube.com/watch?v=-yQb_ZJnFXw
Non Parametric tests
Non-parametric tests are used to overcome the underlying assumption of normality in parametric tests. Only quite general assumptions about the population are made in these tests.
Read more: Mann-Whitney U-test / Mann-Whitney-Wilcoxon

IT DOES NOT ASSUME THE VARIANCES TO BE EQUAL!!
Mann-Whitney-Wilcoxon (MWW) or Wilcoxon Rank-Sum Test
Proposed by the German Gustav Deuchler in 1914 (with a missing term in the variance) and later independently by Frank Wilcoxon in 1945.

This test is based on the idea that the particular pattern exhibited when 'm' X random variables and 'n' Y random variables are arranged together in increasing order of magnitude provides information about the relationship between their parent populations.

Assumptions:
•The two samples are random and are independent of each other
•Observations are numeric and ordinal (arranged in ranks)

It is a test of comparison of medians.
When to use this?
Test of Normality:
• Simple histogram method
• Normal probability plot

How to construct a normal probability plot

Data   rank i
20     1
15     2
26     3
32     4
18     5
28     6
35     7
14     8
26     9
22     10
17     11

For each point compute (i − 0.5)/N, look up the corresponding theoretical Z value (X), and plot it against the observed value (Y).

Mean: 38.8   Sd = 11.4
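The plot coordinates can be generated directly (a sketch, assuming SciPy is available); a roughly straight-line pattern suggests normality:

```python
# Normal probability plot coordinates for the 11 data points in the table.
from scipy import stats

data = [20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17]

# probplot returns (theoretical quantiles, ordered data) and a fitted line
(theoretical, ordered), (slope, intercept, r) = stats.probplot(data)
print(round(r, 3))  # correlation of the plot; close to 1 -> roughly normal
```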
Ranking the values

Sample A: 5, 6, 7, 8, 9 (N1 = 5)
Sample B: 2, 1, 5, 7, 3, 4 (N2 = 6)

Total number of comparisons = 5 × 6 = 30

How to score each comparison against the other sample:
Less = 0
Tie = 0.5
More = 1

Scoring each A value against sample B: 4.5, 5, 5.5, 6, 6 -> total 27
Scoring each B value against sample A: 0, 0, 0.5, 2.5, 0, 0 -> total 3
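The pairwise counting above is exactly what the Mann-Whitney U statistic measures; SciPy reports U for the first sample passed in (a sketch, assuming SciPy is available):

```python
# Mann-Whitney U for the two small samples; U_A = 27 and U_B = 3.
from scipy.stats import mannwhitneyu

A = [5, 6, 7, 8, 9]
B = [2, 1, 5, 7, 3, 4]

res = mannwhitneyu(A, B, alternative="two-sided")
print(res.statistic)  # 27.0, the count for sample A (ties counted as 0.5)
```

Note that 27 + 3 = 30, the total number of comparisons.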
Step by step
• Rank the values
• Add the ranks for each sample
• Select the larger of the two rank totals
• Calculate N1, N2, and Nx and Tx (Nx = number of values in the sample with the larger rank total; Tx = that larger rank total)
• Calculate U:

U = N1 · N2 + Nx(Nx + 1)/2 − Tx

If the calculated U is less than the critical value -> reject the null hypothesis.
Calculating the U value
From the rank sums R1 and R2 of the two datasets:

U1 = R1 – n1(n1+1)/2
U2 = R2 – n2(n2+1)/2

The test statistic is the smaller of U1 and U2 (and U1 + U2 = n1 · n2).
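The rank-sum formulas can be verified on the earlier example using tie-aware ranks (a sketch, assuming SciPy and NumPy are available):

```python
# U from rank sums: U_i = R_i − n_i(n_i + 1)/2, with average ranks for ties.
import numpy as np
from scipy.stats import rankdata

A = np.array([5, 6, 7, 8, 9])
B = np.array([2, 1, 5, 7, 3, 4])

ranks = rankdata(np.concatenate([A, B]))      # pooled ranks, ties averaged
R1, R2 = ranks[:len(A)].sum(), ranks[len(A):].sum()

U1 = R1 - len(A) * (len(A) + 1) / 2           # 42 − 15 = 27
U2 = R2 - len(B) * (len(B) + 1) / 2           # 24 − 21 = 3
assert U1 + U2 == len(A) * len(B)             # all 30 comparisons accounted for
print(U1, U2)
```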
Kruskal-Wallis test (H Test)
• Non-parametric test
• Equivalent to ANOVA (F test) among parametric tests
• Does not require the distributions to be normal
• The samples need to be independent
• Used more often when group sizes are unequal
• Data is ordinal

Assumes the distributions have the same shape:
1. If one distribution is skewed to the left and the other to the right (unequal variance), this test will give inaccurate results.
Kruskal Wallis Test

Group A   Group B   Group C
27        20        34
2         8         31
4         14        3
18        36        23
7         21        30
9         22        6
Kruskal Wallis Test
• Define the null and alternative hypotheses
• State the probability (significance level)
• Calculate the degrees of freedom
• Find the critical value
• Calculate the test statistic
• State the result

H0 (accept the NULL hypothesis): there is no difference between the samples
H1 (reject the NULL hypothesis): there is a difference between the samples
If the test statistic > critical value -> reject the null hypothesis
Value-to-rank table (all N = 18 values pooled):

Value:  2  3  4  6  7  8  9 14 18 20 21 22 23 27 30 31 34 36
Rank:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18

Ranks within each group:

Group A (value, rank)   Group B (value, rank)   Group C (value, rank)
27, 14                  20, 10                  34, 17
 2,  1                   8,  6                  31, 16
 4,  3                  14,  8                   3,  2
18,  9                  36, 18                  23, 13
 7,  5                  21, 11                  30, 15
 9,  7                  22, 12                   6,  4

Total R:   39              65                      67
n:          6               6                       6

H = [12 / (N(N+1))] Σ (R_i² / n_i) − 3(N+1)
  = [12 / (18 · 19)] × (39²/6 + 65²/6 + 67²/6) − 3(18+1) = 2.854

Critical value (χ², df = 2, α = 0.05) = 5.99. Since H = 2.854 < 5.99, we fail to reject the NULL hypothesis.
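The H computation above can be cross-checked with SciPy (a sketch, assuming SciPy is available):

```python
# Kruskal-Wallis H test on the three groups from the worked example.
from scipy.stats import kruskal

A = [27, 2, 4, 18, 7, 9]
B = [20, 8, 14, 36, 21, 22]
C = [34, 31, 3, 23, 30, 6]

H, p = kruskal(A, B, C)
print(round(H, 3))  # 2.854; below the chi-square critical value 5.99
```

Since p > 0.05, the null hypothesis of no difference between the samples is retained.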
Kolmogorov Smirnov test (KS)
• Non-parametric; the distribution is unknown
• One-sample and two-sample versions
• One-sample: checks the goodness of fit
• Two-sample: compares two distributions

Goodness of fit: a hypothesis (e.g. Mendel's law of dominance)
NULL hypothesis:
H0: F(x) = F*(x) for all x
H1: F(x) ≠ F*(x) for at least one value of x

K-S test
The K-S statistic Dn is defined as:

Dn = max [ | Fn(x) − F(x) | ]

where
Dn is known as the K-S distance
n = total number of data points
F(x) = distribution function of the fitted distribution
Fn(x) = i/n, where i = the cumulative rank of the data point
Kolmogorov Smirnov test (KS)

                     Group 1   Group 2
Not confident           20         4
Slightly confident      30        27
Somewhat confident      13        28
Confident               20        18
Very confident          41        47

1. Take the total
2. Find the relative frequency
3. Calculate the cumulative frequency
4. Find the difference
5. Get the largest difference
6. Find the critical value
7. Test the goodness of fit
e.g. if our D > critical D (distributions are unequal) -> reject the NULL Hypothesis
                    Group 1  Rel.    Cumul.   Group 2  Rel.    Cumul.    D
                             freq    freq              freq    freq
Not confident          20    0.1612  0.1612      4     0.0322  0.0322   0.129
Slightly confident     30    0.2419  0.403      27     0.2177  0.25     0.153
Somewhat confident     13    0.104   0.508      28     0.225   0.47     0.032
Confident              20    0.161   0.669      18     0.145   0.62     0.048
Very confident         41    0.330   1          47     0.379   1        0

Largest difference: D = 0.153.
Critical D = 1.36 · sqrt((n1 + n2)/(n1 · n2)) = 1.36 · sqrt(248/(124 · 124)) ≈ 0.173.
Test the NULL hypothesis: since D = 0.153 < 0.173, we fail to reject it.
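The two-sample comparison can be reproduced by expanding each category count into that many ordinal codes and comparing the empirical CDFs (a sketch, assuming SciPy and NumPy are available; note that KS p-values on discrete/ordinal data are conservative):

```python
# Two-sample KS on the confidence-rating counts, categories coded 0..4.
import numpy as np
from scipy.stats import ks_2samp

group1 = np.repeat(np.arange(5), [20, 30, 13, 20, 41])  # 124 responses
group2 = np.repeat(np.arange(5), [4, 27, 28, 18, 47])   # 124 responses

res = ks_2samp(group1, group2)
print(round(res.statistic, 4))  # 0.1532, at the "Slightly confident" row
```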
Kolmogorov Smirnov test (KS): one-sample goodness of fit

                     Group 1   Group 2
Not confident           20         4
Slightly confident      30        27
Somewhat confident      13        28
Confident               20        18
Very confident          41        47

Expected ratio: 1:2:3:2:1

1. Take the total
2. Find the relative frequency
3. Calculate the cumulative frequency
4. Find the difference from the expected cumulative distribution
5. Get the largest difference
6. Find the critical value (1.36/sqrt(sample size))
7. Test the goodness of fit
e.g. if our D > critical D (distribution is unequal) -> reject the NULL Hypothesis
Methods of Estimation
• Methods of moments
• Maximum likelihood
• Bayesian estimators
• Markov chain Monte Carlo…

Why?
The population size is too large, so we test a hypothesis on a set of samples.
• Probability density function (pdf) -> for continuous variables
• Probability mass function (pmf) -> for discrete variables
• Parameter space: the set of all parameter values; a family of pdfs/pmfs
• An estimator T is unbiased: if the sample parameter is ……. the population parameter

Probability density function
Estimation Methods
• Data can become two- or multi-dimensional…..
Method of maximum likelihood
• The maximum likelihood estimates of a distribution type are the values of its parameters that produce the maximum joint probability density or mass for the observed data X, given the chosen probability model.
• Maximum likelihood is more general: it can be applied to any probability distribution.

The MLE
• The best parameters are obtained by maximizing the probability of the observed samples.
• Has good convergence properties as sample sizes increase: the estimated value approaches the real value for large N.
• Applications are many: from speech recognition to natural language processing to computational biology.
Simple MLE: Coin tossing
• Toss a coin: Head or Tail
• Flip the coin 10 times (n): H, H, H, T, H, T, T, T, H, T => 1, 1, 1, 0, 1, 0, 0, 0, 1, 0
• An appropriate model for getting a head in a single flip is the Bernoulli pmf f(x; P) = P^x (1 − P)^(1−x), defined for Xi = 0 and Xi = 1 (e.g. with P = 0.6).
Example:
We want to estimate the probability, p, that individuals are
infected with a certain kind of parasite.
Ind.:

Probability of
Infected: observation:

1

1

p

2

0

1-p

3

1

p

4

1

p

5

0

1-p

6

1

p

7

1

p

8

0

1-p

9

0

1-p

10

1

p

The maximum likelihood method
(discrete distribution):
1. Write down the probability of
each observation by using the
model parameters
2. Write down the probability of
all the data

Pr( Data | p)

p 6 (1 p) 4

3. Find the value parameter(s)
that maximize this probability
The maximum likelihood
Example (continued): estimating the probability, p, that individuals are infected with a certain kind of parasite.

Likelihood function:

L(p) = Pr(Data | p) = p⁶ (1 − p)⁴

Find the value of the parameter(s) that maximizes this probability.

[Plot: L(p) against p from 0.0 to 1.0; the curve rises from 0, peaks at about 0.0012 near p = 0.6, and falls back to 0.]
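The likelihood curve can be maximized by brute force over a grid of p values (a sketch, assuming NumPy is available), mirroring the plot above:

```python
# Brute-force maximization of L(p) = p^6 * (1 - p)^4 over a grid.
import numpy as np

p = np.linspace(0.001, 0.999, 999)   # grid of candidate parameter values
L = p**6 * (1 - p)**4                # likelihood at each grid point

p_hat = p[np.argmax(L)]
print(round(p_hat, 2), round(L.max(), 4))  # 0.6 0.0012
```

The grid maximum agrees with the analytic MLE p̂ = 6/10 = 0.6 (six infected out of ten individuals).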
Brute Force…
Likelihood Function

L(P | x1…xn) = Π_{i=1..n} f(xi | P)
             = P^{x1}(1−P)^{1−x1} · P^{x2}(1−P)^{1−x2} ··· P^{xn}(1−P)^{1−xn}
             = P^{x1+x2+…+xn} (1−P)^{n−(x1+x2+…+xn)}
             = P^{Σxi} (1−P)^{n−Σxi}
Analytically, the maximum likelihood can also be found by taking the derivative of the (log) likelihood with respect to P and finding where the slope is 0. For the Bernoulli likelihood above this gives P̂ = Σxi / n.

http://www.ics.uci.edu/~smyth/courses/cs274/papers/MLtutorial.pdf
Recap…
• Get the population type – set up the equation.
• Write the log-likelihood function.
• Differentiate.
• Set the derivative to 0.
• Solve the equation to estimate the parameter.
Methods of moments
• Oldest method
• Distribution dependent: geometric, Poisson, Bernoulli…
• Depends upon the pdf

Methods of Moments
• Population moments can be estimated by sample moments.
• Can be robust.
• The sample mean can estimate the population mean, and the sample variance can estimate the population variance.
• Does not work well when the distribution is exponential.

REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 

Último (20)

Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 

Stat2013

  • 1. Sucheta Tripathy, IICB, November – December 2013
  • 2. Chi-square test. Sucheta Tripathy, Biostatistics coursework, IICB, Nov 2013
  • 3. Definitions • Model or hypothesis • Null hypothesis: there is no significant difference between the two; it can be TRUE or FALSE • Goodness of fit
  • 4. What you need  A probability value  Degrees of freedom  A contingency table. Determine whether the deviation is due to chance, then accept or reject the null hypothesis at the chosen significance level (e.g. 10%).
  • 6. Example 1  Mendelian law of dominance. Cross: Aa X Aa, where A -> Tall (dominant) and a -> dwarf (recessive); Aa is heterozygous (tall). Gametes a and A combine to give offspring AA, Aa, Aa, aa, i.e. an expected 3:1 tall-to-dwarf ratio. Observed: 639 tall and 281 dwarf. Chi-square requires that you have numeric values. Chi-square should not be calculated if an expected value is less than 5.
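As a sketch of the computation (assuming the standard 3:1 Mendelian expectation for an Aa X Aa cross), the chi-square statistic for these counts can be worked out by hand:

```python
# Chi-square goodness of fit for the Aa x Aa cross (expected 3:1 tall:dwarf).
observed = [639, 281]                      # tall, dwarf
total = sum(observed)                      # 920 plants
expected = [total * 3 / 4, total * 1 / 4]  # [690.0, 230.0]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # 15.08
```

With 1 degree of freedom the 5% critical value is 3.84, so these particular counts would in fact lead to rejecting the 3:1 hypothesis.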
  • 7. Choosing a Test  First check whether there is a hypothesis to test  If yes, then decide which test  If no, then there is NO statistical test for it. Parametric tests assume the data come from a standard probability distribution. Non-parametric tests can be used for both normally and non-normally distributed data. Question: then why not use them always? Parametric tests make a lot of assumptions: if the assumptions are correct, the results are more accurate.
  • 10. Example: counting the number of sixes per trial. Observed: 0 sixes in 48 trials, 1 six in 35, 2 sixes in 15, 3 sixes in 3. Expected probabilities from the binomial distribution, P(k out of n) = n!/(k!(n−k)!) · p^k (1−p)^(n−k): p1 = P(roll 0 sixes) = P(X=0) = 0.58; p2 = P(roll 1 six) = P(X=1) = 0.345; p3 = P(roll 2 sixes) = P(X=2) = 0.07; p4 = P(roll 3 sixes) = P(X=3) = 0.005. http://www.mathsisfun.com/data/binomial-distribution.html
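The slide's probabilities can be reproduced directly. The slide does not state n, but from the values given each trial appears to consist of n = 3 rolls with p = 1/6 per roll (an inference, not stated on the slide):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k successes out of n trials) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 3, 1 / 6
for k in range(4):
    print(k, round(binom_pmf(k, n, p), 3))
# 0 0.579, 1 0.347, 2 0.069, 3 0.005 -- close to the slide's 0.58, 0.345, 0.07, 0.005
```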
  • 11. Two samples: compare the mean value of some variable of interest. Parametric: t-test for independent samples. Nonparametric: Wald-Wolfowitz runs test, Mann-Whitney U test, Kolmogorov-Smirnov two-sample test.
  • 12. Compare two variables measured in the same sample. Parametric: t-test for dependent samples; if more than two variables are measured in the same sample, repeated-measures ANOVA. Nonparametric: sign test, Wilcoxon's matched-pairs test, Friedman's two-way analysis of variance, Cochran Q.
  • 13. Null Hypothesis  Coined by the English geneticist Ronald Fisher in 1935.  At a given probability it can either be true or false. Comparing populations/datasets: population A and population B. Null hypothesis is true -> no significant difference between the populations. Null hypothesis is false -> significant difference between the populations. There are formulas to calculate the test statistic for a population comparison, and there are look-up tables with critical values. If the calculated value exceeds the look-up value, the null hypothesis is rejected; otherwise it is retained.
  • 14. Null Hypothesis Testing. You suspect that whenever it rains your experiment fails?!!! NULL hypothesis is true: no significant association between your experiment failing and raining -> rain has nothing to do with your experiment failing. NULL hypothesis is false: there is significant damage to your experiments when it rains -> rain ruins your experiment!! Record when your experiment fails and check whether it rained at that time: the failures may happen by chance, or there may indeed be a relationship.
  • 15. Lab study vs statistics research  http://www.youtube.com/watch?feature=player_embe dded&v=PbODigCZqL8
  • 16. T-test  The t-statistic was introduced in 1908 by William Sealy Gosset.  Used for data from a normally distributed population. http://www.socialresearchmethods.net/kb/stat_t.php
  • 22. ANOVA: F statistics  Analysis of variance, one-way and two-way. Total variation is split into two parts: between-groups variation and within-groups variation; the F statistic weighs between-group variation against within-group variation.
  • 23. So how big is F? Since F is Mean Square Between / Mean Square Within = MSG / MSE, a large value of F indicates relatively more difference between groups than within groups (evidence against H0). To get the p-value, we compare to the F(I−1, n−I) distribution: • I−1 degrees of freedom in the numerator (number of groups − 1) • n−I degrees of freedom in the denominator (the remaining df).
  • 24. Connections between SST, MST, and standard deviation. If we ignore the groups for a moment and just compute the variance of the entire data set, we see s² = Σ(x_ij − x̄)² / (n − 1) = SST / DFT = MST. So SST = (n − 1)s² and MST = s². That is, SST and MST measure the TOTAL variation in the data set. SST: total sum of squares; MST: total mean square; DFT: total degrees of freedom.
  • 25. Connections between SSE, MSE, and standard deviation. Remember: s_i² = Σ(x_ij − x̄_i)² / (n_i − 1) = SS[Within Group i] / df_i. So SS[Within Group i] = s_i² · df_i. This means that we can compute SSE from the standard deviations and sizes (df) of each group: SSE = SS[Within] = Σ SS[Within Group i] = Σ s_i²(n_i − 1) = Σ s_i²(df_i).
  • 27. Computing the ANOVA F statistic. Data: group 1: 5.3, 6.0, 6.7 (group mean 6.00); group 2: 5.5, 6.2, 6.4, 5.7 (group mean 5.95); group 3: 7.5, 7.2, 7.9 (group mean 7.53). Overall mean: 6.44. WITHIN: the squared differences (data − group mean) sum to SSE = 1.757; divided by df = 7, MSE = 0.251. BETWEEN: the squared differences (group mean − overall mean), one per observation, sum to SSG = 5.127; divided by df = 2, MSG = 2.564. F = 2.564 / 0.251 = 10.216.
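The whole slide-27 computation can be sketched in a few lines (same data, keeping full precision in the intermediate values):

```python
groups = [[5.3, 6.0, 6.7], [5.5, 6.2, 6.4, 5.7], [7.5, 7.2, 7.9]]

n = sum(len(g) for g in groups)                     # 10 observations
k = len(groups)                                     # 3 groups
grand_mean = sum(x for g in groups for x in g) / n  # 6.44

means = [sum(g) / len(g) for g in groups]           # group means
ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

msg = ssg / (k - 1)  # mean square between groups
mse = sse / (n - k)  # mean square within groups
F = msg / mse
print(round(F, 2))   # 10.22
```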
  • 28. Validation  The larger the F value, the more between-group variation relative to within-group variation -> reject the null hypothesis.
  • 30. In summary: SST = Σ_obs (x_ij − x̄)² = s²(DFT); SSE = Σ_obs (x_ij − x̄_i)² = Σ_groups s_i²(df_i); SSG = Σ_obs (x̄_i − x̄)² = Σ_groups n_i(x̄_i − x̄)²; SSE + SSG = SST; MS = SS / DF; F = MSG / MSE.
  • 31. R² Statistic. R² gives the percent of variance due to between-group variation: R² = SS[Between] / SS[Total] = SSG / SST. We will see R² again when we study regression.
  • 32. Where’s the Difference? Once ANOVA indicates that the groups do not all appear to have the same means, what do we do? Analysis of variance for days: treatment, DF = 2, SS = 34.74, MS = 17.37, F = 6.45, P = 0.006; error, DF = 22, SS = 59.26, MS = 2.69; total, DF = 24, SS = 94.00. Group summaries (level, N, mean, StDev): A, 8, 7.250, 1.669; B, 8, 8.875, 1.458; P, 9, 10.111, 1.764; pooled StDev = 1.641. From the individual 95% CIs for the means (based on the pooled StDev), the clearest difference is that P is worse than A (their CIs don’t overlap).
  • 33. Multiple Comparisons. Once ANOVA indicates that the groups do not all have the same means, we can compare them two by two using the 2-sample t test. • We need to adjust our p-value threshold because we are doing multiple tests with the same data. • There are several methods for doing this. • If we really just want to test the difference between one pair of treatments, we should set the study up that way.
  • 34. Tukey’s Pairwise Comparisons. Tukey’s pairwise comparisons: family error rate = 0.0500, individual error rate = 0.0199; 95% family confidence, so use alpha = 0.0199 for each test; critical value = 3.55. Intervals for (column level mean) − (row level mean): A−B: (−3.685, 0.435); A−P: (−4.863, −0.859); B−P: (−3.238, 0.766). These give 98.01% CIs for each pairwise difference. The 98% CI for A−P is (−4.86, −0.86), so only P vs A is significant (both endpoints have the same sign).
  • 35. Tukey’s Method in R. Tukey multiple comparisons of means, 95% family-wise confidence level (diff, lwr, upr): B−A: 1.6250 (−0.43650, 3.6865); P−A: 2.8611 (0.85769, 4.8645); P−B: 1.2361 (−0.76731, 3.2395).
  • 36. Independent-sample t-test: number of words recalled. df = (n1 − 1) + (n2 − 1) = 18. t = (x̄1 − x̄2) / s_(x̄1−x̄2) = (26 − 19) / 1 = 7; the critical value is t(0.05, 18) = 2.101, and 7 > 2.101 -> reject H0.
  • 37. T test  One-sample t test  Unpaired and paired t test: paired, the same set of subjects over a period of time; unpaired, independent sets of subjects over a period of time. http://www.youtube.com/watch?v=JlfLnx8sh-o One-tailed and two-tailed t-tests. One-tailed: average height of class A is greater than that of class B. Two-tailed: average height of class A is different from that of class B.
  • 38. Z-test statistics  Use when the sample size is large and the population variance is known. If the sample size is small and the population variance is unknown, go for the t-test.
  • 39. Calculation of the z value: Z = (X̄ − µ) / sqrt(variance/n). Suppose that in a particular geographic region, the mean and standard deviation of scores on a reading test are 100 points and 12 points, respectively. Our interest is in the scores of 55 students in a particular school who received a mean score of 96. We can ask whether this mean score is significantly lower than the regional mean; that is, are the students in this school comparable to a simple random sample of 55 students from the region as a whole, or are their scores surprisingly low? We begin by calculating the standard error of the mean.
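A minimal sketch of the slide’s example (regional mean 100, SD 12, sample of 55 with mean 96):

```python
import math

mu, sigma = 100, 12        # regional mean and standard deviation
n, xbar = 55, 96           # sample size and sample mean

se = sigma / math.sqrt(n)  # standard error of the mean, about 1.618
z = (xbar - mu) / se
print(round(se, 3), round(z, 2))  # 1.618 -2.47
```

Against a one-tailed 5% cutoff of −1.645, a z of −2.47 would be judged significantly low.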
  • 40. F-tests / Analysis of Variance (ANOVA). t = (obtained difference between sample means) / (difference expected by chance (error)). F = (variance (differences) between sample means) / (variance (differences) expected by chance (error)). The difference between sample means is easy for 2 samples (e.g. X̄1 = 20, X̄2 = 30, difference = 10), but if X̄3 = 35 the concept of differences between sample means gets tricky.
  • 41. F-tests / Analysis of Variance (ANOVA): simple ANOVA example. Total variability splits into between-treatments variance and within-treatments variance. Between-treatments variance measures differences due to: 1. treatment effects; 2. chance. Within-treatments variance measures differences due to: 1. chance.
  • 42. F-tests / Analysis of Variance (ANOVA). F = MSbetween / MSwithin. When the treatment has no effect, differences between groups/treatments are entirely due to chance; numerator and denominator will be similar, and the F-ratio should have a value around 1.00. When the treatment does have an effect, the between-treatment differences (numerator) should be larger than chance (denominator), and the F-ratio should be noticeably larger than 1.00.
  • 43. F-tests / Analysis of Variance (ANOVA): simple independent-samples ANOVA example. F(3, 8) = 9.00, p < 0.05. Group summaries (mean, SD, n): Placebo: 1.0, 1.73, 3; Drug A: 1.0, 1.0, 3; Drug B: 4.0, 1.0, 3; Drug C: 6.0, 1.73, 3. There is a difference somewhere; we have to use post-hoc tests (essentially t-tests corrected for multiple comparisons) to examine further.
  • 44. F Test Anova  http://www.youtube.com/watch?v=-yQb_ZJnFXw
  • 45. Non-Parametric tests. Non-parametric tests are used to overcome the underlying assumption of normality in parametric tests; only quite general assumptions regarding the population are made. Read more: Mann-Whitney U-test / Mann-Whitney-Wilcoxon. IT DOES NOT ASSUME THE VARIANCES TO BE EQUAL!!
  • 46. Mann-Whitney-Wilcoxon (MWW, or Wilcoxon rank-sum test). Proposed by the German Gustav Deuchler in 1914 (with a missing term in the variance) and later independently by Frank Wilcoxon in 1945. The test is based on the idea that the particular pattern exhibited when m X random variables and n Y random variables are arranged together in increasing order of magnitude provides information about the relationship between their parent populations. Assumptions: • the two samples are random and independent of each other; • observations are numeric and ordinal (can be arranged in ranks). It is a test of comparison of medians.
  • 47. When to use this?
  • 48. When to use this? Test of normality: simple histogram method; normal probability plot.
  • 49. How to construct a normal probability plot. Data (value, rank i): 20, 1; 15, 2; 26, 3; 32, 4; 18, 5; 28, 6; 35, 7; 14, 8; 26, 9; 22, 10; 17, 11. For each rank compute (i − 0.5)/N, look up the corresponding theoretical z value, and plot it against the observed value. Mean = 38.8, SD = 11.4.
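The construction above can be sketched with the slide’s data, using plotting positions (i − 0.5)/N and the standard normal inverse CDF for the theoretical values:

```python
from statistics import NormalDist

data = sorted([20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17])
n = len(data)  # 11 observations

# Plotting position for rank i (1-based), then its theoretical z value.
positions = [(i - 0.5) / n for i in range(1, n + 1)]
z_theoretical = [NormalDist().inv_cdf(p) for p in positions]

# Plot observed (sorted) values against z_theoretical; an approximately
# straight line suggests the data are consistent with a normal distribution.
for z, x in zip(z_theoretical, data):
    print(round(z, 2), x)
```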
  • 51. Ranking the values. A: 5, 6, 7, 8, 9 (N1 = 5); B: 2, 1, 5, 7, 3, 4 (N2 = 6). Total number of comparisons = 5 X 6 = 30. How to score each comparison: less = 0, tie = 0.5, more = 1. Scoring A against B gives 4.5, 5, 5.5, 6, 6, for a total of 27; scoring B against A gives 0, 0, 0.5, 2.5, 0, 0, for a total of 3.
  • 52. Step by step  Rank the values  Add the ranks  Select the larger of the two rank totals  Determine N1, N2, and Nx and Tx (Nx: size of the sample with the larger rank total; Tx: the larger rank total)  Calculate U: U = N1·N2 + Nx(Nx + 1)/2 − Tx
  • 54. If the calculated U value is less than the critical value -> reject the null hypothesis.
  • 55. Calculating the U value  For small datasets: U is obtained by directly counting the pairwise comparisons won by each sample.  For larger datasets, use the rank sums: U1 = R1 − n1(n1+1)/2, U2 = R2 − n2(n2+1)/2.
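Both routes give the same U. A sketch using the A and B samples from the ranking slide, with the direct pairwise count cross-checked against the rank-sum formula:

```python
A = [5, 6, 7, 8, 9]     # N1 = 5
B = [2, 1, 5, 7, 3, 4]  # N2 = 6

# Direct counting: 1 point per pair where the A value is larger, 0.5 per tie.
u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in A for b in B)
u_b = len(A) * len(B) - u_a
print(u_a, u_b)  # 27.0 3.0

# Rank-sum route: average ranks over the pooled sample, then U1 = R1 - n1(n1+1)/2.
pooled = sorted(A + B)

def avg_rank(v):
    ranks = [i + 1 for i, x in enumerate(pooled) if x == v]
    return sum(ranks) / len(ranks)

R1 = sum(avg_rank(a) for a in A)     # 42.0
u1 = R1 - len(A) * (len(A) + 1) / 2  # 27.0, matching the direct count
```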
  • 56. Kruskal-Wallis test (H test)  Non-parametric  Equivalent to ANOVA (the F test) among parametric tests  Does not require the distribution to be normal  Samples need to be independent  Used more often when variances are unequal  Data are ordinal. It assumes the distributions have the same shape: if one distribution is skewed to the left and another to the right (unequal variance), the test will give inaccurate results.
  • 57. Kruskal-Wallis test. Group A: 27, 2, 4, 18, 7, 9; Group B: 20, 8, 14, 36, 21, 22; Group C: 34, 31, 3, 23, 30, 6.
  • 58. Kruskal-Wallis test  Define the null and alternative hypotheses  State the probability (significance level)  Calculate the degrees of freedom  Find the critical value  Calculate the test statistic  State the result. H0: accept the null hypothesis, there is no difference between the samples. H1: reject the null hypothesis, there is a difference between the samples. If the test statistic > critical value, reject the null hypothesis.
  • 59. Rank all 18 values pooled (ranks 1-18). Rank totals: Group A, R = 39; Group B, R = 65; Group C, R = 67, with n = 6 per group and N = 18 overall. H = 12/(N(N+1)) · Σ(R_i²/n_i) − 3(N+1) = 12/(18·19) X (39²/6 + 65²/6 + 67²/6) − 3(18+1) = 2.854. Since 2.854 < 5.99 (the critical value), do not reject the null hypothesis.
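The H computation can be sketched in code from the rank totals already obtained:

```python
N = 18                      # total number of observations
rank_totals = [39, 65, 67]  # R for groups A, B, C
group_size = 6

H = 12 / (N * (N + 1)) * sum(R ** 2 / group_size for R in rank_totals) - 3 * (N + 1)
print(round(H, 3))  # 2.854
# 2.854 < 5.99 (chi-square critical value, df = 2, alpha = 0.05),
# so the null hypothesis is not rejected for these data.
```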
  • 60. Kolmogorov-Smirnov (KS) test  Non-parametric  Distribution is unknown  One-sample and two-sample versions  One-sample: checks the goodness of fit to a hypothesized distribution  Two-sample: compares two distributions. Goodness of fit of a hypothesis (e.g. Mendel’s law of dominance). Null hypothesis H0: F(x) = F*(x) for all x; H1: F(x) ≠ F*(x) for at least one value of x.
  • 61. K-S test. The K-S statistic Dn is defined as: Dn = max |Fn(x) − F(x)|, where Dn is known as the K-S distance, n is the total number of data points, F(x) is the distribution function of the fitted distribution, and Fn(x) = i/n, where i is the cumulative rank of the data point.
  • 62. Kolmogorov-Smirnov (KS) test. Observed counts (group 1, group 2): Not confident: 20, 4; Slightly confident: 30, 27; Somewhat confident: 13, 28; Confident: 20, 18; Very confident: 41, 47. Procedure: 1. Take the totals. 2. Find the relative frequencies. 3. Calculate the cumulative frequencies. 4. Find the differences. 5. Get the largest difference. 6. Find the critical value (1.36/sqrt(sample size)). 7. Test goodness of fit: e.g. if our D > critical D, the distributions are unequal -> reject the null hypothesis.
  • 63. Computation (category: group 1 count, frequency, cumulative frequency; group 2 count, frequency, cumulative frequency; |D|): Not confident: 20, 0.161, 0.161; 4, 0.032, 0.032; 0.129. Slightly confident: 30, 0.242, 0.403; 27, 0.218, 0.250; 0.153. Somewhat confident: 13, 0.105, 0.508; 28, 0.226, 0.476; 0.032. Confident: 20, 0.161, 0.669; 18, 0.145, 0.621; 0.048. Very confident: 41, 0.331, 1; 47, 0.379, 1; 0. Largest D = 0.153. Critical D = 1.36 · sqrt((n1 + n2)/(n1 · n2)); test the null hypothesis.
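A sketch of the two-sample computation with the slide’s counts:

```python
g1 = [20, 30, 13, 20, 41]  # group 1 counts per category
g2 = [4, 27, 28, 18, 47]   # group 2 counts per category
n1, n2 = sum(g1), sum(g2)  # 124 and 124

def cumulative_freq(counts, total):
    running, out = 0, []
    for c in counts:
        running += c
        out.append(running / total)
    return out

diffs = [abs(a - b) for a, b in zip(cumulative_freq(g1, n1), cumulative_freq(g2, n2))]
D = max(diffs)                                # about 0.153
crit = 1.36 * ((n1 + n2) / (n1 * n2)) ** 0.5  # about 0.173
print(round(D, 3), round(crit, 3))
# Here D < critical D, so the null hypothesis of equal distributions is not rejected.
```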
  • 64. Kolmogorov-Smirnov (KS) test, goodness of fit to a hypothesized ratio. The same observed counts as before, now tested against an expected 1:2:3:2:1 ratio across the five confidence categories, using the same seven steps: totals, frequencies, cumulative frequencies, differences, largest difference, critical value (1.36/sqrt(sample size)), and the goodness-of-fit decision (if our D > critical D, the fit is poor -> reject the null hypothesis).
  • 65. Methods of Estimation  Method of moments  Maximum likelihood  Bayesian estimators  Markov chain Monte Carlo… Why? The population is too large to measure fully, so we test a hypothesis on a set of samples.
  • 66. Probability density function (pdf) -> for continuous variables. Probability mass function (pmf) -> for discrete variables. Parameter space: the set of all possible parameter values for a family of pdfs/pmfs. An estimator T is unbiased if the expected value of the sample statistic equals the population parameter.
  • 68. Estimation Methods  Data get two- or multi-dimensional…
  • 69. Method of maximum likelihood  The maximum likelihood estimates for a distribution type are the values of its parameters that produce the maximum joint probability density or mass for the observed data X under the chosen probability model. Maximum likelihood is more general than the method of moments and can be applied to any probability distribution.
  • 70. The MLE  The best parameters are obtained by maximizing the probability of the observed samples.  It has good convergence properties as sample sizes increase: the estimated value approaches the real value for large N.  Applications are many: from speech recognition to natural language processing to computational biology.
  • 71. Simple MLE: coin tossing  Toss a coin: head or tail. Flip the coin n = 10 times: H, H, H, T, H, T, T, T, H, T => 1, 1, 1, 0, 1, 0, 0, 0, 1, 0. An appropriate model for a single flip is the Bernoulli distribution: P(Xi = x) = p^x (1 − p)^(1−x), which gives p when Xi = 1 and 1 − p when Xi = 0 (e.g. with p = 0.6).
  • 72. The maximum likelihood. Example: we want to estimate the probability, p, that individuals are infected with a certain kind of parasite. Observations (individual: infected, probability): 1: 1, p; 2: 0, 1−p; 3: 1, p; 4: 1, p; 5: 0, 1−p; 6: 1, p; 7: 1, p; 8: 0, 1−p; 9: 0, 1−p; 10: 1, p. The maximum likelihood method (discrete distribution): 1. Write down the probability of each observation using the model parameters. 2. Write down the probability of all the data: Pr(Data | p) = p⁶(1 − p)⁴. 3. Find the parameter value(s) that maximize this probability.
  • 73. The maximum likelihood. For the same parasite example, Pr(Data | p) = p⁶(1 − p)⁴. Plotting the likelihood function L(p) against p over [0, 1] shows a single peak: the likelihood is maximized at p = 0.6.
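A numerical sketch: maximizing L(p) = p⁶(1 − p)⁴ by a simple grid search recovers the peak at p = 0.6.

```python
def likelihood(p, infected=6, uninfected=4):
    """Pr(Data | p) = p^6 * (1-p)^4 for the parasite example."""
    return p ** infected * (1 - p) ** uninfected

# Grid search over p in (0, 1), step 0.001.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
print(p_hat)  # 0.6
```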
  • 75. Likelihood Function. L(p | x1 … xn) = Π_{i=1..n} f(xi | p) = p^x1 (1−p)^(1−x1) · p^x2 (1−p)^(1−x2) ··· p^xn (1−p)^(1−xn) = p^(x1+x2+…+xn) (1−p)^(n − (x1+x2+…+xn)) = p^(Σ xi) (1−p)^(n − Σ xi).
  • 76. Analytically, the maximum likelihood can also be found by taking the derivative of the log-likelihood with respect to P and finding where the slope is 0. http://www.ics.uci.edu/~smyth/courses/cs274/papers/MLtutorial.pdf
  • 77. Recap…  Identify the population type and set up the likelihood equation.  Write the log-likelihood function.  Differentiate.  Set the derivative to 0.  Solve the equation to estimate the parameter.
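Applied to the coin/parasite (Bernoulli) model from the preceding slides, the recap steps give the familiar closed form:

```latex
\ell(p) = \log L(p \mid x_1,\dots,x_n)
        = \Bigl(\textstyle\sum_{i=1}^{n} x_i\Bigr)\log p
        + \Bigl(n - \textstyle\sum_{i=1}^{n} x_i\Bigr)\log(1-p),
\qquad
\frac{d\ell}{dp} = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1-p} = 0
\;\Longrightarrow\;
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

For the parasite data (6 infected out of 10), this gives p̂ = 6/10 = 0.6, matching the peak of the likelihood curve.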
  • 78. Method of moments  The oldest method  Distribution dependent: geometric, Poisson, Bernoulli…  Depends upon the pdf.
  • 79. Method of moments  Population moments can be estimated by sample moments.  Can be robust.  The sample mean estimates the population mean and the sample variance estimates the population variance.  Does not work well when the distribution is exponential.