Chi square

  1. CHI SQUARE. Dr. C. Hemamalini, Assistant Professor, Department of Economics, Ethiraj College for Women, Chennai 600 008
  2. Introduction. The chi-square test is one of the most commonly used non-parametric tests. It was introduced by Karl Pearson as a test of association and is denoted by the Greek letter χ, written χ². The chi-squared distribution with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables, so it is determined entirely by its degrees of freedom. The simplest chi-squared distribution (k = 1) is the square of a single standard normal variable. The distribution is used primarily in hypothesis testing. The test can be applied to categorical (qualitative) data arranged in a contingency table, and it is used to evaluate unpaired (unrelated) samples and proportions.
  3. The chi-square statistic is a mathematical expression that compares the experimentally obtained results (O) with the theoretically expected results (E) under a given hypothesis. It uses data in the form of frequencies (i.e., the number of occurrences of an event). It is calculated by squaring the deviation between the observed and expected frequency in each class, dividing by the expected frequency, and summing over all classes: $\chi^2 = \sum \frac{(O - E)^2}{E}$.
  4. Degrees of Freedom. The degrees of freedom are the number of independent pieces of information that are free to vary and that go into the estimate of a parameter. The degrees of freedom of an estimate equal the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in estimating the parameter itself. For n observations the number of degrees of freedom is n − k, usually denoted by ν, where k is the number of independent linear constraints imposed on the observations.
  5. Chi Square Distribution. The mean of the distribution equals the number of degrees of freedom: μ = ν. The variance equals twice the number of degrees of freedom: σ² = 2ν. When the degrees of freedom are greater than or equal to 2, the density curve reaches its maximum at χ² = ν − 2. As the degrees of freedom increase, the chi-square curve approaches a normal distribution.
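
The mean, variance, and peak location stated above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the choice of 5 degrees of freedom, the sample size, and the seed are illustrative assumptions, not part of the slides.

```python
import math
import random

def chi2_pdf(x, k):
    """Density of the chi-squared distribution with k degrees of freedom (x > 0)."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def simulate_chi2(k, n, seed=0):
    """Draw n values, each the sum of squares of k independent standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]

k = 5  # illustrative choice of degrees of freedom
samples = simulate_chi2(k, 20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)

# Locate the peak of the density on a fine grid; theory puts it at k - 2.
grid = [i / 100 for i in range(1, 2001)]
mode = max(grid, key=lambda x: chi2_pdf(x, k))

print(f"sample mean     = {mean:.2f} (theory: {k})")
print(f"sample variance = {variance:.2f} (theory: {2 * k})")
print(f"density peak at = {mode:.2f} (theory: {k - 2})")
```

The simulated mean and variance only approximate the theoretical values, since they come from a finite random sample.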
  6. If there are two, three, or four classes, the degrees of freedom are 2 − 1, 3 − 1, and 4 − 1, respectively. In a contingency table the degrees of freedom are calculated differently: d.f. = (r − 1)(c − 1), where r is the number of rows and c is the number of columns in the table. Thus in a 2×2 contingency table the degrees of freedom are (2 − 1)(2 − 1) = 1. Similarly, in a 3×3 contingency table the degrees of freedom are (3 − 1)(3 − 1) = 4.
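
The two degrees-of-freedom rules can be written as a pair of small helpers. This is only a sketch; the function names `df_classes` and `df_contingency` are hypothetical.

```python
def df_classes(n_classes):
    """Degrees of freedom for a one-way classification: classes minus one."""
    return n_classes - 1

def df_contingency(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

print(df_classes(4))         # four classes
print(df_contingency(2, 2))  # 2x2 table
print(df_contingency(2, 3))  # 2x3 table
```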
  7. Characteristics of Chi Square
     1. The test is based on frequencies and not on parameters such as the mean and standard deviation.
     2. The test is used for testing hypotheses and is not useful for estimation.
     3. The test possesses the additive property: chi-square values from independent samples can be added together, with their degrees of freedom also added.
     4. The test can also be applied to a complex contingency table with several classes, which makes it very useful in research work.
     5. The test is an important non-parametric test: no rigid assumptions are needed about the type of population, no parameter values are required, and relatively little mathematical detail is involved.
  8. Conditions for applying the Chi-Square test
     1. The frequencies used in the chi-square test must be absolute counts, not relative terms such as proportions or percentages.
     2. The total number of observations collected for the test must be large.
     3. The observations that make up the sample must be independent of each other.
     4. As the χ² test is based wholly on sample data, no assumption is made concerning the population distribution.
     5. Expected values should be greater than 5 in 80% or more of the cells.
     6. Moreover, if the number of cells is fewer than 5, then all expected values must be greater than 5.
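
The two conditions on expected values can be turned into a quick check before running the test. The sketch below assumes the expected frequencies are given as a list of rows; the function name `expected_counts_ok` and the small 2×2 table are hypothetical, while the 2×3 table holds the expected frequencies from the drug example in this deck.

```python
def expected_counts_ok(expected, threshold=5.0, min_share=0.8):
    """Return True if the expected frequencies satisfy the rule of thumb.

    With fewer than 5 cells, every expected value must exceed the threshold;
    otherwise at least min_share of the cells must exceed it.
    """
    cells = [e for row in expected for e in row]
    share_above = sum(e > threshold for e in cells) / len(cells)
    if len(cells) < 5:
        return share_above == 1.0
    return share_above >= min_share

drug_expected = [[140, 35, 75], [140, 35, 75]]  # from the worked example
small_table = [[2.0, 8.0], [6.0, 20.0]]         # hypothetical 2x2 expected values

print(expected_counts_ok(drug_expected))
print(expected_counts_ok(small_table))
```

The second table fails because it has fewer than 5 cells and one expected value below 5.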
  9. Steps Required
     1. Identify the problem.
     2. Make a contingency table and note the observed frequency (O) in each class of one event row-wise (horizontally), and then the numbers in each group of the other event column-wise (vertically).
     3. Set up the null hypothesis (H0): according to the null hypothesis, no association exists between the attributes. This also requires setting up the alternative hypothesis (HA).
     4. Calculate the expected frequencies (E).
     5. Find the difference between the observed and expected frequency in each cell (O − E).
     6. Calculate the chi-square value by applying the formula $\chi^2 = \sum \frac{(O - E)^2}{E}$. The value ranges from zero to infinity.
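
The last three steps (expected frequencies, differences, and the chi-square value) can be sketched for any contingency table as follows. The function names and the 2×2 table of counts are hypothetical and chosen only to keep the arithmetic easy to follow; each expected frequency is computed as row total × column total / grand total.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[30, 20], [20, 30]]  # hypothetical 2x2 table of counts
expected = expected_frequencies(observed)
statistic = chi_square_statistic(observed)

print(expected)
print(statistic)
```

Here every expected count is 25 and every deviation is 5 in absolute value, so the statistic is 4 × 25/25 = 4.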
  10. Uses of Chi Square Test
      1. Tests for independence of attributes. In the test for independence, the null hypothesis is that the row and column variables are independent of each other. As in any hypothesis test, the testing is done under the assumption that the null hypothesis is true.
      2. Test of goodness of fit. The test of goodness of fit of a statistical model measures how accurately the model fits a set of observations.
  11. Steps in Testing Goodness of Fit
      1. A null and an alternative hypothesis are established, and a significance level is selected for rejection of the null hypothesis.
      2. A random sample of observations is drawn from the relevant statistical population.
      3. A set of expected frequencies is derived under the assumption that the null hypothesis is true.
      4. The observed frequencies are compared with the expected frequencies.
      5. The calculated value of the chi-square goodness-of-fit statistic is compared with the table value. If the calculated value is greater than the table value, we reject the null hypothesis and conclude that there is a significant difference between the observed and the expected frequencies.
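
As an illustration of these steps, the sketch below tests whether a die is fair. The roll counts are made up for the example and do not come from the slides; the critical value 11.07 is the standard table value of chi-square for 5 degrees of freedom at the 0.05 level.

```python
# Hypothetical counts of faces 1..6 in 120 rolls of a die.
observed = [22, 17, 20, 26, 22, 13]

# Under H0 (fair die) each face is expected equally often.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

CRITICAL_VALUE = 11.07  # table value for 5 d.f. at the 0.05 level
reject_null = statistic > CRITICAL_VALUE

print(f"chi-square = {statistic:.2f}, d.f. = {degrees_of_freedom}")
print("reject H0" if reject_null else "do not reject H0")
```

With these counts the calculated value stays below the table value, so the hypothesis of a fair die is not rejected.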
  12. Example. A certain drug is claimed to be effective in curing colds. In an experiment on 500 persons with a cold, half were given the drug and half were given sugar pills. The patients' reactions to the treatment are recorded in the following table. On the basis of the data, can it be concluded that there is a significant difference in the effect of the drug and the sugar pills?

      |             | Helped | Harmed | No Effect | Total |
      |-------------|--------|--------|-----------|-------|
      | Drug        | 150    | 30     | 70        | 250   |
      | Sugar Pills | 130    | 40     | 80        | 250   |
      | Total       | 280    | 70     | 150       | 500   |

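  13. H0: There is no significant difference in the effect of the drug and the sugar pills. Expected frequency = (RT × CT) / GT, where RT is the row total, CT the column total, and GT the grand total. The expected frequencies are:

      |             | Helped | Harmed | No Effect | Total |
      |-------------|--------|--------|-----------|-------|
      | Drug        | 140    | 35     | 75        | 250   |
      | Sugar Pills | 140    | 35     | 75        | 250   |
      | Total       | 280    | 70     | 150       | 500   |
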
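  14. Calculation of χ²:

      | O     | E   | (O − E)² | (O − E)²/E |
      |-------|-----|----------|------------|
      | 150   | 140 | 100      | 0.714      |
      | 130   | 140 | 100      | 0.714      |
      | 30    | 35  | 25       | 0.714      |
      | 40    | 35  | 25       | 0.714      |
      | 70    | 75  | 25       | 0.333      |
      | 80    | 75  | 25       | 0.333      |
      | Total |     |          | 3.522      |

      $\chi^2 = \sum \frac{(O - E)^2}{E} = 3.522$. Degrees of freedom: ν = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2. The table value of χ² at the 0.05 level for ν = 2 is 5.99. The calculated value of chi-square is less than the table value, so the null hypothesis is not rejected: there is no significant difference in the effect of the drug and the sugar pills.
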
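
The arithmetic of this example can be reproduced in a few lines. The sketch below uses the observed and expected frequencies from the slides; the p-value and the computed critical value are additions that rely on a known special case: for exactly 2 degrees of freedom the upper-tail probability of the chi-square distribution is exp(−x/2). Variable names are illustrative.

```python
import math

# (observed, expected) pairs in the same order as the calculation table.
pairs = [(150, 140), (130, 140), (30, 35), (40, 35), (70, 75), (80, 75)]

chi_square = sum((o - e) ** 2 / e for o, e in pairs)
degrees_of_freedom = (2 - 1) * (3 - 1)

# Closed forms valid only for 2 degrees of freedom.
p_value = math.exp(-chi_square / 2)
critical_value = -2 * math.log(0.05)

print(f"chi-square     = {chi_square:.3f}")
print(f"critical value = {critical_value:.3f}")
print(f"p-value        = {p_value:.3f}")
print("reject H0" if chi_square > critical_value else "do not reject H0")
```

Computed without intermediate rounding the statistic is about 3.524; the 3.522 in the table comes from summing terms already rounded to three decimals. The conclusion is the same either way.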
  15. Limitations. Chi-square is highly sensitive to sample size. A reasonably strong association may not come up as significant if the sample size is small; conversely, in large samples we may find statistical significance when the findings are small and uninteresting, i.e., the findings are statistically significant but not substantively significant. As sample size increases, absolute differences become a smaller and smaller proportion of the expected value. Chi-square is also sensitive to small frequencies in the cells of tables.
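
The sensitivity to sample size can be seen by keeping the proportions of the drug example fixed and multiplying every count by a constant. The scaled tables below are hypothetical; only the base table comes from the slides. Because both O and E scale with the sample, the statistic grows in direct proportion to the sample size.

```python
def chi_square(table):
    """Chi-square statistic of a contingency table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )

base = [[150, 30, 70], [130, 40, 80]]  # observed counts from the drug example
TABLE_VALUE = 5.99                     # 2 d.f., 0.05 level

results = {}
for factor in (1, 2, 10):
    scaled = [[cell * factor for cell in row] for row in base]
    results[factor] = chi_square(scaled)
    verdict = "significant" if results[factor] > TABLE_VALUE else "not significant"
    print(f"sample x{factor}: chi-square = {results[factor]:.2f} ({verdict})")
```

The same pattern of responses is not significant with 500 persons but becomes significant once the sample is doubled, even though the proportions are unchanged.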
  16. Conclusions. The rule of thumb is that if either (i) an expected value in a cell is less than 5 or (ii) more than 20% of the expected values in cells are less than 5, then chi-square should not be computed, and usually is not.