Test of Significance:
                                  The Chi-square Statistic




The Chi-square statistic: © 2009 Max Chipulu, University of Southampton




The Chi-square Statistic: Learning Objectives

• To introduce the Chi-square statistic as a test of statistical significance
• To apply and interpret the calculated Chi-square statistic for a practical problem, using Chi-square tables and ‘degrees of freedom’

• “When it comes to number of babies, all months are equal but some months are more equal than others.”




The Research Question = Research Hypothesis

• It is often thought that there are some ‘boom’ months of the year when the number of babies born is higher than in others…
• Can we, using data on babies who were born to hold a master’s degree, show this to be the case or not?
• The research hypothesis is that there is a difference in the number of births from month to month.

The Null Hypothesis

              • If there is nothing to the myth of boom
                months, then the distribution of numbers of
                births would be uniform throughout all the
                months of the year
              • Therefore the null hypothesis: there is no
                difference in the number of births from
                month to month








Range of    Actual births             What ‘uniform’ births would look     Differences between
Numbers     (observed frequencies)    like (expected frequencies)          expected and observed
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Total

How well do observed frequencies fit the
                              uniform model?
      • There are differences between the expected and
        observed frequencies. But these differences could be just
        because of the randomness of the data
      • Intuitively, we know that small differences between the
        observed and predicted frequencies represent a ‘good’ fit
      • So, overall, if we sum the differences, then a small sum
        of differences represents a good model
      • But positive and negative differences may cancel out
      • This is not so good





                  How well do observed frequencies fit the
                             uniform model?
      • So we square the differences between frequencies

      • Then we add squared differences up

      • A small sum of squares is good

      • To put each squared difference into context, we divide it by the
            respective expected frequency before adding up

      • The result is a measure of the goodness of fit of our uniform
            random model



\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]

              This is the Chi-Square Statistic
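
As a minimal sketch, the statistic can be computed in a few lines of Python: square each difference, divide by the expected frequency, then add up. The function name `chi_square` and the three-category counts are illustrative assumptions, not data from the lecture.

```python
def chi_square(observed, expected):
    """Return the chi-square statistic: the sum of (fo - fe)^2 / fe."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical counts for three categories (illustration only).
observed = [10, 20, 30]
expected = [20, 20, 20]
statistic = chi_square(observed, expected)
print(statistic)  # 100/20 + 0/20 + 100/20 = 10.0
```

A statistic of zero would mean the observed counts match the model exactly; larger values mean a poorer fit.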







Range of    Expected       Differences between       Difference squared divided by expected
Numbers     Frequencies    expected and observed     (contribution to the chi-square)
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Total

Key Properties of the Chi-Square
        • The Chi-Square is a non-parametric test: under the null
          hypothesis, the distribution of the Chi-square statistic is not
          affected by the underlying statistical model that generates the data.
        • That distribution depends only on the number of degrees of
          freedom: the higher the number of degrees of freedom, the higher
          the value of the chi-square should be (its expected value equals
          the number of degrees of freedom).
        • The number of degrees of freedom is the number of different
          categories that contribute to the chi-square sum minus the
          number of pre-determined (or intermediate) parameters.




                                     ‘Degrees of freedom’
        • In this example, degrees of freedom (d.f.) = k − 1, where k is the
          number of categories (months) that contribute to the chi-square. So
          d.f. = 12 − 1 = 11.
        • Suppose that instead of using months we use seasons as our
          categories. Then we would only have four categories contributing
          to the chi-square. As such, we would expect a SMALLER chi-square
          because there are fewer contributions to it.
        • But why subtract 1? The total number of births for all months
          is a predetermined value: it depends only on the sample size. If we
          know the frequencies for 11 of the 12 months and we know the total
          number of births, then we can work out the number of births in the
          12th month. So although we have 12 months (12 categories) in total,
          there are in fact only 11 ways (degrees of freedom) in which the
          chi-square value can vary.

Is the chi-square significant?
            υ     50       40       30       25       20       15       10       5        2.5
            5     4.351    5.132    6.064    6.626    7.289    8.115    9.236    11.07    12.83
            6     5.348    6.211    7.231    7.841    8.558    9.446    10.64    12.59    14.45
            7     6.346    7.283    8.383    9.037    9.803    10.75    12.02    14.07    16.01
            8     7.344    8.351    9.524    10.22    11.03    12.03    13.36    15.51    17.53
            9     8.343    9.414    10.66    11.39    12.24    13.29    14.68    16.92    19.02
           10     9.342    10.47    11.78    12.55    13.44    14.53    15.99    18.31    20.48
           11     10.34    11.53    12.90    13.70    14.63    15.77    17.28    19.68    21.92
           12     11.34    12.58    14.01    14.85    15.81    16.99    18.55    21.03    23.34
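
To see how the table is used for the births question, here is a rough sketch in Python. The monthly counts are hypothetical and chosen only for illustration, since the lecture's own birth data are not reproduced here. The critical value 19.68 is the 5% entry for 11 d.f. in the table above.

```python
# Hypothetical monthly birth counts, Jan..Dec (illustration only, not lecture data).
observed = [18, 22, 25, 19, 21, 24, 17, 20, 23, 16, 18, 17]

total = sum(observed)
k = len(observed)
expected = total / k  # uniform model: every month gets the same share
chi_sq = sum((fo - expected) ** 2 / expected for fo in observed)
df = k - 1  # the total is fixed, so only 11 of the 12 counts are free

critical_5pct = 19.68  # 5% column, 11 d.f., read from the table above
reject_null = chi_sq > critical_5pct
print(f"chi-square = {chi_sq:.2f}, d.f. = {df}, reject H0 at 5%: {reject_null}")
```

With these made-up counts the statistic falls well below 19.68, so the uniform model would not be rejected.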








                        Example: Goals in football

           • Hypothesis: the total number of goals
             scored in a game of football in Europe
             follows a Poisson Distribution with mean
             2.73




In Europe we observed this distribution of football goals

[Figure: bar chart of the number of matches (vertical axis, 0 to 20) against goals scored (horizontal axis, 0 to 8)]

Now that we know about some distributions, it might look vaguely familiar.




                  A Poisson Model of Goals in Football

        We can think about football like this:
        Let each minute that we watch a game be an experiment.
        The experiment is: is there a goal or not?
        It is a success if there is a goal; it is a failure if there is not.
        Since only about 3 goals are expected in 90 minutes, the
        probability of ‘success’ in any one minute is very small.
        In each minute we conduct a Bernoulli trial. There are 90 trials.
        It seems reasonable to model goal scoring in football as a
        Poisson process.
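
The "90 Bernoulli trials" picture can be checked numerically. The sketch below compares a Binomial model (90 trials, one per minute) with the Poisson model for the same mean. It assumes the mean of 2.73 goals per game quoted earlier and a constant per-minute goal probability of 2.73/90; the function names are illustrative.

```python
import math

n = 90          # one Bernoulli trial per minute
mu = 2.73       # mean goals per game quoted in the lecture
p = mu / n      # assumed constant per-minute probability of a goal


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """P(exactly k events) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, mu), 4))
```

The two columns stay close for every goal count, which is why the Poisson distribution is a convenient stand-in for many trials with a small success probability.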

A Poisson Model of Goals in Football

        Alternatively, we can think about football like this:
        A game of football is played over 90 minutes. This is a
            constrained time interval.
        The number of goals scored in each game is a discrete
            random variable.
        Suppose we divide the match into very small intervals,
            e.g. minutes. Then within each small interval, it is
            reasonable to assume that:
                1. At most one goal will be scored;
                2. The probability of observing a goal is proportional to the
                   length of that interval of time, e.g. the probability of
                   observing a goal in 1 minute is twice that of a goal in 30
                   seconds.
        The above are the key characteristics of a Poisson process.




                     Poisson Probabilities of Football Goals

          From our data, the expected number of goals per game is
          2.73
          And so, P(zero goals) = e^{-\mu} = e^{-2.73} = 0.0652
          P(1 goal) = 2.73 × 0.0652 = 0.1780
          P(2 goals) = (2.73/2) × 0.1780 = 0.2430
          P(3 goals) = (2.73/3) × 0.2430 = 0.2212
          P(4 goals) = (2.73/4) × 0.2212 = 0.1509
          P(5 goals) = (2.73/5) × 0.1509 = 0.0824
          P(6 goals) = (2.73/6) × 0.0824 = 0.0375
          P(7 goals) = (2.73/7) × 0.0375 = 0.0146
          P(8 goals) = (2.73/8) × 0.0146 = 0.0050
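
The chain of multiplications above follows the Poisson recurrence P(k) = (µ/k) × P(k − 1), starting from P(0) = e^(−µ). A short Python sketch of that recurrence, using the lecture's mean of 2.73 (the variable names are illustrative):

```python
import math

mu = 2.73  # mean goals per game, from the lecture

# P(0) = e^(-mu); then each probability is the previous one times mu / k.
probs = [math.exp(-mu)]
for k in range(1, 9):
    probs.append(mu / k * probs[-1])

for k, p in enumerate(probs):
    print(f"P({k} goals) = {p:.4f}")
```

Working from the previous probability avoids recomputing powers and factorials for every value of k.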
Expected Frequencies of Matches According to Poisson

       According to the Poisson model, the probability that a
       football match will end with zero goals is 0.0652. If we
       watch 66 matches in total, how many of them should we
       expect to end with zero goals?


       Number of games with total zero goals = 0.0652*66 = 4.3


       We can thus work out all the expected frequencies of
       matches with i goals by multiplying the Poisson
       probabilities with the total number of matches seen
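
A minimal sketch of that multiplication in Python, assuming the final category collects "8 or more" goals (as in Table 11.1) so that the probabilities add up to one:

```python
import math

mu = 2.73      # mean goals per game, from the lecture
matches = 66   # total number of matches watched

# Poisson probabilities for 0..7 goals, plus a lumped "8 or more" category.
probs = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(8)]
probs.append(1 - sum(probs))

expected = [matches * p for p in probs]
for goals, e in enumerate(expected):
    label = str(goals) if goals < 8 else "8 or more"
    print(f"{label:>9}: {e:.1f}")
```

Because the last category absorbs the remaining probability, the expected frequencies sum to the 66 matches observed.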





Table 11.1 Comparing Goals Predicted with Observed Goals

Goals at end    Poisson probability of          Number of games expected to
of match        seeing this number of goals     end with this number of goals
0               0.0652                          4.3
1               0.1780                          11.8
2               0.2430                          16.0
3               0.2212                          14.6
4               0.1509                          10.0
5               0.0824                          5.4
6               0.0375                          2.5
7               0.0146                          1.0
8 or more       0.0070                          0.5

Comparison of Expected and Observed
                       Frequencies of Matches with i goals
    We also have the observed frequencies


    So if scoring goals in football is really a Poisson process,
    then there should not be much difference between the
    expected and observed frequencies


    Any difference between predicted and actual should be
    small and due to random variation only






Football: Observed Vs Poisson Frequencies

[Figure: bar chart comparing predicted (Poisson) and observed frequencies of matches (vertical axis, 0 to 20) by goals scored (horizontal axis, 0 to 8)]

Calculate the contribution of each fi to the χ2; find the sum:

Goals         Expected, fe    Observed, fi    Contribution to the χ2
0 or 1        16.1            15              0.0694
2             16.0            20              0.9774
3             14.6            11              0.8863
4             10.0             9              0.0930
5 or more      9.2            11              0.3483
Total         66.0            66              2.3744

Degrees of freedom = k − number of predetermined parameters = 5 − 2 = 3
(the two predetermined parameters are the total number of matches and the mean estimated from the data)
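
The table can be approximately reproduced with the sketch below. It rests on one assumption that is not stated on the slide: the "5 or more" group is taken as the sum of P(5) to P(8), which appears to be how the expected value of 9.2 was obtained. Expected frequencies are left unrounded, so individual contributions differ slightly from what the rounded table entries would give.

```python
import math

mu, matches = 2.73, 66
p = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(9)]  # P(0)..P(8)

# Pooled categories: 0 or 1, 2, 3, 4, 5 or more.
# Assumption: the last group sums P(5)..P(8), which matches the slide's 9.2.
pooled_probs = [p[0] + p[1], p[2], p[3], p[4], sum(p[5:9])]
expected = [matches * q for q in pooled_probs]
observed = [15, 20, 11, 9, 11]

contributions = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
chi_sq = sum(contributions)
df = len(observed) - 2  # k minus 2 predetermined parameters, as on the slide

print([round(c, 4) for c in contributions], round(chi_sq, 4), df)
```

Pooling the sparse categories keeps every expected frequency reasonably large, which is the usual reason for merging cells before computing the statistic.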




            To test significance answer this question

         • Is the calculated chi-square value so high that it
           is unusual to observe such a value or higher
           values with 3 degrees of freedom?


         • Alternatively: Is the probability of observing a
           chi-square value of 2.37 or more with three
           degrees of freedom small (say 5% or less)?






To find the highest χ2 value observed 95% of the time, under 3 d.f., if H0 is true

Percentage Points of the Chi-Square Distribution
            υ     50      20      15      10      5       2.5     1
            1     0.45    1.64    2.07    2.71    3.84    5.02    6.63
            2     1.39    3.22    3.79    4.61    5.99    7.38    9.21
            3     2.37    4.64    5.32    6.25    7.81    9.35    11.34
            4     3.36    5.99    6.74    7.78    9.49    11.14   13.28
            5     4.35    7.29    8.12    9.24    11.07   12.83   15.09
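
When printed tables are not to hand, the same numbers can be computed directly. The sketch below is one possible approach using only the Python standard library: for whole-number degrees of freedom the upper-tail probability has a closed form built from `math.erfc` and `math.gamma`, and the critical value is then found by bisection. The names `chi2_sf` and `chi2_critical` are illustrative, not from the lecture.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square >= x) for integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # exact result for 2 d.f.
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # exact result for 1 d.f.
    # Each pass adds two degrees of freedom:
    # Q(a + 1, y) = Q(a, y) + y^a * e^(-y) / Gamma(a + 1)
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def chi2_critical(alpha, df, hi=1000.0):
    """Value exceeded with probability alpha, found by bisection."""
    lo = 0.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid   # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2


p_value = chi2_sf(2.37, 3)          # football example: chi-square 2.37, 3 d.f.
critical = chi2_critical(0.05, 3)   # 5% point for 3 d.f.
print(f"p-value = {p_value:.3f}, 5% critical value = {critical:.2f}")
```

For the football example the p-value comes out close to one half, consistent with 2.37 sitting in the 50% column of the table for 3 d.f., so the Poisson model is not rejected.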


Incidence of Disease Among Adults
Example: Contingency Table Analysis

A county council is worried about the number of adults who suffer from a particular disease and has collected the following information:

AGE GROUP    SICK    HEALTHY    TOTAL
34-39        1327    15702      17029
40-44        2072    17454      19524
45-49        2456    14237      16693
50-54        3611    11519      15130
55-59        4688    9174       13862
60-64        5490    7526       13016




Incidence of Disease Among Adults

Can it be said that all age groups are equally likely to be affected and that the differences may be due to random variation? Or are some age groups more susceptible than others to acquiring the disease?

Incidence of Disease Among Adults

What would the numbers in each of these cells be in a perfect world, i.e. in a world where advancing age did not mean more disease?

AGE GROUP    SICK    HEALTHY    TOTAL
34-39                           17029
40-44                           19524
45-49                           16693
50-54                           15130
55-59                           13862
60-64                           13016




Step 1: Hypothesize

Assume that age is NOT related to the incidence of the disease, i.e. the maintained hypothesis, H0, is that the incidence of the disease is independent of age.
And the alternative hypothesis, Ha, is that age IS related to the incidence of the disease.

Step 2: Create the statistical model such that age is independent of the incidence of the disease.

From the rules of probability, the model is as follows:

Let the event that an adult is aged 34-39 be A.
Let the event that an adult is sick be S.

Then, if incidence of the disease is independent of age, the probability that an adult is aged 34-39 AND sick is given by the simplified multiplication rule: P(A and S) = P(A)*P(S)
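
As a small worked sketch of the multiplication rule for the first cell, using the row, column and grand totals printed in the lecture's tables (the variable names are illustrative):

```python
# Totals taken from the lecture's tables.
adults_34_39 = 17029   # row total for age group 34-39
sick_total = 19644     # column total for SICK
grand_total = 95254    # all adults surveyed

p_age = adults_34_39 / grand_total   # P(A)
p_sick = sick_total / grand_total    # P(S)
p_age_and_sick = p_age * p_sick      # P(A and S), assuming independence

print(round(p_age, 3), round(p_sick, 3), round(p_age_and_sick, 3))
```

The product is only the model's prediction; whether the data agree with it is exactly what the chi-square test goes on to check.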





Step 3: Apply the simplified multiplication rule to calculate the probability of every combination of age range and sick, and age range and healthy

AGE GROUP    SICK     HEALTHY    TOTAL
34-39        0.037    0.142      0.179
40-44        0.042    0.163      0.205
45-49        0.036    0.139      0.175
50-54        0.033    0.126      0.159
55-59        0.030    0.116      0.146
60-64        0.028    0.108      0.137
Total        0.206    0.794      1.000

Step 4: Given these probabilities, calculate the number of adults in each age group expected to be sick, and to be healthy

AGE GROUP    SICK     HEALTHY    TOTAL
34-39        3512     13518      17029
40-44        4026     15498      19524
45-49        3443     13251      16693
50-54        3120     12010      15130
55-59        2859     11004      13862
60-64        2684     10332      13016
Total        19644    75612      95254
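
Each expected count is the grand total times the cell probability, which simplifies to (row total × column total) / grand total. A minimal sketch using the totals printed in the table (the dictionary layout is an illustrative choice; small rounding differences from the table are possible):

```python
row_totals = {"34-39": 17029, "40-44": 19524, "45-49": 16693,
              "50-54": 15130, "55-59": 13862, "60-64": 13016}
col_totals = {"SICK": 19644, "HEALTHY": 75612}
grand_total = sum(row_totals.values())  # 95254

# Expected count under independence: row total * column total / grand total.
expected = {
    age: {status: r * c / grand_total for status, c in col_totals.items()}
    for age, r in row_totals.items()
}
for age, cells in expected.items():
    print(age, round(cells["SICK"]), round(cells["HEALTHY"]))
```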




Step 5: Now test the hypothesis: How well does our independence model predict numbers of adults of a certain age who will be sick and who will be healthy?

Use Chi-Square to compare differences between observed and expected frequencies.

Proceed by calculating the contribution of each combination of age and sick, and age and healthy, to the chi-square value and summing them up.

Step 5 cont’d: The Chi-Square value is the sum of all the contributions. It is 8531. Hmm!

What is the probability of observing a χ2 value this large or larger when the independence model holds?

Contributions to the χ2:

AGE GROUP    SICK    HEALTHY    TOTAL
34-39        1359    353        1712
40-44        949     247        1196
45-49        283     73         356
50-54        77      20         97
55-59        1171    304        1475
60-64        2933    762        3695
Total        6771    1760       8531
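
The whole calculation can be sketched end to end from the observed counts. In this sketch the row and column totals are recomputed from the cell counts rather than copied from the slides, so they can differ by a unit or two from the printed totals, and the result should be read as approximately 8531 rather than an exact match.

```python
observed = {  # (sick, healthy) counts from the lecture's first table
    "34-39": (1327, 15702), "40-44": (2072, 17454), "45-49": (2456, 14237),
    "50-54": (3611, 11519), "55-59": (4688, 9174), "60-64": (5490, 7526),
}
row_totals = {age: sum(cells) for age, cells in observed.items()}
col_totals = [sum(cells[j] for cells in observed.values()) for j in range(2)]
n = sum(col_totals)

chi_sq = 0.0
for age, cells in observed.items():
    for j, fo in enumerate(cells):
        fe = row_totals[age] * col_totals[j] / n  # expected under independence
        chi_sq += (fo - fe) ** 2 / fe             # this cell's contribution

print(round(chi_sq))
```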




Step 6: Calculate the number of degrees of freedom of the Chi-Square value

When expected values are calculated, the expected values in the last column and last row can be filled in automatically.

This is because the total number of adults for each column and row, e.g. the total number of adults aged 34-39, is fixed and known already.

Hence, the values in the last column and row are not free and the total number of degrees of freedom is
(number of rows minus one)*(number of columns minus one) = (6-1) * (2-1) = 5

For five d.f. the tables we have created do not list values of the χ2 as high as 8531.

All we can say is that the probability of values of the χ2 of 8531 or higher must be very, very small.

Alternatively, we can look at the maximum value of the χ2 that is observed 95% of the time for five d.f. This is 11.071. Since 8531 is way beyond this, we must reject the maintained hypothesis.

Incidence of the disease is not independent of age.






Upper-tail area (%) by degrees of freedom (d.f.)

d.f.    25        10        5         2.5       1         0.5
1       1.3233    2.7055    3.8415    5.0239    6.6349    7.8794
2       2.7726    4.6052    5.9915    7.3778    9.2103    10.597
3       4.1083    6.2514    7.8147    9.3484    11.345    12.838
4       5.3853    7.7794    9.4877    11.143    13.277    14.86
5       6.6257    9.2364    11.071    12.833    15.086    16.75
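
The decision rule used in both examples amounts to a table lookup followed by a comparison. A minimal sketch, with the 5% column of the table above copied into a dictionary (the names `CRITICAL_5PCT` and `is_significant` are illustrative, and only 1 to 5 d.f. are covered):

```python
# 5% upper-tail critical values, copied from the table above (d.f. 1 to 5).
CRITICAL_5PCT = {1: 3.8415, 2: 5.9915, 3: 7.8147, 4: 9.4877, 5: 11.071}


def is_significant(chi_sq, df):
    """True if chi_sq exceeds the 5% critical value for df (1 to 5 only)."""
    return chi_sq > CRITICAL_5PCT[df]


print(is_significant(8531, 5))   # disease example: reject independence
print(is_significant(2.37, 3))   # football example: do not reject Poisson
```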
Example from SPSS Practical: What are the key factors in the value of an MBA program?

THE VARIABLES

Salary: Average salary of MBA graduates
Fees: Program fees at the school
Age: Average age of an MBA candidate
GMAT: Average academic aptitude
Intake: Number of candidates on the program
Experience: Average experience (yrs) of candidates
Country: Whether country is USA (1) or another (0)




Example from SPSS Practical: Is ‘salary’ related to ‘country’? Chi-square test in SPSS

• Open SPSS 17.
• Import the ‘MBA.xls’ data into SPSS as explained in the SPSS handout.
• We wish to conduct a chi-square cross-tabulation (i.e. contingency table) test on ‘salary’ by ‘country’.
• Null hypothesis: ‘salary’ and ‘country’ are independent.
• Alternative hypothesis: ‘salary’ and ‘country’ are not independent, i.e. they are related.

Example from SPSS Practical: Re-coding ‘salary’

The salary variable is quantitative, not categorical, so it is not strictly suitable for cross-tabulation. So, first, recode salary into categories:
1. Go to the ‘transform’ menu.
2. Choose ‘visual binning’.
3. Select ‘salary’ as the variable to bin.
4. You should see a histogram of the ‘salary’ variable.
5. In the box labelled ‘binned variable’, type a name for the new variable that the re-coding will create. I have called my new variable ‘salary_codes’.
6. Select the tab ‘make cutpoints’. There are several options for cutpoints: a good one is to divide the data by ‘equal percentiles’. For example, if you input ‘3’ in this box, the salary data will be re-coded with 3 cutpoints, giving four sections of the data: the lowest 25% of values will be re-coded as ‘1’, values in the next 25% group (i.e. 25% to 50%) as ‘2’, and so on.
7. Click ‘ok’.
8. Check that a new ordinal variable representing categories of salary has been formed.
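The idea behind equal-percentile binning can be sketched outside SPSS as follows. This is only an illustration in Python: the salary figures are made up, and SPSS may compute its percentile cutpoints slightly differently from `statistics.quantiles`. The variable name `salary_codes` follows the slide; everything else is an assumption of the sketch.

```python
import bisect
import statistics

# Hypothetical salaries (in thousands), made up purely for illustration
salaries = [40, 45, 50, 55, 60, 70, 80, 95]

# Three equal-percentile cutpoints split the data into four groups
cutpoints = statistics.quantiles(salaries, n=4)

# Code each salary 1..4; a value equal to a cutpoint falls in the lower group
salary_codes = [bisect.bisect_left(cutpoints, s) + 1 for s in salaries]

print(cutpoints)
print(salary_codes)
```

With three cutpoints, each of the four codes covers roughly a quarter of the observations, which is what makes the resulting variable suitable for cross-tabulation.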




Example from SPSS Practical: Cross-tabulation of ‘salary_codes’ by ‘country’

Now to conduct a chi-square test:
1. Go to the ‘analyse’ menu.
2. Choose ‘descriptive statistics’, then ‘crosstabs…’.
3. Input ‘salary_codes’ into the ‘row’ box and ‘country’ into the ‘column’ box.
4. Click the ‘statistics’ tab and check the ‘chi-square’ box.
5. Click ‘ok’.
6. What do the results suggest?
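For readers who want to see what the crosstabs chi-square computes, here is a minimal sketch in Python. The counts are hypothetical and are not taken from the ‘MBA.xls’ data. The function name `chi_square_independence` is likewise an assumption of this sketch. Expected counts are (row total × column total) / grand total, and the d.f. is (rows − 1) × (columns − 1).

```python
def chi_square_independence(table):
    """Return (chi-square, d.f.) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical counts: rows are salary_codes 1-4, columns are country 0 and 1
crosstab = [[12, 3], [9, 6], [6, 9], [3, 12]]
chi2, df = chi_square_independence(crosstab)
print(round(chi2, 2), df)
```

For these made-up counts the statistic would be compared with the 5% value for three d.f. from the table of critical values, 7.8147.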
3-Way Chi-square test

The results above suggest that ‘salary’ IS related to ‘country’.
Suppose we think that this relationship is somewhat affected by the GMAT of the students. We can test this by creating a three-way cross-tabulation:
1. Re-code the GMAT variable into ‘GMAT_codes’, say into two categories of ‘low’ and ‘high’, using ‘visual binning’ in the ‘transform’ menu.
2. Repeat the chi-square test of ‘salary_codes’ by ‘country’. However, this time, enter the ‘GMAT_codes’ variable in the ‘layer’ box.
3. Run the model.




                                             3- Way Chi-square test Result




3-Way Chi-square test Result

The three-way chi-square test suggests that, once we take into account the average GMAT of the students, there is no relationship between ‘salary’ and ‘country’.

We can therefore conclude that the observed relationship between ‘salary’ and ‘country’ is in fact an indirect effect of variation in ‘GMAT’.
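How a relationship can appear in the pooled table yet vanish within each layer can be sketched with made-up numbers. The counts below are hypothetical and were chosen to make the point. They are not the SPSS results. The helper `chi_square` is an assumption of this sketch. Rows are low and high salary, columns are country 0 and 1.

```python
def chi_square(table):
    """Chi-square statistic for a table given as a list of rows of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i, r in enumerate(table) for j, obs in enumerate(r))

# Hypothetical salary-by-country tables, one per GMAT layer
layers = {
    "low GMAT": [[40, 8], [10, 2]],
    "high GMAT": [[2, 10], [8, 40]],
}

# Pooled table: add the two layers cell by cell
pooled = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(*layers.values())]

print("pooled:", round(chi_square(pooled), 2))
for name, table in layers.items():
    print(name + ":", round(chi_square(table), 2))
```

In this constructed example the pooled statistic is well above 3.8415, the 5% value for one d.f., while the statistic within each GMAT layer is zero. That is the pattern the layered test looks for.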




Example: Was the Class Lottery Conducted According to the Rules?

In order to sample from the distribution of inter-arrival times at a supermarket checkout, we played a lottery. The results (shown overleaf) show that the simulated distribution looks very much like the distribution from which we are sampling. But they are not the same. So what are we looking at? Are we looking at two datasets generated by the same distribution, so that the differences can be attributed to random variation? Or are we, in fact, looking at two datasets from different distributions, so that the differences are not random, as would be the case if the lottery were not conducted properly?

Inter-arrival Distribution from Lottery

Inter-arrival    Inter-arrival    Observed     Expected
Time (mins)      Probability      Frequency    Frequency
1                0.39             76           71
2                0.17             43           31
3                0.13             18           24
4                0.09             13           16
5                0.06             11           11
6                0.05             5            9
7                0.03             6            5
8                0.02             2            4
9                0.06             9            11
TOTAL                             183          183




Analysis
• H0: The observed data from the lottery were generated by the same process as the original inter-arrival time distribution.
• Using the original probabilities, calculate the expected frequency for each inter-arrival time, out of the total of 183.
• Combine the inter-arrival time categories of 7 and 8 mins, since the expected frequency for 8 mins is small (< 5).

Analysis Cont’d
• Calculate the χ². This is 9.37.
• The d.f. is k – 1, where k is the number of categories of inter-arrival time, which is 8 after combining. So d.f. = 7.
• For d.f. = 7, the probability of a χ² of 9.37 or higher is between 20% and 25%. This is not small.
• Decision: we cannot reject H0.
• We have no evidence that the lottery was not conducted according to the rules.
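The calculation above can be reproduced with a short Python sketch. It uses the probabilities and observed frequencies from the lottery table and works with unrounded expected frequencies, which is what gives the 9.37 quoted above. The two comparison points for 7 d.f. (9.037 and 9.803) are standard chi-square table values and are an assumption here, since the table of critical values shown earlier stops at 5 d.f. The helper `merge_7_and_8` is a hypothetical name.

```python
# Inter-arrival probabilities and observed lottery counts from the table
probabilities = [0.39, 0.17, 0.13, 0.09, 0.06, 0.05, 0.03, 0.02, 0.06]
observed = [76, 43, 18, 13, 11, 5, 6, 2, 9]
total = sum(observed)

def merge_7_and_8(values):
    """Combine the 7- and 8-minute categories (positions 6 and 7)."""
    return values[:6] + [values[6] + values[7]] + values[8:]

obs = merge_7_and_8(observed)
exp = [p * total for p in merge_7_and_8(probabilities)]  # unrounded expected

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 1

# Standard table points for 7 d.f. (assumed, not from the slides):
# 9.037 is exceeded 25% of the time and 9.803 is exceeded 20% of the time.
between_20_and_25_percent = 9.037 < chi2 < 9.803

print(round(chi2, 2), df, between_20_and_25_percent)
```

Using the rounded expected frequencies printed in the table instead would give a slightly smaller statistic, but the decision would be the same.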




                                             Further Reading
        • Alan Agresti, 1996. ‘Introduction to
          Categorical Data Analysis’. John Wiley and
          Sons, London.





The Chi Square Test

  • 1. Test of Significance: The Chi-square Statistic 1 1 The Chi-square statistic: © 2009 Max Chipulu, University of Southampton The Chi-square Statistic Learning Objectives To introduce the Chi-square statistic as a test of statistical significance To apply and interpret the calculated Chi-square statistic for a practical problem, using Chi-square tables and ‘degrees of freedom’. 2 The Chi-square statistic: © 2009 Max Chipulu, University of Southampton
  • 2. • “ When it comes to number of babies, all months are equal but some months are more equal than others.” others.” 3 The Chi-square statistic: © 2009 Max Chipulu, University of Southampton The Research Question = Research Hypothesis • It is often thought that there are some ‘boom’ months of their year when the number babies born is higher than others… • Can we, using data of babies who were born to hold a master’s degree, show this to be the case or not? • The research hypothesis is that there is a difference in the number of births from month to month. 4 The Chi-square statistic: © 2009 Max Chipulu, University of Southampton
  • 3. The Null Hypothesis • If there is nothing to the myth of boom months, then the distribution of numbers of births would be uniform throughout all the months of the year • Therefore the null hypothesis: there is no difference in the number of births from month to month 5 The Chi-square statistic: © 2009 Max Chipulu, University of Southampton Range of Actual Births What 'uniform' births Differences Numbe (observed would look like between rs frequencies) (expected expected and frequencies) observed Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 6 Total The Chi-square statistic: © 2009 Max Chipulu, University of Southampton
  • 4. How well do observed frequencies fit the uniform model?
    • There are differences between the expected and observed frequencies. But these differences could be just because of the randomness of the data
    • Intuitively, we know that small differences between the observed and expected frequencies represent a ‘good’ fit
    • So, overall, if we sum the differences, then a small sum of differences represents a good model
    • But positive and negative differences may cancel out
    • This is not so good
    How well do observed frequencies fit the uniform model? (continued)
    • So we square the differences between frequencies
    • Then we add the squared differences up
    • A small sum of squares is good
    • To put the result into context, we divide each squared difference by the respective expected frequency
    • The result is a measure of the goodness of fit of our uniform random model
  • 5. $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
    This is the Chi-Square Statistic, where $f_o$ is an observed frequency and $f_e$ is the corresponding expected frequency.
    Table (cells left blank on the slide):
    Columns: Range of Numbers | Expected Frequencies | Differences between expected and observed | Difference squared divided by expected (contribution to the chi-square)
    Rows: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, Total
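The formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the monthly counts below are hypothetical (the slide's table is blank), and `chi_square_statistic` is an illustrative helper name.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical monthly birth counts, Jan..Dec (NOT data from the slides).
observed = [10, 8, 12, 9, 11, 10, 13, 7, 10, 9, 12, 9]

# Under the uniform model every month has the same expected frequency.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
print(f"total births = {total}, chi-square = {chi2:.2f}")
```

Dividing each squared difference by its expected frequency is what puts the contributions on a common scale, so a deviation of 3 births counts for more in a month expecting 10 than in one expecting 100.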
  • 6. Key Properties of the Chi-Square
    • The Chi-Square is a non-parametric test: the value of the Chi-square statistic is not affected by the underlying statistical model that generates the data.
    • Under the null hypothesis, the distribution of the Chi-square value depends only on the number of degrees of freedom: the higher the number of degrees of freedom, the higher the value of the chi-square should be.
    • The number of degrees of freedom is the number of different categories that contribute to the chi-square sum minus the number of pre-determined (or intermediate) parameters
    ‘Degrees of freedom’
    • In this example, degrees of freedom (d.f.) = k − 1, where k is the number of categories (months) that contribute to the chi-square. So d.f. = 12 − 1 = 11
    • Suppose that instead of using months we use seasons as our categories. Then we would only have four categories that would contribute to the chi-square. As such, we would expect a SMALLER chi-square because there is a smaller number of contributions to the chi-square.
    • But why subtract 1? Well, the total number of births for all months is a predetermined value: it depends only on the sample size. If we know the frequencies for 11 of the 12 months, and we know the total number of births, then we can work out from these the number of births in the 12th month. So although we have in total 12 months (12 categories), there are in fact only 11 ways (degrees of freedom) in which the chi-square value can vary.
  • 7. Is the chi-square significant?
    Percentage points of the Chi-square distribution (rows: degrees of freedom υ; columns: upper-tail percentage)
    υ      50      40      30      25      20      15      10      5       2.5
    5      4.351   5.132   6.064   6.626   7.289   8.115   9.236   11.07   12.83
    6      5.348   6.211   7.231   7.841   8.558   9.446   10.64   12.59   14.45
    7      6.346   7.283   8.383   9.037   9.803   10.75   12.02   14.07   16.01
    8      7.344   8.351   9.524   10.22   11.03   12.03   13.36   15.51   17.53
    9      8.343   9.414   10.66   11.39   12.24   13.29   14.68   16.92   19.02
    10     9.342   10.47   11.78   12.55   13.44   14.53   15.99   18.31   20.48
    11     10.34   11.53   12.90   13.70   14.63   15.77   17.28   19.68   21.92
    12     11.34   12.58   14.01   14.85   15.81   16.99   18.55   21.03   23.34
    Example: Goals in football
    • Hypothesis: the total number of goals scored in a game of football in Europe follows a Poisson Distribution with mean 2.73
  • 8. In Europe we observed this distribution of football goals
    [Bar chart: number of matches (y-axis, 0 to 20) against goals scored (x-axis, 0 to 8)]
    Now that we know about some distributions, it might look vaguely familiar
    A Poisson Model of Goals in Football
    We can think about football like this:
    • Let each minute that we watch a game be an experiment
    • The experiment is: is there a goal or not? It is a success if there is a goal; it is a failure if there is not.
    • Since only about 3 goals are expected after 90 mins, the probability of ‘success’ is very small.
    • In each minute we conduct a Bernoulli trial. There are 90 trials.
    • It seems reasonable to model goal scoring in football as a Poisson process
  • 9. A Poisson Model of Goals in Football
    Alternatively, we can think about football like this:
    • A game of football is played over 90 minutes. This is a constrained time interval
    • The number of goals scored in each game is a discrete random variable.
    • Suppose we divide the match into very small intervals, e.g. minutes. Then within each small interval it is reasonable to assume that:
      1. There will be at most one goal scored;
      2. The probability of observing a goal is proportional to the length of that interval of time, e.g. the probability of observing a goal in 1 minute is twice that of a goal in 30 seconds
    • The above are the key characteristics of a Poisson process
    Poisson Probabilities of Football Goals
    From our data, the expected number of goals per game is $\mu = 2.73$. And so:
    P(zero goals) = $e^{-\mu} = e^{-2.73}$ = 0.0652
    P(1 goal) = 2.73 × 0.0652 = 0.1780
    P(2 goals) = (2.73/2) × 0.1780 = 0.2430
    P(3 goals) = (2.73/3) × 0.2430 = 0.2212
    P(4 goals) = (2.73/4) × 0.2212 = 0.1509
    P(5 goals) = (2.73/5) × 0.1509 = 0.0824
    P(6 goals) = (2.73/6) × 0.0824 = 0.0375
    P(7 goals) = (2.73/7) × 0.0375 = 0.0146
    P(8 goals) = (2.73/8) × 0.0146 = 0.0050
  • 10. Expected Frequencies of Matches According to Poisson
    According to the Poisson model, the probability that a football match will end with zero goals is 0.0652. If we watch 66 matches in total, how many of them should we expect to end with zero goals?
    Number of games with total zero goals = 0.0652 × 66 = 4.3
    We can thus work out all the expected frequencies of matches with i goals by multiplying the Poisson probabilities by the total number of matches seen
    Table 11.1 Comparing Goals Predicted with Observed Goals
    Goals at end of match   Poisson probability of seeing this number of goals   Number of games expected to end with this
    0                       0.0652                                               4.3
    1                       0.1780                                               11.8
    2                       0.2430                                               16.0
    3                       0.2212                                               14.6
    4                       0.1509                                               10.0
    5                       0.0824                                               5.4
    6                       0.0375                                               2.5
    7                       0.0146                                               1.0
    8 or more               0.0070                                               0.5
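The probabilities and expected frequencies in Table 11.1 can be reproduced with a short sketch. The mean (2.73) and number of matches (66) come from the slides; the function and variable names are illustrative, and the last category is taken as the remaining tail probability, which is an assumption that matches the table's "8 or more" row.

```python
import math

MEAN_GOALS = 2.73   # mean goals per game, from the slides
N_MATCHES = 66      # number of matches watched, from the slides
MAX_LISTED = 7      # goals 0..7 are listed individually; the rest is "8 or more"


def poisson_probabilities(mean, max_listed):
    """P(0), P(1), ..., P(max_listed), followed by the tail P(more than max_listed)."""
    probs = [math.exp(-mean)]
    for k in range(1, max_listed + 1):
        # Same recursion as on the slide: P(k) = (mean / k) * P(k - 1)
        probs.append(probs[-1] * mean / k)
    probs.append(1.0 - sum(probs))  # whatever probability is left over
    return probs


labels = [str(k) for k in range(MAX_LISTED + 1)] + ["8 or more"]
probs = poisson_probabilities(MEAN_GOALS, MAX_LISTED)
expected = [p * N_MATCHES for p in probs]

for label, p, e in zip(labels, probs, expected):
    print(f"{label:>9}  {p:.4f}  {e:5.1f}")
```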
  • 11. Comparison of Expected and Observed Frequencies of Matches with i goals
    • We also have the observed frequencies
    • So if scoring goals in football is really a Poisson process, then there should not be much difference between the expected and observed frequencies
    • Any difference between predicted and actual should be small and due to random variation only
    Football: Observed Vs Poisson Frequencies
    [Bar chart: predicted and observed frequency (y-axis, 0 to 20) against goals (x-axis, 0 to 8)]
  • 12. Calculate the contribution of each fi to the χ2; find the sum: χ2
    Goals        Expected, fe   Observed, fi   Contribution to the χ2
    0 or 1       16.1           15             0.0694
    2            16.0           20             0.9774
    3            14.6           11             0.8863
    4            10.0           9              0.0930
    5 or more    9.2            11             0.3483
    Total        66.0           66             2.3744
    Degrees of freedom = k − number of predetermined parameters = 5 − 2 = 3 (here the two predetermined parameters are the total number of matches and the mean number of goals estimated from the data)
    To test significance answer this question
    • Is the calculated chi-square value so high that it is unusual to observe such a value or higher values with 3 degrees of freedom?
    • Alternatively: is the probability of observing a chi-square value of 2.37 or more with three degrees of freedom small (say 5% or less)?
  • 13. To find the highest χ2 value observed 95% of the time, under 3 d.f., if H0 is true
    Percentage Points of the Chi-Square Distribution (rows: degrees of freedom υ; columns: upper-tail percentage)
    υ      50      20      15      10      5       2.5     1
    1      0.45    1.64    2.07    2.71    3.84    5.02    6.63
    2      1.39    3.22    3.79    4.61    5.99    7.38    9.21
    3      2.37    4.64    5.32    6.25    7.81    9.35    11.34
    4      3.36    5.99    6.74    7.78    9.49    11.14   13.28
    5      4.35    7.29    8.12    9.24    11.07   12.83   15.09
    Reading across the row for 3 d.f., the 5% column gives 7.81. The calculated value of 2.37 sits at the 50% point, far below 7.81, so the Poisson hypothesis is not rejected.
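The football test can be checked end to end with the sketch below. Observed counts, the mean and the 5% critical value (7.81) are taken from the slides. Two points are assumptions: the "5 or more" group is computed from goals 5 to 8 only, which appears to be how the slide arrives at 9.2, and the tail probability uses a closed form that holds only for exactly 3 degrees of freedom. All names are illustrative.

```python
import math

MEAN_GOALS = 2.73
N_MATCHES = 66
CRITICAL_5PCT_3DF = 7.81  # from the percentage-points table on the slide


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Goal groups as used on the slide.
groups = {
    "0 or 1": range(0, 2),
    "2": range(2, 3),
    "3": range(3, 4),
    "4": range(4, 5),
    "5 or more": range(5, 9),  # assumption: goals 5..8, to match the slide's 9.2
}
observed = {"0 or 1": 15, "2": 20, "3": 11, "4": 9, "5 or more": 11}

expected = {
    name: N_MATCHES * sum(poisson_pmf(k, MEAN_GOALS) for k in goals)
    for name, goals in groups.items()
}
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in groups)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_3df(chi2)
reject_h0 = chi2 > CRITICAL_5PCT_3DF
print(f"chi-square = {chi2:.4f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Comparing against the tabulated critical value and computing the tail probability directly are two views of the same decision: the statistic exceeds 7.81 exactly when the tail probability drops below about 5%.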
  • 14. Example: Contingency Table Analysis. Incidence of Disease Among Adults
    A county council is worried about the number of adults who suffer from a particular disease and has collected the following information
    AGE GROUP   SICK    HEALTHY   TOTAL
    34-39       1327    15702     17029
    40-44       2072    17454     19524
    45-49       2456    14237     16693
    50-54       3611    11519     15130
    55-59       4688    9174      13862
    60-64       5490    7526      13016
    Incidence of Disease Among Adults
    Can it be said that all age groups are equally likely to be affected and that the differences may be due to random variation? Or are some age groups more susceptible than others to acquiring the disease?
  • 15. Incidence of Disease Among Adults
    What would the numbers in each of these cells be in a perfect world, i.e. in a world where advancing age did not mean more disease?
    AGE GROUP   SICK    HEALTHY   TOTAL
    34-39                         17029
    40-44                         19524
    45-49                         16693
    50-54                         15130
    55-59                         13862
    60-64                         13016
    Step 1: Hypothesize
    Assume that age is NOT related to the incidence of the disease, i.e. the maintained hypothesis, H0, is that the incidence of the disease is independent of age. The alternative hypothesis, Ha, is that age IS related to the incidence of the disease.
  • 16. Step 2: Create the statistical model such that age is independent of the incidence of the disease.
    From the rules of probability, the model is as follows:
    Let the event that an adult is aged 34-39 be A. Let the event that an adult is sick be S. Then, if incidence of the disease is independent of age, the probability that an adult is aged 34-39 AND sick is given by the simplified multiplication rule:
    P(A and S) = P(A) × P(S)
    Step 3: Apply the simplified multiplication rule to calculate the probability of every combination of age range and sick, and age range and healthy
    AGE GROUP   SICK    HEALTHY   TOTAL
    34-39       0.037   0.142     0.179
    40-44       0.042   0.163     0.205
    45-49       0.036   0.139     0.175
    50-54       0.033   0.126     0.159
    55-59       0.030   0.116     0.146
    60-64       0.028   0.108     0.137
    Total       0.206   0.794     1.000
  • 17. Step 4: Given these probabilities, calculate the number of adults of each age group expected to be sick, and to be healthy
    AGE GROUP   SICK    HEALTHY   TOTAL
    34-39       3512    13518     17029
    40-44       4026    15498     19524
    45-49       3443    13251     16693
    50-54       3120    12010     15130
    55-59       2859    11004     13862
    60-64       2684    10332     13016
    Total       19644   75612     95254
    Step 5: Now test the hypothesis: how well does our independence model predict the numbers of adults of a certain age who will be sick and who will be healthy?
    Use Chi-Square to compare differences between observed and expected frequencies. Proceed by calculating the contribution of each combination of age and sick, and age and healthy, to the chi-square value and summing them up.
  • 18. Step 5 Cont’d: The Chi-Square value is the sum of all the contributions. It is 8531. Hmm! What is the probability of observing a χ2 value this large or larger when the independence model holds?
    Contributions to the χ2:
    AGE GROUP   SICK    HEALTHY   TOTAL
    34-39       1359    353       1712
    40-44       949     247       1196
    45-49       283     73        356
    50-54       77      20        97
    55-59       1171    304       1475
    60-64       2933    762       3695
    Total       6771    1760      8531
    Step 6: Calculate the number of degrees of freedom of the Chi-Square value
    When expected values are calculated, the expected values in the last column and last row can be filled in automatically. This is because the total for each row and column, e.g. the total number of adults aged 34-39, is fixed and known already. Hence the values in the last column and last row are not free, and the total number of degrees of freedom is (number of rows minus one) × (number of columns minus one) = (6 − 1) × (2 − 1) = 5
  • 19. For five d.f. the tables we have created do not list values of the χ2 as high as 8531. All we can say is that the probability of values of the χ2 of 8531 or higher must be very, very small.
    Alternatively, we can look at the maximum value of the χ2 that is observed 95% of the time for five d.f. This is 11.071. Since 8531 is way beyond this, we must reject the maintained hypothesis. Incidence of the disease is not independent of age.
    Percentage points of the Chi-square distribution (rows: d.f.; columns: upper-tail area, %)
    d.f.   25       10       5        2.5      1        0.5
    1      1.3233   2.7055   3.8415   5.0239   6.6349   7.8794
    2      2.7726   4.6052   5.9915   7.3778   9.2103   10.597
    3      4.1083   6.2514   7.8147   9.3484   11.345   12.838
    4      5.3853   7.7794   9.4877   11.143   13.277   14.86
    5      6.6257   9.2364   11.071   12.833   15.086   16.75
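Steps 2 to 6 of the contingency table analysis can be sketched as follows. The observed counts and the 5% critical value for 5 d.f. (11.071) are from the slides. Row and column totals are recomputed from the cells here, so they may differ by a couple of units from the totals printed on the slides; the variable names are illustrative.

```python
CRITICAL_5PCT_5DF = 11.071  # from the percentage-points table on the slide

age_groups = ["34-39", "40-44", "45-49", "50-54", "55-59", "60-64"]
# Observed (sick, healthy) counts per age group, from the slides.
observed = [
    (1327, 15702),
    (2072, 17454),
    (2456, 14237),
    (3611, 11519),
    (4688, 9174),
    (5490, 7526),
]

# Totals recomputed from the cells.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Independence model: expected count = N * P(row) * P(column)
#                                    = row total * column total / N.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
reject_h0 = chi2 > CRITICAL_5PCT_5DF

print(f"chi-square = {chi2:.0f}, d.f. = {df}, reject H0: {reject_h0}")
```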
  • 20. Example from SPSS Practical: What are the key factors in the value of an MBA program?
    THE VARIABLES
    • Salary: Average salary of MBA graduates
    • Fees: Program fees at the school
    • Age: Average age of an MBA candidate
    • GMAT: Average academic aptitude
    • Intake: Number of candidates on program
    • Experience: Average experience (yrs) of candidates
    • Country: Whether country is USA (1) or another (0)
    Example from SPSS Practical: Is ‘salary’ related to ‘country’? Chi-square test in SPSS
    • Open SPSS 17
    • Import the ‘MBA.xls’ data to SPSS as explained in the SPSS handout.
    • We wish to conduct a chi-square cross-tabulation (i.e. contingency table) test on ‘salary’ by ‘country’
    • Null hypothesis: ‘salary’ and ‘country’ are independent
    • Alternative hypothesis: ‘salary’ and ‘country’ are not independent, i.e. they are related.
  • 21. Example from SPSS Practical: Re-coding ‘salary’
    The salary variable is not categorical, i.e. it is quantitative and not strictly suitable for cross-tabulation. So, first, recode salary into categories:
    1. Go to the ‘transform’ menu
    2. Choose ‘visual binning’
    3. Select ‘salary’ as the variable to bin
    4. You should see a histogram of the ‘salary’ variable
    5. Type a name for the new variable that will be created after ‘re-coding’ in the box labelled ‘binned variable’. I have called my new variable ‘salary_codes’.
    6. Select the tab ‘make cutpoints’. There are several options for cutpoints: a good one is to divide the data by ‘equal percentiles’. For example, if you input ‘3’ in this box, the salary data will be re-coded with 3 cutpoints so that there will be four sections of the data: the first 25% of values will be re-coded as ‘1’, values in the next 25% group (i.e. 25% to 50%) will be re-coded as ‘2’, and so on.
    7. Click ‘ok’
    8. Check that a new ordinal variable representing categories of salary has been formed.
    Example from SPSS Practical: Cross-tabulation of ‘salary_codes’ by ‘country’
    Now to conduct a chi-square test:
    1. Go to the ‘analyse’ menu.
    2. Choose ‘descriptive statistics’, ‘crosstabs…’
    3. Input ‘salary_codes’ into the ‘row’ box and ‘country’ into the ‘column’ box.
    4. Click the ‘statistics’ tab and check the ‘chi-square’ box
    5. Click ‘ok’
    6. What do the results suggest?
  • 22. 3-Way Chi-square test
    The results above suggest that ‘salary’ IS related to ‘country’. Suppose we think that this relationship is somewhat affected by the GMAT of the students. We can test this by creating a three-way cross-tabulation:
    1. Re-code the GMAT variable into ‘GMAT_codes’, say two categories of ‘low’ and ‘high’, using ‘visual binning’ in the ‘transform’ menu
    2. Repeat the chi-square test of ‘salary_codes’ by ‘country’. However, this time, enter the ‘GMAT_codes’ variable in the ‘layer’ box.
    3. Run the model.
    3-Way Chi-square test: Result
  • 23. 3-Way Chi-square test: Result
    The three-way chi-square test suggests that once we take into account the GMAT average of the students, there is no relationship between ‘salary’ and ‘country’. We can therefore conclude that the observed relationship between ‘salary’ and ‘country’ is in fact indirectly caused by the ‘GMAT’ variation.
    Example: Was the Class Lottery Conducted According to the Rules?
    In order to sample from the distribution of inter-arrival times at a checkout of a supermarket, we played a lottery. The results (shown overleaf) show that the simulated distribution looks very much like the distribution from which we are sampling. But they are not the same. So what are we looking at?
    Are we looking at two datasets generated by the same distribution, so that the differences can be attributed to random variation? Or are we, in fact, looking at two datasets not of the same distribution, so that the differences are not random, such as would be the case if the lottery were not conducted properly?
  • 24. Inter-arrival Distribution from Lottery
    Inter-arrival time   Inter-arrival probability   Observed frequency   Expected frequency
    1                    0.39                        76                   71
    2                    0.17                        43                   31
    3                    0.13                        18                   24
    4                    0.09                        13                   16
    5                    0.06                        11                   11
    6                    0.05                        5                    9
    7                    0.03                        6                    5
    8                    0.02                        2                    4
    9                    0.06                        9                    11
    TOTAL                                            183                  183
    Analysis
    • H0: The observed data from the lottery was generated by the same process as the original inter-arrival time distribution
    • Using the original probabilities, calculate expected frequencies for each inter-arrival time, out of the total of 183
    • Combine the inter-arrival time categories of 7 and 8 mins, since the expected frequency of 8 mins is small (< 5)
  • 25. Analysis Cont’d
    • Calculate the χ2. This is 9.37
    • The d.f. is k − 1, where k is the number of categories of inter-arrival time, which is 8 after combining. So d.f. = 7
    • For d.f. = 7, the probability of a value of χ2 = 9.37 or higher is between 20% and 25%. This is not small.
    • Decision: we cannot reject H0
    • The data are consistent with the lottery having been conducted according to the rules
    Further Reading
    • Alan Agresti, 1996. ‘Introduction to Categorical Data Analysis’. John Wiley and Sons, London.
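The lottery analysis can be reproduced with the sketch below. Probabilities, observed counts and the two table values for 7 d.f. (9.037 at 25% and 9.803 at 20%) are taken from the slides; the `merge` helper and the other names are illustrative, not part of the original material.

```python
# Inter-arrival times 1..9, from the slides.
probabilities = [0.39, 0.17, 0.13, 0.09, 0.06, 0.05, 0.03, 0.02, 0.06]
observed = [76, 43, 18, 13, 11, 5, 6, 2, 9]

total = sum(observed)
expected = [p * total for p in probabilities]


def merge(values, i, j):
    """Copy of values with adjacent entries i and j (j == i + 1) summed into one."""
    return values[:i] + [values[i] + values[j]] + values[j + 1:]


# Combine the 7- and 8-minute categories (indices 6 and 7), because the
# expected frequency for 8 minutes is below 5.
observed_m = merge(observed, 6, 7)
expected_m = merge(expected, 6, 7)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed_m, expected_m))
df = len(observed_m) - 1

# Percentage points for 7 d.f. from the table on the slides.
POINT_25PCT_7DF = 9.037
POINT_20PCT_7DF = 9.803
between_20_and_25_pct = POINT_25PCT_7DF < chi2 < POINT_20PCT_7DF

print(f"chi-square = {chi2:.2f}, d.f. = {df}, "
      f"tail probability between 20% and 25%: {between_20_and_25_pct}")
```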