T tests, ANOVAs and regression

Tom Jenkins
Ellen Meierotto
SPM Methods for Dummies 2007
Why do we need t tests?
Objectives
   Types of error
   Probability distribution
   Z scores
   T tests
   ANOVAs
Error
   Null hypothesis
   Type 1 error (α): false positive
   Type 2 error (β): false negative
Normal distribution
Z scores
   Standardised normal distribution
   µ = 0, σ = 1
   Z scores: 0, 1, 1.65, 1.96
   Need to know the population standard deviation

   Z = (x − μ) / σ   for one point compared to the population
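As a minimal sketch of the formula above (the population numbers here are made up for illustration):

```python
def z_score(x, mu, sigma):
    """How many population standard deviations is x from the mean mu?"""
    return (x - mu) / sigma

# Made-up example: population mean 100, population SD 15.
z = z_score(124.75, 100, 15)  # 1.65, one of the landmark z values above
```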
T tests
   Comparing means
   1 sample t
   2 sample t
   Paired t
Different sample variances

2 sample t tests

   t = (x̄1 − x̄2) / s(x̄1 − x̄2)

   Pooled standard error of the mean:
   s(x̄1 − x̄2) = √(s1²/n1 + s2²/n2)
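A pure-Python sketch of the 2-sample t statistic using the standard-error formula above (function name and data are illustrative):

```python
import math

def two_sample_t(xs, ys):
    """t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2), with s² the sample variance (n − 1 denominator)."""
    n1, n2 = len(xs), len(ys)
    m1, m2 = sum(xs) / n1, sum(ys) / n2
    v1 = sum((x - m1) ** 2 for x in xs) / (n1 - 1)
    v2 = sum((y - m2) ** 2 for y in ys) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

t = two_sample_t([1, 2, 3], [4, 5, 6])  # ≈ -3.674
```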
1 sample t test
The effect of degrees of
freedom on t distribution
Paired t tests
T tests in SPM: Did the observed signal
change occur by chance or is it stat.
significant?
   Recall GLM. Y= X β + ε
   β1 is an estimate of signal change over time
    attributable to the condition of interest
   Set up contrast (cᵀ) [1 0] for β1: cᵀβ = 1×β1 + 0×β2 + … + 0×βn
   Null hypothesis: cᵀβ = 0, i.e. no significant effect at each
    voxel for condition β1
   Contrast [1 −1]: is the difference between 2 conditions
    significantly non-zero?
   t = cᵀβ / s.d.[cᵀβ] – one-sided
ANOVA
   Variances not means
   Total variance= model variance + error variance
   Results in F score- corresponding to a p value



Variance

   s² = Σᵢ (xᵢ − x̄)² / (n − 1)

   F test = Model variance / Error variance
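The F ratio above can be sketched for a one-way design (an illustrative helper, not SPM's implementation):

```python
def one_way_f(groups):
    """F = model (between-group) variance / error (within-group) variance."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total observations
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_model = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_model / (k - 1)) / (ss_error / (n - k))

f = one_way_f([[1, 2, 3], [4, 5, 6]])  # 13.5
```

With exactly two groups this F equals the square of the 2-sample t statistic (13.5 = 3.674²), which is the F = t² link used later in the talk.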
Partitioning the variance




   [Figure: bar charts comparing Group 1 and Group 2 for each component]

   Total = Model (Between groups) + Error (Within groups)
T vs F tests
   F tests- any differences between
    multiple groups, interactions
   Have to determine where differences
    are post-hoc
   SPM- T- one tailed (con)
   SPM- F- two tailed (ess)
Conclusions
   T tests describe how unlikely it is that experimental
    differences are due to chance
   Higher the t score, smaller the p value, more unlikely
    to be due to chance
   Can compare sample with population or 2 samples,
    paired or unpaired
   ANOVA/F tests are similar but use variances instead
    of means and can be applied to more than 2 groups
    and other more complex scenarios
Acknowledgements
   MfD slides 2004-2006
   Van Belle, Biostatistics
   Human Brain Function
   Wikipedia
Correlation and Regression
Topics Covered:
   Is there a relationship between x and y?
   What is the strength of this relationship
       Pearson’s r
   Can we describe this relationship and use it to predict
    y from x?
       Regression
   Is the relationship we have described statistically
    significant?
       F- and t-tests
   Relevance to SPM
       GLM
Relationship between x and y
   Correlation describes the strength and
    direction of a linear relationship between two
    variables
   Regression tells you how well a certain
    independent variable predicts a dependent
    variable

    CORRELATION ≠ CAUSATION
       In order to infer causality: manipulate independent
        variable and observe effect on dependent variable
Scattergrams

   [Figure: three scatter plots of Y against X]

   Positive correlation    Negative correlation    No correlation
Variance vs. Covariance
   Do two variables change together?
   Variance ~ DX * DX:

      Sx² = Σᵢ (xᵢ − x̄)² / n

   Covariance ~ DX * DY:

      cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / n
Covariance

   cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / n

   When X and Y increase together: cov(x, y) = pos.
   When X increases as Y decreases: cov(x, y) = neg.
   When there is no constant relationship: cov(x, y) = 0
Example Covariance

   [Scatter plot of the five (x, y) points below]

   x       y       xᵢ − x̄   yᵢ − ȳ   (xᵢ − x̄)(yᵢ − ȳ)
   0       3       -3        0         0
   2       2       -1        -1        1
   3       4       0         1         0
   4       0       1         -3        -3
   6       6       3         3         9
   x̄ = 3   ȳ = 3                       ∑ = 7

   cov(x, y) = ∑ᵢ (xᵢ − x̄)(yᵢ − ȳ) / n = 7/5 = 1.4

   What does this number tell us?
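The table above can be reproduced directly (dividing by n, following the slide's convention):

```python
def covariance(xs, ys):
    """cov(x, y) = Σ(xi − x̄)(yi − ȳ) / n  (the slide divides by n, not n − 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

cov = covariance([0, 2, 3, 4, 6], [3, 2, 4, 0, 6])  # 7 / 5 = 1.4
```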
Example of how covariance value
       relies on variance
                  High variance data              Low variance data
   Subject    x      y      x error * y error     x      y      x error * y error
   1          101    100    2500                  54     53     9
   2          81     80     900                   53     52     4
   3          61     60     100                   52     51     1
   4          51     50     0                     51     50     0
   5          41     40     100                   50     49     1
   6          21     20     900                   49     48     4
   7          1      0      2500                  48     47     9
   Mean       51     50                           51     50

   Sum of x error * y error:   7000                             28
   Covariance:                 1166.67                          4.67
Pearson’s R
   −∞ ≤ cov(x, y) ≤ ∞
   On its own, covariance does not really tell us much: its size
    depends on the spread and units of x and y
      Solution: standardise this measure
   Pearson's R: standardise by dividing by the standard deviations:

      rxy = cov(x, y) / (sx sy)
Basic assumptions
   Normal distributions
   Variances are constant and not zero
   Independent sampling – no autocorrelations
   No errors in the values of the independent
    variable
   All causation in the model is one-way (not
    necessary mathematically, but essential for
    prediction)
Pearson’s R: degree of linear
  dependence


   cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / n

   rxy = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n sx sy)

   −1 ≤ r ≤ 1

   Equivalently, as the mean product of z scores:

      rxy = Σᵢ Zxᵢ Zyᵢ / n
Limitations of r
   The r we calculate is actually r̂:
      r = true r of the whole population
      r̂ = estimate of r based on the data

   r is very sensitive to extreme values:

   [Scatter plot illustrating how a single extreme value can
    dominate the correlation]
In the real world…
   r is never 1 or –1
   interpretations for correlations in
    psychological research (Cohen)

Correlation         Negative         Positive
Small               -0.29 to -0.10   0.10 to 0.29
Medium              -0.49 to -0.30   0.30 to 0.49
Large               -1.00 to -0.50   0.50 to 1.00
Regression
   Correlation tells you if there is an
    association between x and y but it
    doesn’t describe the relationship or
    allow you to predict one variable from
    the other.

   To do this we need REGRESSION!
Best-fit Line
   Aim of linear regression is to fit a straight line, ŷ = ax + b, to data
    that gives best prediction of y for any value of x

   This will be the line that minimises the distance between the
    data and the fitted line, i.e. the residuals

   [Figure: scatter plot with best-fit line ŷ = ax + b (a = slope,
    b = intercept); each residual ε is the vertical gap between a
    true value yᵢ and its predicted value ŷ]
Least Squares Regression
     To find the best line we must minimise the
      sum of the squares of the residuals (the
      vertical distances from the data points to
      our line)
    Model line: ŷ = ax + b    a = slope, b = intercept

    Residual (ε) = y - ŷ
    Sum of squares of residuals = Σ (y – ŷ)2

   we must find values of a and b that minimise
                    Σ (y – ŷ)2
Finding b
   First we find the value of b that gives the min
    sum of squares


   [Figure: the same data with the line drawn at three different
    intercepts b, showing how the residuals ε change]




   Trying different values of b is equivalent to
    shifting the line up and down the scatter plot
Finding a
          Now we find the value of a that gives the min
           sum of squares


   [Figure: the same data with three different slopes a, intercept b fixed]




   Trying out different values of a is equivalent to
    changing the slope of the line, while b stays
    constant
Minimising sums of squares
   Need to minimise Σ(y–ŷ)2
    ŷ = ax + b
   so need to minimise:




      S = Σ(y − ax − b)²

   If we plot the sums of squares S for all the different values of a
    and b we get a parabola, because it is a squared term

   So the min sum of squares is at the bottom of the curve, where
    the gradient is zero

   [Figure: parabola of S against values of a and b, with
    gradient = 0 at min S]
The maths bit
   So we can find a and b that give min sum of
    squares by taking partial derivatives of Σ(y -
    ax - b)2 with respect to a and b separately

   Then we set these derivatives to zero and solve, giving the
    values of a and b that give the min sum of squares
The solution
   Doing this gives the following equation for a:

      a = r sy / sx        r = correlation coefficient of x and y
                           sy = standard deviation of y
                           sx = standard deviation of x

   You can see that:
      A low correlation coefficient gives a flatter slope (small
       value of a)
      Large spread of y, i.e. high standard deviation, results in a
       steeper slope (high value of a)
      Large spread of x, i.e. high standard deviation, results in a
       flatter slope (small value of a)
The solution cont.
   Our model equation is ŷ = ax + b
   This line must pass through the mean, so:

      ȳ = ax̄ + b   →   b = ȳ − ax̄

   Substituting our equation for a gives:

      b = ȳ − (r sy / sx) x̄     r = correlation coefficient of x and y
                                 sy = standard deviation of y
                                 sx = standard deviation of x

   The smaller the correlation, the closer the intercept is to the mean
    of y
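Putting the two formulas together gives a complete line fit (a sketch; sxy/sxx is algebraically identical to r·sy/sx once the n's cancel):

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b: a = r·sy/sx, b = ȳ − a·x̄."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx            # same as r * sy / sx
    b = my - a * mx          # forces the line through (x̄, ȳ)
    return a, b

a, b = fit_line([0, 2, 3, 4, 6], [3, 2, 4, 0, 6])  # a = 0.35, b = 1.95
```

For the earlier example data the slope 0.35 matches r·sy/sx = 0.35·2/2, and the intercept is ȳ − a·x̄ = 3 − 0.35·3 = 1.95.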
Back to the model

   We can calculate the regression line for any
    data, but the important question is:
    How well does this line fit the data, or how
    good is it at predicting y from x?
How good is our model?
   Total variance of y:
      sy² = ∑(y − ȳ)² / (n − 1) = SSy / dfy

   Variance of predicted y values (ŷ) – the variance explained by
    our regression model:
      sŷ² = ∑(ŷ − ȳ)² / (n − 1) = SSpred / dfŷ

   Error variance – the variance of the error between our predicted
    y values and the actual y values, and thus the variance in y that
    is NOT explained by the regression model:
      serror² = ∑(y − ŷ)² / (n − 2) = SSer / dfer
How good is our model cont.
   Total variance = predicted variance + error variance
                          sy2 = sŷ2 + ser2

   Conveniently, via some complicated rearranging
                            sŷ2 = r2 sy2


                           r2 = sŷ2 / sy2

   so r2 is the proportion of the variance in y that is explained
    by our regression model
How good is our model cont.
   Insert r2 sy2 into sy2 = sŷ2 + ser2 and rearrange to get:


                       ser2 = sy2 – r2sy2
                            = sy2 (1 – r2)

   From this we can see that the greater the
    correlation the smaller the error variance, so the
    better our prediction
Is the model significant?
          i.e. do we get a significantly better prediction
           of y from our regression equation than by just
           predicting the mean?

   F-statistic (after some complicated rearranging):

      F(dfŷ, dfer) = sŷ² / ser² = … = r²(n − 2) / (1 − r²)

   And it follows (because F = t²) that:

      t(n−2) = r √(n − 2) / √(1 − r²)

   So all we need to know are r and n!
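Since the test needs only r and n, it fits in a few lines (a sketch with made-up values):

```python
import math

def t_from_r(r, n):
    """Significance of a correlation: t with n − 2 df, t = r·√(n − 2) / √(1 − r²)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_from_r(0.6, 11)   # 0.6·3 / 0.8 ≈ 2.25
f = t ** 2              # the matching F statistic, ≈ 5.0625
```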
General Linear Model
   Linear regression is actually a form of
    the General Linear Model where the
    parameters are a, the slope of the line,
    and b, the intercept.
                   y = ax + b +ε
   A General Linear Model is just any
    model that describes the data in terms
    of a straight line
Multiple regression
   Multiple regression is used to determine the effect of a
    number of independent variables, x1, x2, x3 etc., on a
    single dependent variable, y
   The different x variables are combined in a linear way
    and each has its own regression coefficient:

        y = a1x1+ a2x2 +…..+ anxn + b + ε

   The a parameters reflect the independent contribution of
    each independent variable, x, to the value of the
    dependent variable, y.
   i.e. the amount of variance in y that is accounted for by
    each x variable after all the other x variables have been
    accounted for
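A sketch of the idea, fitting y = a1·x1 + … + an·xn + b by solving the normal equations directly (illustrative only; real software, including SPM's GLM machinery, uses more robust numerics):

```python
def multiple_regression(X, y):
    """Least-squares coefficients [a1, ..., an, b] from the normal
    equations (XᵀX)β = Xᵀy, solved by Gaussian elimination."""
    rows = [list(xi) + [1.0] for xi in X]   # append intercept column
    p = len(rows[0])
    # Build XᵀX and Xᵀy.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    v = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Forward elimination with partial pivoting.
    for i in range(p):
        piv = max(range(i, p), key=lambda k: abs(A[k][i]))
        A[i], A[piv] = A[piv], A[i]
        v[i], v[piv] = v[piv], v[i]
        for k in range(i + 1, p):
            f = A[k][i] / A[i][i]
            for j in range(i, p):
                A[k][j] -= f * A[i][j]
            v[k] -= f * v[i]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (v[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta

# Data generated exactly from y = 2*x1 + 3*x2 + 1, so the fit recovers [2, 3, 1].
coef = multiple_regression([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8])
```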
SPM
   Linear regression is a GLM that models the effect of one
    independent variable, x, on ONE dependent variable, y

   Multiple Regression models the effect of several independent
    variables, x1, x2 etc, on ONE dependent variable, y

   Both are types of General Linear Model

   GLM can also allow you to analyse the effects of several
    independent x variables on several dependent variables, y1, y2, y3
    etc, in a linear combination

   This is what SPM does and will be explained soon…

Mais conteúdo relacionado

Mais procurados

Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
Sachin Shekde
 
1.3 Experimental Design and Observational Studies
1.3 Experimental Design and Observational Studies 1.3 Experimental Design and Observational Studies
1.3 Experimental Design and Observational Studies
MaryWall14
 
F Distribution
F  DistributionF  Distribution
F Distribution
jravish
 

Mais procurados (20)

Chi squared test
Chi squared testChi squared test
Chi squared test
 
Chi square test
Chi square testChi square test
Chi square test
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
 
Correlation Analysis
Correlation AnalysisCorrelation Analysis
Correlation Analysis
 
Introduction to Biostatistics and types of sampling methods
Introduction to Biostatistics and types of sampling methodsIntroduction to Biostatistics and types of sampling methods
Introduction to Biostatistics and types of sampling methods
 
PROCEDURE FOR TESTING HYPOTHESIS
PROCEDURE FOR   TESTING HYPOTHESIS PROCEDURE FOR   TESTING HYPOTHESIS
PROCEDURE FOR TESTING HYPOTHESIS
 
1.3 Experimental Design and Observational Studies
1.3 Experimental Design and Observational Studies 1.3 Experimental Design and Observational Studies
1.3 Experimental Design and Observational Studies
 
Correlation ppt...
Correlation ppt...Correlation ppt...
Correlation ppt...
 
Student t-test
Student t-testStudent t-test
Student t-test
 
The mann whitney u test
The mann whitney u testThe mann whitney u test
The mann whitney u test
 
F Distribution
F  DistributionF  Distribution
F Distribution
 
Regression
RegressionRegression
Regression
 
Parametric test
Parametric testParametric test
Parametric test
 
Regression ppt
Regression pptRegression ppt
Regression ppt
 
Measures of dispersion
Measures  of  dispersionMeasures  of  dispersion
Measures of dispersion
 
Correlation
CorrelationCorrelation
Correlation
 
Chi – square test
Chi – square testChi – square test
Chi – square test
 
Student t test
Student t testStudent t test
Student t test
 
Correlation and Regression
Correlation and RegressionCorrelation and Regression
Correlation and Regression
 
One way anova final ppt.
One way anova final ppt.One way anova final ppt.
One way anova final ppt.
 

Semelhante a T tests anovas and regression

Accuracy
AccuracyAccuracy
Accuracy
esraz
 

Semelhante a T tests anovas and regression (20)

Yangs First Lecture Ppt
Yangs First Lecture PptYangs First Lecture Ppt
Yangs First Lecture Ppt
 
Corr-and-Regress (1).ppt
Corr-and-Regress (1).pptCorr-and-Regress (1).ppt
Corr-and-Regress (1).ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Cr-and-Regress.ppt
Cr-and-Regress.pptCr-and-Regress.ppt
Cr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Science
 
Symmetrical2
Symmetrical2Symmetrical2
Symmetrical2
 
regression analysis .ppt
regression analysis .pptregression analysis .ppt
regression analysis .ppt
 
Mth 4101-2 b
Mth 4101-2 bMth 4101-2 b
Mth 4101-2 b
 
Corr And Regress
Corr And RegressCorr And Regress
Corr And Regress
 
Accuracy
AccuracyAccuracy
Accuracy
 
1 hofstad
1 hofstad1 hofstad
1 hofstad
 
Statistical methods
Statistical methods Statistical methods
Statistical methods
 
Chapter 04
Chapter 04Chapter 04
Chapter 04
 
lecture13.ppt
lecture13.pptlecture13.ppt
lecture13.ppt
 
SimpleLinearRegressionAnalysisWithExamples.ppt
SimpleLinearRegressionAnalysisWithExamples.pptSimpleLinearRegressionAnalysisWithExamples.ppt
SimpleLinearRegressionAnalysisWithExamples.ppt
 
Linear regression.ppt
Linear regression.pptLinear regression.ppt
Linear regression.ppt
 
lecture13.ppt
lecture13.pptlecture13.ppt
lecture13.ppt
 

Mais de University Of Central Punjab

Mais de University Of Central Punjab (20)

Causal Relationship between Macroeconomic Factors and Stock Prices in Pakistan
Causal Relationship between Macroeconomic Factors and Stock Prices in PakistanCausal Relationship between Macroeconomic Factors and Stock Prices in Pakistan
Causal Relationship between Macroeconomic Factors and Stock Prices in Pakistan
 
A letter from DNA pioneer francis crick to his son
A letter from DNA pioneer francis crick to his sonA letter from DNA pioneer francis crick to his son
A letter from DNA pioneer francis crick to his son
 
International accounting standards ias intro
International accounting standards   ias introInternational accounting standards   ias intro
International accounting standards ias intro
 
Iasb framework
Iasb frameworkIasb framework
Iasb framework
 
Ias 7
Ias 7Ias 7
Ias 7
 
Ias 2
Ias 2Ias 2
Ias 2
 
Ias 1
Ias 1Ias 1
Ias 1
 
Cash flow
Cash flowCash flow
Cash flow
 
Annual report 2011 Packages
Annual report 2011 PackagesAnnual report 2011 Packages
Annual report 2011 Packages
 
Electricity & its regulations in America
Electricity & its regulations in AmericaElectricity & its regulations in America
Electricity & its regulations in America
 
Tobacco industry strategy
Tobacco industry strategyTobacco industry strategy
Tobacco industry strategy
 
Tobacco industrial article 2012
Tobacco industrial article 2012Tobacco industrial article 2012
Tobacco industrial article 2012
 
Corporate lobbying
Corporate lobbyingCorporate lobbying
Corporate lobbying
 
Federalism in india
Federalism in indiaFederalism in india
Federalism in india
 
Seven layers of atmosphere
Seven layers of atmosphereSeven layers of atmosphere
Seven layers of atmosphere
 
Scientific explanation for the event of miraj
Scientific explanation for the event of mirajScientific explanation for the event of miraj
Scientific explanation for the event of miraj
 
Reason for makkah being most peacful place
Reason for makkah being most peacful placeReason for makkah being most peacful place
Reason for makkah being most peacful place
 
Mentors are meant to be respected
Mentors are meant to be respectedMentors are meant to be respected
Mentors are meant to be respected
 
Makkah as center mean point of the world
Makkah as center mean point of the worldMakkah as center mean point of the world
Makkah as center mean point of the world
 
The power of quran healing
The power of quran healingThe power of quran healing
The power of quran healing
 

Último

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Último (20)

Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 

T tests anovas and regression

  • 1. T tests, ANOVAs and regression Tom Jenkins Ellen Meierotto SPM Methods for Dummies 2007
  • 2. Why do we need t tests?
  • 3. Objectives  Types of error  Probability distribution  Z scores  T tests  ANOVAs
  • 4.
  • 5. Error  Null hypothesis  Type 1 error (α): false positive  Type 2 error (β): false negative
  • 7. Z scores  Standardised normal distribution  µ = 0, σ = 1  Z scores: 0, 1, 1.65, 1.96  Need to know population standard deviation Z=(x-μ)/σ for one point compared to pop.
  • 8. T tests  Comparing means  1 sample t  2 sample t  Paired t
  • 10. 2 sample t tests x1 − x 2 t = s x1 −x2 2 2 Pooled standard s1 s 2 = + error of the mean s x1 − x2 n1 n2
  • 11. 1 sample t test
  • 12. The effect of degrees of freedom on t distribution
  • 14. T tests in SPM: Did the observed signal change occur by chance or is it stat. significant?  Recall GLM. Y= X β + ε  β1 is an estimate of signal change over time attributable to the condition of interest  Set up contrast (cT) 1 0 for β1: 1xβ1+0xβ2+0xβn/s.d  Null hypothesis: cTβ=0 No significant effect at each voxel for condition β1  Contrast 1 -1 : Is the difference between 2 conditions significantly non-zero?  t = cTβ/sd[cTβ] – 1 sided
  • 15.
  • 16. ANOVA  Variances not means  Total variance= model variance + error variance  Results in F score- corresponding to a p value Variance n ∑ ( xi − x ) 2 F test = Model variance /Error variance s2 = i =1 n −1
  • 17. Partitioning the variance Group Group Group Group Group Group 1 2 1 2 1 2 Total = Model + Error (Between groups) (Within groups)
  • 18. T vs F tests  F tests- any differences between multiple groups, interactions  Have to determine where differences are post-hoc  SPM- T- one tailed (con)  SPM- F- two tailed (ess)
  • 19.
  • 20. Conclusions  T tests describe how unlikely it is that experimental differences are due to chance  Higher the t score, smaller the p value, more unlikely to be due to chance  Can compare sample with population or 2 samples, paired or unpaired  ANOVA/F tests are similar but use variances instead of means and can be applied to more than 2 groups and other more complex scenarios
  • 21. Acknowledgements  MfD slides 2004-2006  Van Belle, Biostatistics  Human Brain Function  Wikipedia
  • 23. Topics Covered:  Is there a relationship between x and y?  What is the strength of this relationship  Pearson’s r  Can we describe this relationship and use it to predict y from x?  Regression  Is the relationship we have described statistically significant?  F- and t-tests  Relevance to SPM  GLM
  • 24. Relationship between x and y  Correlation describes the strength and direction of a linear relationship between two variables  Regression tells you how well a certain independent variable predicts a dependent variable  CORRELATION ≠ CAUSATION  In order to infer causality: manipulate independent variable and observe effect on dependent variable
  • 25. Scattergrams Y Y Y Y Y Y X X X Positive correlation Negative correlation No correlation
  • 26. Variance vs. Covariance  Do two variables change together? n Variance ~ ∑(x i − x) 2 S = 2 x i =1 DX * DX n n Covariance ~ ∑(x i − x)( yi − y ) DX * DY cov( x, y ) = i =1 n
• 27. Covariance  cov(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / n  When X and Y both increase (or both decrease) together: cov(x, y) is positive  When one increases as the other decreases: cov(x, y) is negative  When there is no constant relationship: cov(x, y) = 0
• 28. Example Covariance  [figure: scatter plot of the five points below]
    x    y    xᵢ − x̄   yᵢ − ȳ   (xᵢ − x̄)(yᵢ − ȳ)
    0    3     -3        0             0
    2    2     -1       -1             1
    3    4      0        1             0
    4    0      1       -3            -3
    6    6      3        3             9
  x̄ = 3   ȳ = 3                     Σ = 7
  cov(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / n = 7/5 = 1.4   What does this number tell us?
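The worked example above is easy to check in code. A quick stdlib-Python sketch (note this slide uses the population form, dividing by n):

```python
# Population covariance of the slide's five-point example, dividing by n.
from statistics import mean

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
xbar, ybar = mean(x), mean(y)          # both means are 3
cov_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / len(x)
print(cov_xy)  # 1.4
```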
• 29. Example of how the covariance value relies on the variance
  High variance data                   Low variance data
  Subject    x     y   x err × y err     x    y   x err × y err
  1        101   100       2500         54   53        9
  2         81    80        900         53   52        4
  3         61    60        100         52   51        1
  4         51    50          0         51   50        0
  5         41    40        100         50   49        1
  6         21    20        900         49   48        4
  7          1     0       2500         48   47        9
  Mean      51    50                    51   50
  Sum of x error × y error: 7000        Sum of x error × y error: 28
  Covariance: 1166.67                   Covariance: 4.67
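The two datasets above make the point concrete in code: both relationships are perfectly linear, yet the covariances differ by a factor of about 250 purely because of the spread of the data. A stdlib-Python sketch (note this slide divides by n − 1, unlike the previous one):

```python
# Sample covariance (dividing by n - 1) of the slide's two datasets.
from statistics import mean

def sample_cov(x, y):
    xbar, ybar = mean(x), mean(y)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)

high_x = [101, 81, 61, 51, 41, 21, 1]
high_y = [100, 80, 60, 50, 40, 20, 0]
low_x  = [54, 53, 52, 51, 50, 49, 48]
low_y  = [53, 52, 51, 50, 49, 48, 47]

cov_high = sample_cov(high_x, high_y)  # 7000 / 6 ≈ 1166.67
cov_low  = sample_cov(low_x, low_y)    # 28 / 6 ≈ 4.67
```

This is exactly the problem Pearson's r solves on the next slide: standardising removes the dependence on the data's spread.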
• 30. Pearson’s R  −∞ ≤ cov(x, y) ≤ ∞  Covariance on its own does not really tell us anything, because its size depends on the spread of the data  Solution: standardise this measure  Pearson’s R: standardise by dividing the covariance by the standard deviations:  rxy = cov(x, y) / (sx·sy)
  • 31. Basic assumptions  Normal distributions  Variances are constant and not zero  Independent sampling – no autocorrelations  No errors in the values of the independent variable  All causation in the model is one-way (not necessary mathematically, but essential for prediction)
• 32. Pearson’s R: degree of linear dependence  cov(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / n  rxy = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / (n·sx·sy)  −1 ≤ r ≤ 1  Equivalently, in z-scores: rxy = Σᵢ₌₁ⁿ Zxᵢ·Zyᵢ / n
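The standardisation can be sketched in a few lines of stdlib Python, reusing the made-up five-point example from the covariance slide (for those data r works out to 0.35):

```python
# Pearson's r: covariance divided by the product of the standard deviations.
# Population SDs (pstdev) are used to match the /n covariance above.
from statistics import mean, pstdev

def pearson_r(x, y):
    xbar, ybar = mean(x), mean(y)
    n = len(x)
    cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n
    return cov / (pstdev(x) * pstdev(y))

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
r = pearson_r(x, y)  # cov = 1.4, sx = sy = 2, so r = 1.4 / 4 = 0.35
```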
• 33. Limitations of r  The r we compute is actually r̂:  r = true r of the whole population  r̂ = estimate of r based on the sample data  r is very sensitive to extreme values  [figure: scatter plot in which a single extreme point drastically changes the correlation]
• 34. In the real world…  r is never 1 or –1  Interpretations for correlations in psychological research (Cohen):
  Correlation   Negative          Positive
  Small         -0.29 to -0.10    0.10 to 0.29
  Medium        -0.49 to -0.30    0.30 to 0.49
  Large         -1.00 to -0.50    0.50 to 1.00
  • 35. Regression  Correlation tells you if there is an association between x and y but it doesn’t describe the relationship or allow you to predict one variable from the other.  To do this we need REGRESSION!
• 36. Best-fit Line  Aim of linear regression is to fit a straight line, ŷ = ax + b (a = slope, b = intercept), to data that gives the best prediction of y for any value of x  This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals  ŷ = predicted value, yᵢ = true value, ε = yᵢ − ŷ = residual error
• 37. Least Squares Regression  To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line)  Model line: ŷ = ax + b (a = slope, b = intercept)  Residual (ε) = y − ŷ  Sum of squares of residuals = Σ(y − ŷ)²  We must find the values of a and b that minimise Σ(y − ŷ)²
• 38. Finding b  First we find the value of b that gives the minimum sum of squares  [figure: the same data with candidate lines of different intercept b and their residuals ε]  Trying different values of b is equivalent to shifting the line up and down the scatter plot
• 39. Finding a  Now we find the value of a that gives the minimum sum of squares  [figure: candidate lines of different slope a through the same intercept b]  Trying out different values of a is equivalent to changing the slope of the line, while b stays constant
• 40. Minimising sums of squares  Need to minimise Σ(y − ŷ)²  ŷ = ax + b, so we need to minimise the sum of squares S = Σ(y − ax − b)²  If we plot the sum of squares against the different values of a and b we get a parabola, because it is a squared term  [figure: parabola of S against the values of a and b, with the minimum where the gradient = 0]  So the minimum sum of squares is at the bottom of the curve, where the gradient is zero.
  • 41. The maths bit  So we can find a and b that give min sum of squares by taking partial derivatives of Σ(y - ax - b)2 with respect to a and b separately  Then we solve these for 0 to give us the values of a and b that give the min sum of squares
• 42. The solution  Doing this gives the following equation for a:  a = r·sy / sx  (r = correlation coefficient of x and y, sy = standard deviation of y, sx = standard deviation of x)  You can see that:  A low correlation coefficient gives a flatter slope (small value of a)  Large spread of y, i.e. high standard deviation, results in a steeper slope (high value of a)  Large spread of x, i.e. high standard deviation, results in a flatter slope (small value of a)
• 43. The solution cont.  Our model equation is ŷ = ax + b  This line must pass through the mean so: ȳ = ax̄ + b, i.e. b = ȳ − ax̄  We can put our equation for a into this, giving: b = ȳ − (r·sy / sx)·x̄  (r = correlation coefficient of x and y, sy = standard deviation of y, sx = standard deviation of x)  The smaller the correlation, the closer the intercept is to the mean of y
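The two formulas a = r·sy/sx and b = ȳ − ax̄ can be sketched directly in stdlib Python, again on the made-up five-point dataset from the covariance slides (for those data the fitted line is ŷ = 0.35x + 1.95):

```python
# Least-squares slope and intercept from r and the standard deviations.
from statistics import mean, pstdev

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
n = len(x)
cov = sum((xi - mean(x)) * (yi - mean(y)) for xi, yi in zip(x, y)) / n
r = cov / (pstdev(x) * pstdev(y))   # 0.35 for these data

a = r * pstdev(y) / pstdev(x)       # slope
b = mean(y) - a * mean(x)           # intercept: the line passes through (x̄, ȳ)
```

Note that a = r·sy/sx simplifies to cov(x, y)/sx², which is the more common textbook form of the least-squares slope.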
  • 44. Back to the model  We can calculate the regression line for any data, but the important question is: How well does this line fit the data, or how good is it at predicting y from x?
• 45. How good is our model?  Total variance of y: sy² = Σ(y − ȳ)² / (n − 1) = SSy / dfy  Variance of the predicted y values (ŷ): sŷ² = Σ(ŷ − ȳ)² / (n − 1) = SSpred / dfŷ — this is the variance explained by our regression model  Error variance: ser² = Σ(y − ŷ)² / (n − 2) = SSer / dfer — this is the variance of the error between our predicted y values and the actual y values, and thus is the variance in y that is NOT explained by the regression model
• 46. How good is our model cont.  Total variance = predicted variance + error variance:  sy² = sŷ² + ser²  Conveniently, via some complicated rearranging:  sŷ² = r²·sy², so r² = sŷ² / sy²  So r² is the proportion of the variance in y that is explained by our regression model
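This variance partition can be verified numerically. A sketch in stdlib Python on the made-up five-point dataset used earlier (0.35 and 1.95 are the least-squares slope and intercept for those data); sums of squares are used here so the decomposition is exact regardless of the slide's differing degrees-of-freedom divisors:

```python
# Check SSy = SSpred + SSer and r² = SSpred / SSy on a toy dataset.
from statistics import mean

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
a, b = 0.35, 1.95                    # least-squares fit for these data
ybar = mean(y)
yhat = [a * xi + b for xi in x]      # predicted values

ss_total = sum((yi - ybar) ** 2 for yi in y)               # SSy   = 20
ss_pred  = sum((yh - ybar) ** 2 for yh in yhat)            # SSpred = 2.45
ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # SSer  = 17.55

r_squared = ss_pred / ss_total       # 0.1225 = 0.35², matching r = 0.35
```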
• 47. How good is our model cont.  Insert sŷ² = r²·sy² into sy² = sŷ² + ser² and rearrange to get:  ser² = sy² − r²·sy² = sy²(1 − r²)  From this we can see that the greater the correlation, the smaller the error variance, so the better our prediction
• 48. Is the model significant?  i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?  F-statistic:  F(dfŷ, dfer) = sŷ² / ser² = … = r²(n − 2) / (1 − r²)  And it follows (because F = t²) that:  t(n−2) = r·√(n − 2) / √(1 − r²)  So all we need to know are r and n!
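Since the F and t statistics above depend only on r and n, they are one-liners to compute. A sketch in stdlib Python (the values r = 0.35, n = 5 are from the made-up example used throughout; with so few points the statistic is far from significant):

```python
# F and t statistics for a simple regression, computed from r and n alone.
import math

def regression_F_t(r, n):
    F = r ** 2 * (n - 2) / (1 - r ** 2)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return F, t

F, t = regression_F_t(0.35, 5)
# F = t² holds by construction: the F test with (1, n - 2) degrees of freedom
# is equivalent to a two-sided t test with n - 2 degrees of freedom.
```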
• 49. General Linear Model  Linear regression is actually a form of the General Linear Model where the parameters are a, the slope of the line, and b, the intercept:  y = ax + b + ε  A General Linear Model is just any model that describes the data as a linear combination of explanatory variables plus an error term
  • 50. Multiple regression  Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y  The different x variables are combined in a linear way and each has its own regression coefficient: y = a1x1+ a2x2 +…..+ anxn + b + ε  The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.  i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
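The multiple-regression equation above is usually estimated by least squares, generalising the single-predictor case via the normal equations (XᵀX)β = Xᵀy. A self-contained stdlib-Python sketch with two predictors and made-up noiseless data, so the recovered coefficients are exact (this is an illustration of the maths, not how SPM itself estimates parameters):

```python
# Multiple regression y = a1*x1 + a2*x2 + b via the normal equations,
# solved with a tiny Gaussian elimination (pure stdlib, made-up data).

def solve(A, v):
    """Solve the linear system A·m = v by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        beta[i] = (M[i][n] - sum(M[i][j] * beta[j] for j in range(i + 1, n))) / M[i][i]
    return beta

# Toy data generated from y = 2*x1 - 1*x2 + 5 with no noise, so the fit is exact
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y  = [2 * a - b + 5 for a, b in zip(x1, x2)]

X = [[a, b, 1.0] for a, b in zip(x1, x2)]  # design matrix: x1, x2, intercept
m = len(X)
XtX = [[sum(X[i][p] * X[i][q] for i in range(m)) for q in range(3)] for p in range(3)]
Xty = [sum(X[i][p] * y[i] for i in range(m)) for p in range(3)]
a1, a2, b = solve(XtX, Xty)                # recovers 2, -1, 5
```

The design-matrix-times-parameters form used here, y = Xβ + ε, is exactly the GLM notation from the t-test slides earlier.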
  • 51. SPM  Linear regression is a GLM that models the effect of one independent variable, x, on ONE dependent variable, y  Multiple Regression models the effect of several independent variables, x1, x2 etc, on ONE dependent variable, y  Both are types of General Linear Model  GLM can also allow you to analyse the effects of several independent x variables on several dependent variables, y1, y2, y3 etc, in a linear combination  This is what SPM does and will be explained soon…

Editor's Notes

  1. We often want to know whether various variables are ‘linked’, i.e., correlated. This can be interesting in itself, but is also important if we want to predict one variable’s value given a value of the other.
  2. This means that correlation cannot be validly used to infer a causal relationship between the variables, but it should not be taken to mean that correlations cannot indicate causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown. A correlation between age and height in children is fairly causally transparent, but a correlation between mood and health in people is less so. Does improved mood lead to improved health? Or does good health lead to good mood? Or does some other factor underlie both? Or is it pure coincidence? In other words, a correlation can be taken as evidence for a possible causal relationship, but cannot indicate what the causal relationship, if any, might be.
  3. Variance is just a definition. The reason we square the deviations is so that the value is positive whether dx is negative or positive, so that we can sum them and positives and negatives will not cancel out. Variance is spread around a mean; covariance is the measure of how much x and y change together. The two are very similar: multiply 2 variables rather than square 1.
  4. Problem with Covariance: The value obtained by covariance is dependent on the size of the data’s standard deviations: if large, the value will be greater than if small… even if the relationship between x and y is exactly the same in the large versus small standard deviation datasets.
  5. Can only compare covariances between different variables to see which is greater.
  6. The distance of r from 0 indicates the strength of the correlation. r = 1 or r = (-1) means that we can predict y from x and vice versa with certainty; all data points are on a straight line, i.e. y = ax + b. The correlation is 1 in the case of an increasing linear relationship, −1 in the case of a decreasing linear relationship, and some value in between in all other cases, indicating the degree of linear dependence between the variables. The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables. The correlation coefficient detects only linear dependencies between two variables.
  7. x̄ = 3, ȳ = 2 for the example x = 1, 2, 3, 4, 5; y = 1, 2, 3, 4, 0, where the final point is an extreme value (y = 5 would continue the trend — see graph).
  8. the interpretation of a correlation coefficient depends on the context and purposes. A correlation of 0.9 may be very low if one is verifying a physical law using high-quality instruments, but may be regarded as very high in the social sciences where there may be a greater contribution from complicating factors.
  9. So to understand the relationship between two variables, we want to draw the ‘best’ line through the cloud – find the best fit . This is done using the principle of least squares