Generalized Linear Models for
Between-Subjects Designs
Sean P. Mackinnon
Reminder of Assumptions for
Between-Subjects ANOVA
• Normality*
• Residuals normally distributed, no outliers
• Actually, normality within each group
• Homogeneity of Variance (between-subjects)*
• Variance is assumed to be equal in each group
• In SPSS, one way is to test with Levene’s Test
• Other assumptions too
• Linearity
• Independence of observations
• Homogeneity of regression slopes (in ANCOVA)
GLiM in SPSS
• Generalized Linear Models (GLiM) *today’s focus
• Describe the Distribution of the Outcome
• Describe linear regression formula (i.e., the ANOVA)
• Describe Link Function
• Mixed (“multilevel”) Models
• An alternative, more robust way to deal with repeated measures
• Generalized Linear Mixed Models
• Combines the best of both worlds
Some confusion with acronyms
• “Generalized linear models” are easily confused with the “General
Linear Model” which is what underpins ANOVA and regression.
• No standardized acronym. I’ve seen:
• GLiM
• GzLM
• GLM (esp. bad because that’s the one for General Linear Model!)
A Primer on Maximum Likelihood Estimation
• Though you will see the familiar linear regression formula later on,
you should note that GLiM does not use OLS regression and sums of
squares. Instead, GLiM uses Maximum Likelihood Estimation
• Maximum Likelihood Estimation is a computationally intensive
method of estimating statistical parameters by choosing the
parameters that make the data most likely to have happened.
• A great tutorial is here:
• http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html
Assessing the Model: the log-likelihood
statistic
• The Log-likelihood statistic
• Analogous to the residual sum of squares in ANOVA / OLS Regression
• It is an indicator of how much unexplained information there is after
the model has been fitted.
• Large values (in absolute terms) indicate poorly fitting statistical models.
• The log transformation helps avoid really small numbers that arise
when multiplying many small numbers less than 1
$$\text{log-likelihood} = \sum_{i=1}^{N} \left[ Y_i \ln\big(P(Y_i)\big) + (1 - Y_i) \ln\big(1 - P(Y_i)\big) \right]$$
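As a rough illustration, the log-likelihood formula on this slide can be computed by hand. The sketch below assumes a binary (0/1) outcome with a predicted probability for each case, which is the setting this particular form of the formula applies to. The data values and the function name `log_likelihood` are made up for illustration.

```python
import math

def log_likelihood(y, p):
    """Sum of Y*ln(P) + (1 - Y)*ln(1 - P) over all cases (binary outcome)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Hypothetical data: observed outcomes and two sets of predicted probabilities.
y = [1, 0, 1, 1, 0]
good_fit = [0.9, 0.1, 0.8, 0.7, 0.2]   # predictions close to the observed values
poor_fit = [0.5, 0.5, 0.5, 0.5, 0.5]   # predictions no better than a coin flip

ll_good = log_likelihood(y, good_fit)
ll_poor = log_likelihood(y, poor_fit)
print(ll_good, ll_poor)
```

Both values are negative. The better-fitting predictions give a log-likelihood closer to zero, and the poorer predictions give one that is larger in absolute terms.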
The conceptual steps of maximum likelihood
• All of the parameters are converted to probabilities with some (complex) math.
• Makes a random guess (random start) for each parameter in the model and
calculates the log likelihood overall.
• Then goes through many “iterations,” and stops computing when subsequent
iterations produce identical log likelihoods. Tolerance is how close is “good
enough” (e.g., .000001).
• Various optimization routines help decide when to stop searching the large
hyper-dimensional space. They are short-cuts and heuristics to avoid an almost
endless search in a sort of “hotter-colder” game.
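The iterate-until-tolerance idea can be sketched for a very small problem: estimating the mean of Poisson-distributed counts. This is only a minimal sketch under stated assumptions. It uses a fixed start value instead of a random one (so the result is reproducible) and a Newton-Raphson step as the optimization routine, which is not necessarily what SPSS does internally. The function names and data are hypothetical.

```python
import math

def poisson_log_likelihood(b, y):
    """Log-likelihood of a Poisson model whose mean is exp(b)."""
    mean = math.exp(b)
    return sum(yi * b - mean - math.lgamma(yi + 1) for yi in y)

def fit_poisson_mean(y, start=0.0, tolerance=1e-6, max_iterations=100):
    """Newton-Raphson search for the b that maximizes the log-likelihood."""
    b = start
    ll = poisson_log_likelihood(b, y)
    for iteration in range(1, max_iterations + 1):
        mean = math.exp(b)
        gradient = sum(y) - len(y) * mean      # slope of the log-likelihood
        curvature = -len(y) * mean             # second derivative (always negative)
        b = b - gradient / curvature           # step toward the maximum
        new_ll = poisson_log_likelihood(b, y)
        if abs(new_ll - ll) < tolerance:       # "good enough": stop iterating
            return b, new_ll, iteration
        ll = new_ll
    raise RuntimeError("Model failed to converge")

counts = [2, 4, 3, 1, 5, 3]  # hypothetical count data with a mean of 3
b_hat, ll_hat, n_iter = fit_poisson_mean(counts)
print(math.exp(b_hat), ll_hat, n_iter)
```

The loop stops once two successive log-likelihoods differ by less than the tolerance. If that never happens within the allowed iterations, the sketch raises an error, which mirrors a "failed to converge" warning.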
An advantage: Missing Data
• ANOVA uses Listwise Deletion by default
• If at least one data point is missing, omit that person from analysis.
• Listwise deletion has acceptable Type I error rates, but inflates the
Type II error rate because the sample size is lower.
• GLiM (and almost any analysis using maximum likelihood estimation)
incorporates all available data in the algorithm, and handles missing
data in a more sophisticated way that usually reduces bias.
• Will be relatively unbiased if data are “Missing at Random”
GLiM Step 1:
Describe the underlying distribution
• There are many different types of probability distributions. Many of
our statistics rely on the “normal” distribution. However, real
phenomena may follow different distributions.
• Today, we’ll look at 4 types
• Normal “Gaussian” distributions
• Poisson Distribution
• Gamma Distribution
• Negative Binomial Distribution
Normal Distributions
Normal or “Gaussian” distributions are what we are used to working with
ANOVA and Regression assume this kind of outcome
Note that you need two numbers
In order to specify the distribution
1. Mean
2. Variance
The mean is where the center is
The variance is how widely spread
the curve is
Poisson Distributions
Generally speaking, can be useful for count data.
(a) Values can never be negative
(b) Values are integers (i.e., whole numbers);
(c) Distributions tend to be positively skewed.
(d) The mean equals the variance, so represented by one
number rather than two (i.e., lambda, λ)
• As lambda increases, the data become less positively skewed,
and the Poisson distribution looks more like the normal dist.
Gamma Distributions
Generally speaking, can be useful for skewed continuous data
(a) Values must be positive and greater than zero
(b) Values do NOT need to be integers (e.g., 1.7 is fine)
(c) Distributions tend to be positively skewed.
(d) The variance is proportional to the mean squared, so the distribution is described by
a mean plus a shape parameter
• As the shape parameter increases, the data become less positively skewed, and the
gamma distribution starts to look more like the normal dist.
Negative Binomial Distribution
Generally speaking, this will be useful for skewed count data where the variance exceeds the
mean (i.e., overdispersed)
(a) Values can never be negative
(b) Values are integers (i.e., whole numbers);
(c) Distributions tend to be positively skewed (usually more so than Poisson)
(d) Is actually a hybrid of the Poisson and Gamma distributions (“Poisson-Gamma mixture”)
(e) Two parameters are estimated, the mu and the “dispersion parameter.” Mu is the mean. The
dispersion parameter is the shape of a gamma distribution.
• As the dispersion parameter (defined this way) approaches infinity, the NB distribution converges to the Poisson
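A small numeric sketch may help here. It assumes the parameterization described on this slide, where the dispersion parameter is the shape of the mixing gamma distribution, so that the variance is mu + mu²/shape. Some software reports the reciprocal of this quantity instead, in which case the Poisson limit is at zero rather than infinity. The function name `nb_variance` is made up for illustration.

```python
def nb_variance(mu, shape):
    """Variance of a Poisson-gamma mixture with mean mu and gamma shape `shape`."""
    return mu + mu ** 2 / shape

mu = 4.0  # hypothetical mean count
for shape in (0.5, 2.0, 10.0, 1000.0):
    # A Poisson distribution with this mean would have variance 4.0
    print(shape, nb_variance(mu, shape))
```

With a small shape value the variance is far above the mean (overdispersion). As the shape value grows, the variance shrinks toward the mean, which is the Poisson case.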
Summary
• We can assume an underlying distribution in the population to be
something other than normal.
• The gamma distribution is good for skewed continuous data
• The negative binomial distribution is good for skewed count data
• Generally speaking, the Poisson distribution will not be as useful as
the negative binomial distribution in the vast majority of situations.
GLiM Step 2:
Describe linear regression formula
• Linear regression, Between-subjects ANOVA and ANCOVA can all be
represented by a linear regression formula:
• In ANOVA, categorical variables are dummy-coded first
• In ANCOVA, you add a continuous predictor
• In multiple regression, all are continuous predictors
Example: One Way ANOVA as Regression
outcome_i = model + error_i
$$\text{Outcome}_i = b_0 + b_1\,\text{Cond1}_i + b_2\,\text{Cond2}_i + \varepsilon_i$$
There will be more than one dummy variable when there are more than 2 groups (# groups -1)
One group will be your “reference” group, usually the control group
All the dummy variables are entered together as predictors of the outcome in one-way ANOVA
               Dummy Variable 1   Dummy Variable 2
               (Condition 1)      (Condition 2)
Control        0                  0
Condition 1    1                  0
Condition 2    0                  1
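To make the dummy-coding idea concrete, here is a minimal sketch with made-up scores for three groups. It relies on a known property of dummy-coded least-squares regression: the intercept equals the reference group mean, and each slope equals that group's mean minus the reference group mean. All data and names are hypothetical.

```python
from statistics import mean

# Hypothetical scores for three groups.
scores = {
    "Control":     [4.0, 5.0, 6.0],
    "Condition 1": [7.0, 8.0, 9.0],
    "Condition 2": [1.0, 2.0, 3.0],
}

# Dummy codes (Dummy 1, Dummy 2); Control is the reference group.
dummy_codes = {
    "Control":     (0, 0),
    "Condition 1": (1, 0),
    "Condition 2": (0, 1),
}

# With dummy coding, the least-squares solution is determined by the group means.
b0 = mean(scores["Control"])             # intercept = reference group mean
b1 = mean(scores["Condition 1"]) - b0    # Condition 1 minus Control
b2 = mean(scores["Condition 2"]) - b0    # Condition 2 minus Control

def predict(group):
    """Predicted outcome from the regression formula for one group."""
    d1, d2 = dummy_codes[group]
    return b0 + b1 * d1 + b2 * d2

for group in scores:
    print(group, predict(group), mean(scores[group]))
```

Each predicted value matches the observed group mean, which is why a one-way ANOVA and a regression on dummy variables are the same model.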
GLiM Step 3:
Describe the Link Function
“A mathematically identical interpretation of Ŷ_i is as a conditional mean (i.e.,
the predicted mean conditional on the predictor variables); the regression equation
yields the conditional mean of scores that are normally distributed with variance
equal to σ².”
In other words, the linear regression formula works so long as we assume that the
underlying distribution in the population is normal.
But ... In some cases, we know it’s not. So we need a link function: a transformation (e.g., the log)
built into the model that lets the linear formula predict the mean of a Poisson, gamma, or NB outcome!
Neal & Simons (2007)
Link Function Example: Log Link
Before
(Poisson Distribution)
After
(Roughly Normal Distribution)
Data following Poisson and negative binomial distributions can often be brought close to normality by taking the natural log of all values!
Similarly, gamma-distributed data can often be brought close to normality with a reciprocal (1/x) transformation
(though log works well here too)
Why a regular log transform doesn’t cut it
• Many people are taught to just take the log of the raw data. However,
the problem becomes that you lose interpretability to gain
robustness (i.e., accurate SEs), because there’s no way to get the
original arithmetic means!
• In a log transform, you’re working with the “geometric mean”
• “The Geometric Mean is a special type of average where we multiply the
numbers together and then take a square root (for two numbers), cube root
(for three numbers) etc.”
• https://www.mathsisfun.com/numbers/geometric-mean.html
Most people think of the mean as the arithmetic mean (e.g., sum then divide by number of observations)
So, working with the geometric mean isn’t ideal
Exponentiating (i.e., anti-log) undoes the log transform …
But note that:
e^1.0497 = 2.86
e^0.9303 = 2.54
These are the geometric means! There’s no easy way to get
from the geometric mean back to the arithmetic mean now!
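The point can be checked with a few made-up numbers. The sketch below log-transforms some hypothetical skewed values, averages them, and exponentiates the result. What comes back is the geometric mean, not the arithmetic mean.

```python
import math
from statistics import mean

data = [1, 2, 4, 8, 16]  # hypothetical positively skewed values

arithmetic_mean = mean(data)                          # sum / n
mean_of_logs = mean([math.log(x) for x in data])      # what a log-transformed analysis estimates
back_transformed = math.exp(mean_of_logs)             # anti-log of the mean of logs
geometric_mean = math.prod(data) ** (1 / len(data))   # n-th root of the product

print(arithmetic_mean, back_transformed, geometric_mean)
```

Here the arithmetic mean is 6.2, while the back-transformed value and the geometric mean are both about 4.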
Here’s an independent t-test on untransformed vs. log transformed data
How GLiM Fixes This
• If we instead work the log transformation into the left side of the
linear regression equation … we will be able to actually do our
hypothesis test AND get the estimates of means in the right metric!
• Really, much of this part of GLiM is just a fancy log transformation!
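A rough sketch of why this works, assuming a model with a single two-group predictor and a log link. In that simple case, the maximum likelihood fit for a Poisson or negative binomial model reproduces each group's arithmetic mean, so the coefficients can be written down directly from the group means. The data and variable names are hypothetical, and this shortcut is not a general fitting routine.

```python
import math
from statistics import mean

# Hypothetical counts for two groups (all positive so logs are defined).
control = [1, 2, 2, 3, 12]
treated = [2, 3, 5, 6, 24]

# Log link: ln(mean) = b0 + b1 * treated. With one dummy predictor, the
# maximum likelihood fit reproduces each group's arithmetic mean.
b0 = math.log(mean(control))
b1 = math.log(mean(treated)) - b0

glim_control = math.exp(b0)          # back-transformed model mean, control
glim_treated = math.exp(b0 + b1)     # back-transformed model mean, treated

# Log-transforming the raw data first gives geometric means instead.
geo_control = math.exp(mean([math.log(y) for y in control]))
geo_treated = math.exp(mean([math.log(y) for y in treated]))

print(glim_control, glim_treated)    # arithmetic means
print(geo_control, geo_treated)      # geometric means (smaller here)
```

Because the log is applied to the mean rather than to each raw score, exponentiating the model's predictions returns estimates in the original metric.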
Example of same data analyzed using GLiM
• Assumes Negative Binomial Distribution w. a log link
The conclusions of this and the log-transformed
analysis are the same
BUT
The means and confidence intervals are now
properly reflecting the arithmetic mean.
(and are more interpretable!)
Dealing with heterogeneous variances:
Huber-White Sandwich Estimator
• Homogeneity of variance (ANOVA) and homoscedasticity (regression) are
important assumptions. One common correction is to use a “robust”
estimate of standard errors that is not much affected by this violation.
• There are at least 5 different algorithms in use. GLiM in SPSS uses the
Huber-White Sandwich Estimator (HC0).
• Unfortunately, this is the most biased of the 5 available algorithms (though
still better than doing nothing). In particular, it has a higher Type I error
rate when samples are small. Hopefully updates will fix this.
Hayes, A. F., & Cai, L. (2007). Using heteroskedasticity-consistent standard error estimators in OLS regression:
An introduction and software implementation. Behavior Research Methods, 39(4), 709-722.
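For intuition, the HC0 idea can be sketched for the simplest case: ordinary least squares with one predictor, which is the setting Hayes and Cai discuss. This is only an illustration of the sandwich logic, not the computation SPSS performs for GLiM. The function name `ols_with_hc0` and the data are made up.

```python
from statistics import mean

def ols_with_hc0(x, y):
    """Simple regression slope with classical and HC0 standard errors."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Classical SE: one pooled residual variance for every case.
    pooled_variance = sum(e ** 2 for e in residuals) / (n - 2)
    se_classical = (pooled_variance / sxx) ** 0.5

    # HC0: each case contributes its own squared residual (the "meat" of the sandwich).
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, residuals))
    se_hc0 = (meat / sxx ** 2) ** 0.5
    return slope, se_classical, se_hc0

# Hypothetical data whose spread grows with x (heteroscedastic).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.7, 5.9, 5.0, 8.6, 6.1]
slope, se_classical, se_hc0 = ols_with_hc0(x, y)
print(slope, se_classical, se_hc0)
```

The slope estimate is identical under both approaches. Only the standard error changes, because HC0 does not assume that every case shares the same residual variance.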
Did I pick the right distribution / link function?
Screening Residuals
• How can you tell if you selected the right distribution and link
function?
• If the residuals of the analysis are normally distributed, then you’ve
corrected the problem and can proceed to interpretation.
• Residuals can be calculated in numerous ways. Generally, better to
save the “Deviance Residuals” as these are favored at the moment.
Will return to this in the worked-out example
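As a sketch of what a deviance residual is, the standard formula for the Poisson case is shown below. The Poisson case is used only because its formula is the simplest; the negative binomial version is more involved, and SPSS computes it for you. The observed and fitted values here are hypothetical.

```python
import math

def poisson_deviance_residual(y, mu):
    """Signed square root of one case's contribution to the Poisson deviance."""
    term = y * math.log(y / mu) if y > 0 else 0.0   # y*ln(y/mu) is taken as 0 when y = 0
    deviance = 2 * (term - (y - mu))
    return math.copysign(math.sqrt(max(deviance, 0.0)), y - mu)

observed = [0, 1, 3, 7]        # hypothetical counts
fitted = [2.0, 2.0, 3.0, 3.0]  # hypothetical model-predicted means
residuals = [poisson_deviance_residual(y, mu) for y, mu in zip(observed, fitted)]
print(residuals)
```

Cases below their predicted mean get negative residuals, cases above it get positive residuals, and a case exactly at its predicted mean gets zero.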
Did I pick the right distribution / link function?
Model Comparison
To compare different models with the same variables (e.g., normal vs. gamma distribution)
Calculate BIC in both models. The model with lowest BIC is the best fitting.
Changes of 6 or more would usually be substantive
(Something called a “log-likelihood ratio test” is also possible if you want a p-value,
but for this course, rely on BIC for less arithmetic.)
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological
methodology, 25, 111-164.
Will return to this in the lab portion
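The BIC comparison can be sketched using the usual definition, BIC = -2 × log-likelihood + (number of parameters) × ln(sample size). The log-likelihoods, parameter counts, and sample size below are invented for illustration. In practice you would read BIC straight from the SPSS output.

```python
import math

def bic(log_likelihood, n_parameters, n_cases):
    """Bayesian information criterion: lower values indicate better fit."""
    return -2 * log_likelihood + n_parameters * math.log(n_cases)

# Hypothetical log-likelihoods for two models fitted to the same 200 cases.
bic_normal = bic(log_likelihood=-520.4, n_parameters=5, n_cases=200)
bic_gamma = bic(log_likelihood=-498.7, n_parameters=5, n_cases=200)

difference = bic_normal - bic_gamma
print(bic_normal, bic_gamma, difference)
```

In this made-up case the gamma model has the lower BIC by about 43 points, well beyond the "6 or more" guideline.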
Calculating Goodness of Fit (R2)
• In regression, R² = SS_M / SS_T
• An analogous “pseudo-R²” value in GLiM would be:
• McFadden’s pseudo-R²
Run a model with just the intercept, get the log-likelihood
Run a model with your predictors, get the log-likelihood
$$R^2_{\text{McFadden}} = 1 - \frac{LL_{\text{full}}}{LL_{\text{intercept}}}$$
(i.e., 1 - (log likelihood full / log likelihood intercept))
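The arithmetic is short enough to do by hand, but a small sketch is given here for completeness. The two log-likelihood values are hypothetical stand-ins for what the intercept-only run and the full-model run would report.

```python
def mcfadden_pseudo_r2(ll_full, ll_intercept):
    """McFadden's pseudo R-squared from two log-likelihoods."""
    return 1 - (ll_full / ll_intercept)

# Hypothetical log-likelihoods from the two model runs.
ll_intercept_only = -250.0
ll_with_predictors = -200.0
print(mcfadden_pseudo_r2(ll_with_predictors, ll_intercept_only))
```

With these numbers the pseudo R² is about 0.20. If the predictors added nothing, the two log-likelihoods would be equal and the value would be 0.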
A Worked-Out Example in SPSS
• Does Vehicle Age affect the number of insurance claims?
• Predictor: Vehicle Age (4 groups)
• 0-3 Years Old
• 4-7 Years Old
• 8-9 Years Old
• 10+ Years Old
• Outcome: Number of insurance claims
• I have reason to believe that a negative binomial distribution would be suitable
(i.e., these are rare count events)
Taken from SPSS example data: car_insurance_claims.sav
Exploring the data: Normality
Looking at the data overall, a
negative binomial distribution seems
appropriate.
Note though, I should also have
reason to believe that this is also
what the distribution looks like in the
population!
(Probably true if using a sufficiently
large random sample)
Exploring the Data: Homogeneity of Variance
Looking at box plots split by group, it
seems pretty clear that there are unequal
variances in the groups.
I could do a formal test, like a Levene’s
test, but these boxplots are enough
justification to use robust SEs (Huber-
White Sandwich Estimator)
Robust SEs will generally be my default
choice in an analysis, regardless.
For type of model, we will select
“negative binomial with a log
link”
If you wanted something not on
this list (e.g., gamma with an
inverse link), click on “custom”.
Put your outcome variable under
“Dependent variable”
Put all your categorical variables under
“Factors”.
Put all your continuous variables (i.e.,
covariates) under “Covariates”
You can have more than one of each.
If you’re interested in interactions, that
is in the next step
Here is where you build your model.
At the bottom, make sure “Include
intercept in model” is selected. If that is
the ONLY thing selected, you will be
running the “baseline” or “intercept
only” model which you could use for the
Pseudo R2.
In this example, we just drag our single
predictor over to “model” to get the
effect of vehicle age.
With multiple predictors, you can add
main effects and/or interaction effects,
as you please.
Usually you can leave most of this,
except to click on “robust estimator.”
Clicking this enables the Huber-White
Sandwich estimator option, which addresses
violations of the homogeneity of
variance assumption.
The only time you would usually fuss
with the other options is if your model
“fails to converge.” In other words, at
the last iteration, the log likelihoods
were still changing, so maybe you would
increase the # of iterations.
This could be because the model is very
complex, there are bad variables in the model, or
for lots of other reasons. But in general, if
the model doesn’t converge, do not
interpret output!
There are various elements a person might
want to change in how statistics are
calculated, or what output comes out.
At a beginner level though, the defaults are
usually just fine for what you’ll need.
Here is where you get your means,
post-hoc tests, and planned contrasts.
“Compute means for response” back-
transforms into the original unit of
measurement, and is usually best.
“Pairwise” contrasts are every
possible comparison. Other sorts of
planned contrasts are possible.
You can adjust for the multiple
comparisons. LSD is no adjustment for
familywise error rate.
For post-hoc tests, I usually prefer
sequential Bonferroni method to
control alpha rates out of the
available options.
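The sequential Bonferroni logic can be sketched as follows, assuming it corresponds to Holm's step-down procedure: sort the p-values, test the smallest against alpha/m, the next against alpha/(m - 1), and so on, stopping at the first non-significant result. The p-values and the function name are hypothetical. SPSS reports adjusted significance values directly, so this is only meant to show the reasoning.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure; returns True where a test is significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    significant = [False] * m
    for rank, index in enumerate(order):
        threshold = alpha / (m - rank)     # alpha/m, then alpha/(m-1), ...
        if p_values[index] > threshold:
            break                          # stop at the first non-significant test
        significant[index] = True
    return significant

# Hypothetical p-values for six pairwise comparisons.
p_values = [0.001, 0.30, 0.004, 0.012, 0.020, 0.0005]
print(sequential_bonferroni(p_values))
```

A plain Bonferroni correction would test every comparison against alpha/6, so the comparisons with p = .012 and p = .020 would be missed. The sequential version keeps them while still controlling the familywise error rate.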
Saving the deviance residuals will save
the residuals of the analysis as a new
variable.
This will be useful to explore to ensure
that the residuals are (roughly)
normally distributed.
Screening the residuals, it looks like they are normally distributed!
Skewness = 0.01, Kurtosis = -0.50
The negative binomial distribution was likely a good choice
Annotated Output
• These are like your overall F-tests (except
that in GLiM, they are chi-square
distributed, not F-tests)
• The first test is a test of whether your
overall model with predictors is better than
an intercept only model
• The second will break down effects
separately (e.g., if there were two
predictors, you’d have predictor 1, predictor
2, and the intercept).
• There’s a significant effect of vehicle age!
Annotated Output
These are the parameter estimates from the linear regression equation
So, intercept + the slopes for 3 dummy coded variables here
They are more useful when you have continuous variables predicting other continuous variables
In ANOVA designs, you’ll usually find the “estimated marginal means” of more utility
Annotated Output
• These are the means, SEs and
confidence intervals! Report
these in most papers.
• Useful to generate plots with
error bars.
• The means will be the same
as in a regular ANOVA … but
the SEs (and thus confidence
intervals) are different.
Annotated Output
These are all the possible pairwise comparisons (with a sequential Bonferroni correction)
Basically the equivalent of post hoc tests in ANOVA
All but one comparison (0-3 years vs. 4-7 years) is statistically significant here.
Graphing Results
Boxplots are a useful general way to plot non-parametric data
Easily done in SPSS
You can also do bar/line charts w. error bars
Using SPSS output, can graph in Excel
[Chart: y-axis from 0 to 200; x-axis shows vehicle age groups 0-3, 4-7, 8-9, 10+ years]
Conclusion
The number of insurance claims differs depending on how old the car is.
Specifically, Ms and SDs show that, as a vehicle ages, fewer insurance
claims are made.
Post-hoc tests using a sequential Bonferroni method show that there is
not much decrease in claims from 0-7 years, but that claims tend to
decrease dramatically as the car gets older than that.

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 

Generalized Linear Models for Between-Subjects Designs

  • 1. Generalized Linear Models for Between-Subjects Designs Sean P. Mackinnon
  • 2. Reminder of Assumptions for Between-Subjects ANOVA • Normality* • Residuals normally distributed, no outliers • Actually, normality within each group • Homogeneity of Variance (between-subjects)* • Variance is assumed to be equal in each group • In SPSS, one way is to test with Levene’s Test • Other assumptions too • Linearity • Independence of observations • Homogeneity of regression slopes (in ANCOVA)
  • 3. GLiM in SPSS • Generalized Linear Models (GLiM) *today’s focus • Describe the Distribution of the Outcome • Describe linear regression formula (i.e., the ANOVA) • Describe Link Function • Mixed (“multilevel”) Models • An alternative, more robust way to deal with repeated measures • Generalized Linear Mixed Models • Combines the best of both worlds
  • 4. Some confusion with acronyms • “Generalized linear models” are easily confused with the “General Linear Model” which is what underpins ANOVA and regression. • No standardized acronym. I’ve seen: • GLiM • GzLM • GLM (esp. bad because that’s the one for General Linear Model!)
  • 5. A Primer on Maximum Likelihood Estimation • Though you will see the familiar linear regression formula later on, you should note that GLiM does not use OLS regression and sums of squares. Instead, GLiM uses Maximum Likelihood Estimation • Maximum Likelihood Estimation is a computationally intensive method of estimating statistical parameters by choosing the parameters that make the data most likely to have happened. • A great tutorial is here: • http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html
  • 6. Assessing the Model: the log-likelihood statistic • The Log-likelihood statistic • Analogous to the residual sum of squares in ANOVA / OLS Regression • It is an indicator of how much unexplained information there is after the model has been fitted. • Large values (in absolute terms; the log-likelihood itself is negative, so software often reports −2 × log-likelihood) indicate poorly fitting statistical models. • The log transformation helps avoid really small numbers that arise when multiplying many small numbers less than 1 • For a binary (0/1) outcome:
    $$ \text{log-likelihood} = \sum_{i=1}^{N} \left[ Y_i \ln\big(P(Y_i)\big) + (1 - Y_i) \ln\big(1 - P(Y_i)\big) \right] $$
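The log-likelihood on slide 6 can be computed by hand for a small case. The sketch below is only an illustration: the four outcomes and both sets of predicted probabilities are made-up values, and the function name `log_likelihood` is hypothetical.

```python
import math

def log_likelihood(outcomes, predicted_probs):
    """Sum of Y*ln(P) + (1 - Y)*ln(1 - P) over all cases (binary outcome)."""
    total = 0.0
    for y, p in zip(outcomes, predicted_probs):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Hypothetical data: four cases with observed 0/1 outcomes.
y = [1, 0, 1, 1]
good_model = [0.9, 0.2, 0.8, 0.7]  # predictions close to the observed outcomes
poor_model = [0.5, 0.5, 0.5, 0.5]  # uninformative predictions

ll_good = log_likelihood(y, good_model)
ll_poor = log_likelihood(y, poor_model)
print(ll_good, ll_poor)
```

Both values are negative; the poorer model has the log-likelihood that is larger in absolute value, which is the sense in which "large values" signal poor fit.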
  • 7. The conceptual steps of maximum likelihood • All of the parameters are converted to probabilities with some (complex) math. • Makes a random guess (random start) for each parameter in the model and calculates the log likelihood overall. • Then goes through many “iterations,” and stops computing when subsequent iterations produce identical log likelihoods. Tolerance is how close is “good enough” (e.g., .000001). • Various optimization routines help decide when to stop searching the large hyper-dimensional space. They are short-cuts and heuristics to avoid an almost endless search in a sort of “hotter-colder” game.
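The iterate-until-the-log-likelihood-stops-changing idea on slide 7 can be sketched for the simplest possible case: an intercept-only Poisson model with a log link. This uses a plain Newton-Raphson update and a fixed (not random) starting value; it is an assumption-laden illustration, not a description of the optimizer SPSS actually uses. The data and function names are hypothetical.

```python
import math

def poisson_log_likelihood(beta, counts):
    """Log-likelihood of an intercept-only Poisson model with a log link."""
    rate = math.exp(beta)
    return sum(y * beta - rate - math.lgamma(y + 1) for y in counts)

def fit_intercept(counts, start=0.0, tolerance=1e-6, max_iterations=100):
    """Newton-Raphson search; stops when the log-likelihood change < tolerance."""
    beta = start
    previous = poisson_log_likelihood(beta, counts)
    n, total = len(counts), sum(counts)
    for iteration in range(1, max_iterations + 1):
        rate = math.exp(beta)
        # Newton-Raphson step: first derivative divided by the information.
        beta += (total - n * rate) / (n * rate)
        current = poisson_log_likelihood(beta, counts)
        if abs(current - previous) < tolerance:
            return beta, current, iteration
        previous = current
    raise RuntimeError("failed to converge; do not interpret the estimates")

counts = [2, 3, 1, 4, 5, 3]  # hypothetical count data
beta_hat, final_ll, n_iter = fit_intercept(counts)
estimated_rate = math.exp(beta_hat)
print(estimated_rate, n_iter)
```

For this simple model the estimate can be checked directly, because the maximum likelihood estimate of a Poisson rate is the sample mean.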
  • 8. An advantage: Missing Data • ANOVA uses Listwise Deletion by default • If at least one data point is missing, omit that person from analysis. • Listwise deletion has acceptable Type I error rates, but inflates the Type II error rate because the sample size is lower. • GLiM (and almost any analysis using maximum likelihood estimation) incorporates all available data in the algorithm, and handles missing data in a more sophisticated way that usually reduces bias. • Will be relatively unbiased if data are “Missing at Random”
  • 9. GLiM Step 1: Describe the underlying distribution • There are many different types of probability distributions. Many of our statistics rely on the “normal” distribution. However, real phenomena may follow different distributions. • Today, we’ll look at 4 types • Normal “Gaussian” distributions • Poisson Distribution • Gamma Distribution • Negative Binomial Distribution
  • 10. Normal Distributions Normal or “Gaussian” distributions are what we are used to working with ANOVA and Regression assume this kind of outcome Note that you need two numbers In order to specify the distribution 1. Mean 2. Variance The mean is where the center is The variance is how widely spread the curve is
  • 11. Poisson Distributions Generally speaking, can be useful for count data. (a) Values can never be negative (b) Values are integers (i.e., whole numbers); (c) Distributions tend to be positively skewed. (d) The mean equals the variance, so represented by one number rather than two (i.e., lambda, λ) • As lambda increases, the data become less positively skewed, and the Poisson distribution looks more like the normal distribution.
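The Poisson properties on slide 11 can be checked numerically from the probability mass function. This is a minimal sketch; the function names and the two lambda values are chosen only for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_moments(lam, upper=200):
    """Mean, variance and skewness, summing the pmf over 0..upper-1."""
    probs = [poisson_pmf(k, lam) for k in range(upper)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    third = sum((k - mean) ** 3 * p for k, p in enumerate(probs))
    return mean, variance, third / variance ** 1.5

small = poisson_moments(2.0)    # strongly skewed
large = poisson_moments(20.0)   # closer to a normal shape
print(small)
print(large)
```

In both cases the mean and variance agree with lambda, and the skewness shrinks as lambda grows.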
  • 13. Gamma Distributions Generally speaking, can be useful for skewed continuous data (a) Values must be positive and greater than zero (b) Values do NOT need to be integers (e.g., 1.7 is fine) (c) Distributions tend to be positively skewed. (d) The variance is proportional to the square of the mean (variance = mean² / shape), so the distribution is described by a mean and a shape parameter • As the shape parameter increases, the data become less positively skewed, and the gamma distribution starts to look more like the normal distribution.
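A short numerical check of the gamma distribution's moments, using the standard shape/scale parameterization (mean = shape × scale, variance = shape × scale²). The mean, shape, seed and sample size below are arbitrary illustrative choices, and `gamma_summary` is a hypothetical helper.

```python
import math
import random

def gamma_summary(mean, shape):
    """Scale, variance and skewness implied by a given mean and shape."""
    scale = mean / shape
    variance = shape * scale ** 2      # equals mean**2 / shape
    skewness = 2 / math.sqrt(shape)    # shrinks as shape grows
    return scale, variance, skewness

rng = random.Random(1)                 # fixed seed so the run is repeatable
mean, shape = 10.0, 2.0
scale, variance, skewness = gamma_summary(mean, shape)

# random.gammavariate takes (shape, scale) in that order.
draws = [rng.gammavariate(shape, scale) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(variance, skewness, sample_mean, sample_var)
```

The simulated mean and variance should land close to the theoretical values, and all draws are strictly positive and non-integer.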
  • 15. Negative Binomial Distribution Generally speaking, this will be useful for skewed count data where the variance exceeds the mean (i.e., overdispersed) (a) Values can never be negative (b) Values are integers (i.e., whole numbers); (c) Distributions tend to be positively skewed (usually more so than Poisson) (d) Is actually a hybrid of the Poisson and Gamma distributions (“Poisson-Gamma mixture”) (e) Two parameters are estimated, the mu and the “dispersion parameter.” Mu is the mean. The dispersion parameter is the shape of a gamma distribution. • As the dispersion parameter (defined this way, as the gamma shape) approaches infinity, the NB distribution looks exactly like Poisson
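The "Poisson-Gamma mixture" description on slide 15 can be simulated directly: draw a rate from a gamma distribution, then draw a count from a Poisson distribution with that rate. This is a sketch under assumed values (mean 4, shape 2, fixed seed); the helper names are hypothetical, and the Poisson sampler is Knuth's simple multiplication method, which is only suitable for small rates.

```python
import math
import random

def draw_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def draw_negative_binomial(mean, shape, rng):
    """Gamma-distributed rate, then a Poisson count with that rate."""
    rate = rng.gammavariate(shape, mean / shape)
    return draw_poisson(rate, rng)

rng = random.Random(2024)
mean, shape = 4.0, 2.0
counts = [draw_negative_binomial(mean, shape, rng) for _ in range(20000)]

sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)
expected_var = mean + mean ** 2 / shape   # overdispersed relative to Poisson
print(sample_mean, sample_var, expected_var)
```

The simulated variance exceeds the mean, unlike a Poisson variable; making `shape` very large shrinks the extra `mean**2 / shape` term toward zero.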
  • 17. Summary • We can assume an underlying distribution in the population to be something other than normal. • The gamma distribution is good for skewed continuous data • The negative binomial distribution is good for skewed count data • Generally speaking, the Poisson distribution will be not as useful as the negative binomial distribution in the vast majority of situations.
  • 18. GLiM Step 2: Describe linear regression formula • Linear regression, Between-subjects ANOVA and ANCOVA can all be represented by a linear regression formula: • In ANOVA, categorical variables are dummy-coded first • In ANCOVA, you add a continuous predictor • In multiple regression, all are continuous predictors
  • 19. Example: One Way ANOVA as Regression outcome_i = model + error_i
    $$ \text{Outcome}_i = b_0 + b_1 \text{Cond1}_i + b_2 \text{Cond2}_i + \varepsilon_i $$
    There will be more than one dummy variable when there are more than 2 groups (# groups − 1). One group will be your “reference” group, usually the control group. All the dummy variables are entered together as predictors of the outcome in one-way ANOVA.
    Group          Dummy Variable 1 (Condition 1)   Dummy Variable 2 (Condition 2)
    Control        0                                0
    Condition 1    1                                0
    Condition 2    0                                1
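The dummy-coding scheme on slide 19 can be illustrated with a tiny made-up data set. With a reference group coded (0, 0), the least-squares intercept is the control mean and each slope is that condition's mean minus the control mean; the sketch below plugs those values in and confirms they satisfy the least-squares normal equations. All scores and names are hypothetical.

```python
scores = {
    "control": [4.0, 5.0, 6.0],
    "condition1": [7.0, 8.0, 9.0],
    "condition2": [1.0, 2.0, 3.0],
}
dummy_codes = {"control": (0, 0), "condition1": (1, 0), "condition2": (0, 1)}

def mean(values):
    return sum(values) / len(values)

# Coefficients implied by dummy coding with control as the reference group.
b0 = mean(scores["control"])
b1 = mean(scores["condition1"]) - b0
b2 = mean(scores["condition2"]) - b0

# Build one (dummy1, dummy2, outcome) row per participant.
rows = []
for group, values in scores.items():
    d1, d2 = dummy_codes[group]
    rows.extend((d1, d2, y) for y in values)

# Least squares requires the residuals to be uncorrelated with each predictor.
residuals = [(d1, d2, y - (b0 + b1 * d1 + b2 * d2)) for d1, d2, y in rows]
normal_equations = (
    sum(e for _, _, e in residuals),
    sum(d1 * e for d1, _, e in residuals),
    sum(d2 * e for _, d2, e in residuals),
)
print(b0, b1, b2, normal_equations)
```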
  • 20. GLiM Step 3: Describe the Link Function “A mathematically identical interpretation of Ŷ_i is as a conditional mean (i.e., the predicted mean conditional on the predictor variables); the regression equation yields the conditional mean of scores that are normally distributed with variance equal to σ².” (Neal & Simons, 2007) In other words, the linear regression formula works so long as we assume that the underlying distribution in the population is normal. But ... In some cases, we know it’s not. So we need a link function to transform the Poisson, gamma, or NB distributions to normality before calculating!
  • 21. Link Function Example: Log Link [Figure: histograms before (Poisson distribution) and after (roughly normal distribution) the transformation] The Poisson and negative binomial distributions can be transformed to normality by taking the natural log of all values! Similarly, the gamma distribution can be restored to normality with a reciprocal (1/x) transformation (though log works well here too)
  • 22. Why a regular log transform doesn’t cut it • Many people are taught to just take the log of the raw data. However, the problem becomes that you lose interpretability to gain robustness (i.e., accurate SEs), because there’s no way to get the original arithmetic means! • In a log transform, you’re working with the “geometric mean” • “The Geometric Mean is a special type of average where we multiply the numbers together and then take a square root (for two numbers), cube root (for three numbers) etc.” • https://www.mathsisfun.com/numbers/geometric-mean.html Most people think of the mean as the arithmetic mean (e.g., sum then divide by number of observations) So, working with the geometric mean isn’t ideal
  • 23. Exponentiating (i.e., anti-log) undoes the log transform … [Figure: an independent t-test on untransformed vs. log transformed data] But note that: e^1.0497 = 2.86 and e^0.9303 = 2.54. These are the geometric means! There’s no easy way to get from the geometric mean back to the arithmetic mean now!
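The point of slides 22 and 23 can be reproduced in a few lines: averaging logged values and then exponentiating returns the geometric mean, not the arithmetic mean. The five values below are made up for illustration.

```python
import math
import statistics

values = [1.0, 2.0, 4.0, 8.0, 50.0]  # hypothetical positively skewed data

arithmetic_mean = statistics.mean(values)

# Log-transform, average, then exponentiate (anti-log) the result.
mean_of_logs = statistics.mean([math.log(v) for v in values])
back_transformed = math.exp(mean_of_logs)

# The standard library's geometric mean, for comparison.
geometric_mean = statistics.geometric_mean(values)

print(arithmetic_mean, back_transformed, geometric_mean)
```

The back-transformed value matches the geometric mean and sits well below the arithmetic mean for skewed data like these.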
  • 24. How GLiM Fixes This • If we instead work the log transformation into the left side of the linear regression equation … we will be able to actually do our hypothesis test AND get the estimates of means in the right metric! • Really, much of this part of GLiM is just a fancy log transformation!
  • 25. Example of same data analyzed using GLiM • Assumes Negative Binomial Distribution w. a log link The conclusions of this and the log-transformed analysis are the same BUT The means and confidence intervals are now properly reflecting the arithmetic mean. (and are more interpretable!)
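To make the contrast concrete, the sketch below compares log-transforming the data with putting the log on the mean via a log link. It assumes a Poisson model with one two-level factor (simpler than the negative binomial model on the slide) and uses made-up counts. In that simple case the maximum likelihood fitted means equal the group means, so the coefficients can be written down directly and checked against the score equations rather than estimated iteratively.

```python
import math

groups = {
    "control": [1, 2, 2, 3, 12],      # hypothetical skewed counts
    "treatment": [2, 4, 5, 9, 30],
}

def mean(values):
    return sum(values) / len(values)

# Approach 1: log-transform the raw data, average, then exponentiate.
back_transformed = {
    g: math.exp(mean([math.log(y) for y in ys])) for g, ys in groups.items()
}

# Approach 2: log link, ln(E[Y]) = b0 + b1 * treatment.
b0 = math.log(mean(groups["control"]))
b1 = math.log(mean(groups["treatment"])) - b0
model_means = {"control": math.exp(b0), "treatment": math.exp(b0 + b1)}

# Poisson score equations: observed minus fitted sums to zero in each group.
score_checks = [
    sum(y - model_means[g] for y in ys) for g, ys in groups.items()
]
print(back_transformed)
print(model_means)
```

Exponentiating the log-link predictions recovers the arithmetic means (4 and 10 here), while the log-transform route gives smaller geometric means.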
  • 26. Dealing with heterogeneous variances: Huber-White Sandwich Estimator • Homogeneity of variance (ANOVA) and homoscedasticity (regression) are important assumptions. One common correction is to use a “robust” estimate of standard errors that is not much affected by this violation. • There are at least 5 different algorithms in use. GLiM in SPSS uses the Huber-White Sandwich Estimator (HC0). • Unfortunately, this is the most biased of the 5 available algorithms (though still better than doing nothing). In particular, it has a higher Type I error rate when samples are small. Hopefully updates will fix this. Hayes, A. F., & Cai, L. (2007). Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation. Behavior Research Methods, 39(4), 709-722.
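For intuition about what HC0 does, here is a sketch for ordinary least squares with a single 0/1 predictor, where the HC0 variance of the slope has the closed form Σ(x − x̄)²e² / (Σ(x − x̄)²)². This is an OLS illustration with made-up data, not the computation SPSS performs inside GLiM; the function name is hypothetical.

```python
import math

def ols_slope_with_hc0(x, y):
    """Slope, classical SE and HC0 (Huber-White) SE for simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Classical SE assumes one common residual variance.
    classical_var = (sum(e ** 2 for e in resid) / (n - 2)) / sxx
    # HC0 weights each squared residual by its own leverage term.
    hc0_var = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    return slope, math.sqrt(classical_var), math.sqrt(hc0_var)

# Hypothetical groups: a large low-variance group and a small high-variance one.
group0 = [10, 11, 9, 10, 10, 11, 9, 10]
group1 = [5, 15, 25]
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

slope, classical_se, hc0_se = ols_slope_with_hc0(x, y)
print(slope, classical_se, hc0_se)
```

With the noisier group being the smaller one, the classical standard error is too optimistic and the robust one is noticeably larger.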
  • 27. Did I pick the right distribution / link function? Screening Residuals • How can you tell if you selected the right distribution and link function? • If the residuals of the analysis are normally distributed, then you’ve corrected the problem and can proceed to interpretation. • Residuals can be calculated in numerous ways. Generally, better to save the “Deviance Residuals” as these are favored at the moment. Will return to this in the worked-out example
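As a rough illustration of what a deviance residual is, the sketch below uses the Poisson form, sign(y − μ) × sqrt(2[y ln(y/μ) − (y − μ)]). The negative binomial version used later in the worked example has a different formula, so treat this as an assumption-laden stand-in; the observed and fitted values are made up, and the skewness helper is a simple moment-based version.

```python
import math

def poisson_deviance_residual(observed, fitted):
    """Signed square root of one case's contribution to the Poisson deviance."""
    if observed == 0:
        unit = 2 * fitted  # y*ln(y/mu) is taken as 0 when y = 0
    else:
        unit = 2 * (observed * math.log(observed / fitted) - (observed - fitted))
    return math.copysign(math.sqrt(max(unit, 0.0)), observed - fitted)

def skewness(values):
    """Moment-based skewness; values near 0 suggest a symmetric shape."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

observed = [0, 1, 2, 3, 5, 7]               # hypothetical counts
fitted = [2.0, 2.0, 2.0, 4.0, 4.0, 4.0]     # hypothetical model predictions
residuals = [poisson_deviance_residual(y, m) for y, m in zip(observed, fitted)]
print(residuals, skewness(residuals))
```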
  • 28. Did I pick the right distribution / link function? Model Comparison To compare different models with the same variables (e.g., normal vs. gamma distribution), calculate BIC in both models. The model with the lowest BIC is the best fitting. Changes of 6 or more would usually be substantive. (A “log-likelihood ratio test” is also possible if you want a p-value, but for this course, rely on BIC for less arithmetic.) Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111-164. Will return to this in the lab portion
  • 29. Calculating Goodness of Fit (R2) • In regression, R2 = SSM / SST • An analogous “pseudo-R2” value in GLiM would be McFadden’s Pseudo R2: run a model with just the intercept and get the log-likelihood; run a model with your predictors and get its log-likelihood; then pseudo R2 = 1 − (log likelihood full / log likelihood intercept)
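BIC is simple to compute from a model's log-likelihood using the standard formula BIC = −2 × log-likelihood + (number of parameters) × ln(sample size). The log-likelihoods, parameter count and sample size below are invented purely to show the arithmetic.

```python
import math

def bic(log_likelihood, n_parameters, n_cases):
    """Bayesian Information Criterion; lower values indicate better fit."""
    return -2 * log_likelihood + n_parameters * math.log(n_cases)

# Hypothetical results from fitting the same predictors two ways.
normal_fit = bic(log_likelihood=-520.4, n_parameters=5, n_cases=200)
gamma_fit = bic(log_likelihood=-498.7, n_parameters=5, n_cases=200)

difference = normal_fit - gamma_fit
print(normal_fit, gamma_fit, difference)
```

Here the gamma model has the lower BIC by far more than 6, so under the rule of thumb on the slide it would be preferred.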
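McFadden's pseudo R2 takes one line once both log-likelihoods are in hand. The two log-likelihood values below are hypothetical.

```python
def mcfadden_pseudo_r2(ll_full, ll_intercept_only):
    """1 - (log-likelihood of full model / log-likelihood of intercept-only model)."""
    return 1 - ll_full / ll_intercept_only

# Hypothetical log-likelihoods copied from two model runs.
ll_intercept_only = -610.0
ll_full = -549.0

pseudo_r2 = mcfadden_pseudo_r2(ll_full, ll_intercept_only)
print(round(pseudo_r2, 3))
```

A model that is no better than the intercept-only model gives a value of 0.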
  • 30. A Worked-Out Example in SPSS • Does Vehicle Age affect the number of insurance claims? • Predictor: Vehicle Age (4 groups) • 0-3 Years Old • 4-7 Years Old • 8-9 Years Old • 10+ Years Old • Outcome: Number of insurance claims • I have reason to believe that a negative binomial distribution would be suitable (i.e., these are rare count events) Taken from SPSS example data: car_insurance_claims.sav
  • 31. Exploring the data: Normality Looking at the data overall, a negative binomial distribution seems appropriate. Note though, I should also have reason to believe that this is also what the distribution looks like in the population! (Probably true if using a sufficiently large random sample)
  • 32. Exploring the Data: Homogeneity of Variance Looking at box plots split by group, it seems pretty clear that there are unequal variances in the groups. I could do a formal test, like a Levene’s test, but these boxplots are enough justification to use robust SEs (Huber-White Sandwich Estimator). Robust SEs will generally be my default choice in an analysis, regardless.
  • 33. For type of model, we will select “negative binomial with a log link” If you wanted something not on this list (e.g., gamma with an inverse link), click on “custom”.
  • 34. Put your outcome variable under “Dependent variable”
  • 35. Put all your categorical variables under “Factors”. Put all your continuous variables (i.e., covariates) under “Covariates”. You can have more than one of each. If you’re interested in interactions, that is in the next step
  • 36. Here is where you build your model. At the bottom, make sure “include intercept in model” is selected. If that is the ONLY thing selected, you will be running the “baseline” or “intercept only” model which you could use for the Pseudo R2. In this example, we just drag our single predictor over to “model” to get the effect of vehicle age. With multiple predictors, you can add main effects and/or interaction effects, as you please.
  • 37. Usually you can leave most of this, except to click on “robust estimator.” Clicking this enables the Huber-White Sandwich estimator option, which is for violations of the homogeneity of variance assumption. The only time you would usually fuss with the other options is if your model “fails to converge.” In other words, at the last iteration, the log likelihoods were still changing, so maybe you would increase the # of iterations. This could be because the model is very complex, because there are problematic variables in the model, or for lots of other reasons. But in general, if the model doesn’t converge, do not interpret output!
  • 38. There are various elements a person might want to change in how statistics are calculated, or what output comes out. At a beginner level though, the defaults are usually just fine for what you’ll need.
  • 39. Here is where you get your means, post-hoc tests, and planned contrasts. “Compute means for response” back-transforms into the original unit of measurement, and is usually best. “Pairwise” contrasts are every possible comparison. Other sorts of planned contrasts are possible. You can adjust for the multiple comparisons. LSD is no adjustment for familywise error rate. For post-hoc tests, I usually prefer sequential Bonferroni method to control alpha rates out of the available options.
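A sketch of a sequential Bonferroni adjustment, written here as Holm's step-down procedure. That SPSS's "sequential Bonferroni" option corresponds to Holm's method is an assumption on my part, and the three p-values are made up.

```python
def sequential_bonferroni(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Adjusted p-values are never allowed to decrease along the sequence.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03]  # hypothetical pairwise-comparison p-values
adjusted = sequential_bonferroni(raw)
print(adjusted)
```

Compared with a plain Bonferroni correction (every p-value multiplied by 3), only the smallest p-value receives the full penalty, which is why the sequential version is less conservative.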
  • 40. Saving the deviance residuals will save the residuals of the analysis as a new variable. This will be useful to explore to ensure that the residuals are (roughly) normally distributed.
  • 41. Screening the residuals, it looks like they are normally distributed! Skewness = 0.01, Kurtosis = -0.50 The negative binomial distribution was likely a good choice
  • 42. Annotated Output • These are like your overall F-tests (except that in GLiM, they are chi-square distributed, not F-tests) • The first test is a test of whether your overall model with predictors is better than an intercept only model • The second will break down effects separately (e.g., if there were two predictors, you’d have predictor 1, predictor 2, and the intercept). • There’s a significant effect of vehicle age!
  • 43. Annotated Output These are the parameter estimates from the linear regression equation So, intercept + the slopes for 3 dummy coded variables here They are more useful when you have continuous variables predicting other continuous variables In ANOVA designs, you’ll usually find the “estimated marginal means” of more utility
  • 44. Annotated Output • These are the means, SEs and confidence intervals! Report these in most papers. • Useful to generate plots with error bars. • The means will be the same as in a regular ANOVA … but the SEs (and thus confidence intervals) are different.
  • 45. Annotated Output These are all the possible pairwise comparisons (with a sequential Bonferroni correction) Basically the equivalent of post hoc tests in ANOVA All but one comparison (0-3 years vs. 4-7 years) is statistically significant here.
  • 46. Graphing Results Boxplots are a useful general way to plot non-parametric data. Easily done in SPSS. You can also do bar/line charts w. error bars. Using SPSS output, can graph in Excel. [Figure: chart by vehicle age group (0-3, 4-7, 8-9, 10+ years); vertical axis from 0 to 200]
  • 47. Conclusion The number of insurance claims differs depending on how old the car is. Specifically, Ms and SDs show that, as a vehicle ages, fewer insurance claims are made. Post-hoc tests using a sequential Bonferroni method show that there is not much decrease in claims from 0-7 years, but that claims tend to decrease dramatically as the car gets older than that.

Editor's Notes

  1. The negative binomial model is a hybrid of the Poisson and Gamma distributions. The Poisson parameter is itself a random variable, distributed according to a Gamma distribution. Thus the negative binomial distribution is sometimes known as a Poisson-Gamma mixture.