This document provides an overview of generalized linear models (GLiM) for analyzing between-subjects designs. It discusses key assumptions of between-subjects ANOVA such as normality and homogeneity of variance. It then explains how GLiM in SPSS can be used as an alternative approach that describes the distribution of the outcome variable, specifies a link function, and uses maximum likelihood estimation rather than ordinary least squares. The document walks through an example comparing models with different distributions and link functions, and demonstrates interpreting output including parameter estimates, tests of effects, and estimated marginal means.
2. Reminder of Assumptions for Between-Subjects ANOVA
• Normality*
• Residuals normally distributed, no outliers
• Actually, normality within each group
• Homogeneity of Variance (between-subjects)*
• Variance is assumed to be equal in each group
• In SPSS, one way is to test with Levene’s Test
• Other assumptions too
• Linearity
• Independence of observations
• Homogeneity of regression slopes (in ANCOVA)
3. GLiM in SPSS
• Generalized Linear Models (GLiM) *today’s focus
• Describe the Distribution of the Outcome
• Describe linear regression formula (i.e., the ANOVA)
• Describe Link Function
• Mixed (“multilevel”) Models
• An alternative, more robust way to deal with repeated measures
• Generalized Linear Mixed Models
• Combines the best of both worlds
4. Some confusion with acronyms
• “Generalized linear models” are easily confused with the “General
Linear Model” which is what underpins ANOVA and regression.
• No standardized acronym. I’ve seen:
• GLiM
• GzLM
• GLM (esp. bad because that’s the one for General Linear Model!)
5. A Primer on Maximum Likelihood Estimation
• Though you will see the familiar linear regression formula later on,
you should note that GLiM does not use OLS regression and sums of
squares. Instead, GLiM uses Maximum Likelihood Estimation
• Maximum Likelihood Estimation is a computationally intensive
method of estimating statistical parameters by choosing the
parameters that make the data most likely to have happened.
• A great tutorial is here:
• http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html
6. Assessing the Model: the log-likelihood statistic
• The Log-likelihood statistic
• Analogous to the residual sum of squares in ANOVA / OLS Regression
• It is an indicator of how much unexplained information there is after
the model has been fitted.
• Large values indicate poorly fitting statistical models.
• The log transformation helps avoid really small numbers that arise
when multiplying many small numbers less than 1
$$\text{log-likelihood} = \sum_{i=1}^{N}\Big[\, Y_i \ln\big(P(Y_i)\big) + (1 - Y_i)\,\ln\big(1 - P(Y_i)\big) \Big]$$
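As a quick numeric sketch of this formula (in Python, with made-up binary outcomes and predicted probabilities):

```python
import numpy as np

# Hypothetical binary outcomes and the model's predicted probabilities for them
Y = np.array([1, 0, 1, 1, 0])
P = np.array([0.9, 0.2, 0.7, 0.6, 0.4])

# Sum of Y*ln(P) + (1 - Y)*ln(1 - P) across all N observations
log_likelihood = np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))
print(log_likelihood)  # more negative values indicate a more poorly fitting model
```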
7. The conceptual steps of maximum likelihood
• All of the parameters are converted to probabilities with some (complex) math.
• The algorithm makes a random guess (a "random start") for each parameter in the model and calculates the overall log likelihood.
• It then goes through many "iterations," and stops computing when subsequent iterations produce essentially identical log likelihoods. The tolerance defines how close is "good enough" (e.g., .000001).
• Various optimization routines help decide when to stop searching the large
hyper-dimensional space. They are short-cuts and heuristics to avoid an almost
endless search in a sort of “hotter-colder” game.
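To make these steps concrete, here is a minimal sketch, assuming a one-parameter Poisson model and made-up counts; the optimizer's tolerance (xatol) plays the role of "good enough" described above:

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical count data
y = np.array([0, 1, 1, 2, 3, 0, 1, 4, 2, 1])

# Negative log-likelihood of a Poisson model with rate lam
def neg_log_likelihood(lam):
    return -np.sum(stats.poisson.logpmf(y, lam))

# The optimizer iterates until improvements fall below the tolerance (xatol)
result = optimize.minimize_scalar(neg_log_likelihood, bounds=(0.01, 20),
                                  method="bounded", options={"xatol": 1e-6})
print(result.x)    # ML estimate of lambda
print(y.mean())    # for Poisson, the ML estimate is just the sample mean
```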
8. An advantage: Missing Data
• ANOVA uses Listwise Deletion by default
• If at least one data point is missing, that person is omitted from the analysis.
• Listwise deletion has acceptable Type I error rates, but inflates the
Type II error rate because the sample size is lower.
• GLiM (and almost any analysis using maximum likelihood estimation)
incorporates all available data in the algorithm, and handles missing
data in a more sophisticated way that usually reduces bias.
• Will be relatively unbiased if data are “Missing at Random”
9. GLiM Step 1:
Describe the underlying distribution
• There are many different types of probability distributions. Many of
our statistics rely on the “normal” distribution. However, real
phenomena may follow different distributions.
• Today, we’ll look at 4 types
• Normal “Gaussian” distributions
• Poisson Distribution
• Gamma Distribution
• Negative Binomial Distribution
10. Normal Distributions
Normal or “Gaussian” distributions are what we are used to working with; ANOVA and Regression assume this kind of outcome.
Note that you need two numbers in order to specify the distribution:
1. Mean
2. Variance
The mean is where the center is; the variance is how widely spread the curve is.
11. Poisson Distributions
Generally speaking, can be useful for count data.
(a) Values can never be negative
(b) Values are integers (i.e., whole numbers);
(c) Distributions tend to be positively skewed.
(d) The mean equals the variance, so represented by one
number rather than two (i.e., lambda, λ)
• As lambda increases, the data become less positively skewed,
and the poisson distribution looks more like the normal dist.
13. Gamma Distributions
Generally speaking, can be useful for skewed continuous data
(a) Values must be positive (strictly greater than zero)
(b) Values do NOT need to be integers (e.g., 1.7 is fine)
(c) Distributions tend to be positively skewed.
(d) The variance is tied to the mean (the variance is proportional to the mean squared), so the shape is again represented by one
number (i.e., gamma, γ)
• As gamma increases, the data become less positively skewed, and the
gamma distribution starts to look more like the normal dist.
15. Negative Binomial Distribution
Generally speaking, this will be useful for skewed count data where the variance exceeds the
mean (i.e., overdispersed)
(a) Values can never be negative
(b) Values are integers (i.e., whole numbers);
(c) Distributions tend to be positively skewed (usually more so than Poisson)
(d) Is actually a hybrid of the Poisson and Gamma distributions (“Poisson-Gamma mixture”)
(e) Two parameters are estimated, the mu and the “dispersion parameter.” Mu is the mean. The
dispersion parameter is the shape of a gamma distribution.
• As the dispersion parameter approaches infinity, the NB distribution looks exactly like Poisson
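A quick sketch of that mixture idea, with made-up values for mu and the dispersion parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, dispersion = 4.0, 1.5  # hypothetical mean and "dispersion parameter" (gamma shape)

# Poisson-Gamma mixture: each case gets its own Poisson rate, drawn from a gamma
rates = rng.gamma(shape=dispersion, scale=mu / dispersion, size=100_000)
y = rng.poisson(rates)

print(y.mean())  # close to mu
print(y.var())   # close to mu + mu**2 / dispersion: variance exceeds the mean
```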
17. Summary
• We can assume an underlying distribution in the population to be
something other than normal.
• The gamma distribution is good for skewed continuous data
• The negative binomial distribution is good for skewed count data
• Generally speaking, the Poisson distribution will not be as useful as the negative binomial distribution in the vast majority of situations.
18. GLiM Step 2:
Describe linear regression formula
• Linear regression, Between-subjects ANOVA and ANCOVA can all be
represented by a linear regression formula:
• In ANOVA, categorical variables are dummy-coded first
• In ANCOVA, you add a continuous predictor
• In multiple regression, all are continuous predictors
19. Example: One Way ANOVA as Regression
$$\text{outcome}_i = \text{model} + \text{error}_i$$
$$\text{outcome}_i = b_0 + b_1\,\text{Condition1}_i + b_2\,\text{Condition2}_i + \text{error}_i$$
There will be more than one dummy variable when there are more than 2 groups (# groups -1)
One group will be your “reference” group, usually the control group
All the dummy variables are entered together as predictors of the outcome in one-way ANOVA
              Dummy Variable 1   Dummy Variable 2
              (Condition 1)      (Condition 2)
Control              0                  0
Condition 1          1                  0
Condition 2          0                  1
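A minimal sketch of this dummy coding (in Python's pandas, using the three groups above):

```python
import pandas as pd

# Hypothetical grouping variable; listing "Control" first makes it the reference group
condition = pd.Categorical(
    ["Control", "Condition 1", "Condition 2", "Control"],
    categories=["Control", "Condition 1", "Condition 2"])

# k groups yield k - 1 dummy variables; drop_first drops the reference level
print(pd.get_dummies(condition, drop_first=True))
```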
20. GLiM Step 3:
Describe the Link Function
“A mathematically identical interpretation of Ŷᵢ is as a conditional mean (i.e., the predicted mean conditional on the predictor variables); the regression equation yields the conditional mean of scores that are normally distributed with variance equal to σ².”
In other words, the linear regression formula works so long as we assume that the underlying distribution in the population is normal.
But ... in some cases, we know it's not. So we need a link function to transform the Poisson, gamma, or NB distributions to normality before calculating!
Neal & Simons (2007)
21. Link Function Example: Log Link
[Figure: histograms of the same data before (Poisson distribution) and after the log transformation (roughly normal distribution)]
The Poisson and Negative binomial distributions can be transformed to normality by taking the natural log of all values!
Similarly, the gamma distribution can be restored to normality with a reciprocal (1/x) transformation
(though log works well here too)
22. Why a regular log transform doesn’t cut it
• Many people are taught to just take the log of the raw data. However, the problem is that you trade interpretability for robustness (i.e., accurate SEs), because there's no way to get the original arithmetic means back!
• In a log transform, you’re working with the “geometric mean”
• “The Geometric Mean is a special type of average where we multiply the
numbers together and then take a square root (for two numbers), cube root
(for three numbers) etc.”
• https://www.mathsisfun.com/numbers/geometric-mean.html
Most people think of the mean as the arithmetic mean (i.e., sum, then divide by the number of observations), so working with the geometric mean isn't ideal.
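A quick numeric illustration, with made-up skewed data:

```python
import numpy as np

# Hypothetical positively skewed data
x = np.array([1, 2, 2, 3, 10, 25])

arithmetic_mean = x.mean()                 # sum, then divide by n: about 7.17
geometric_mean = np.exp(np.log(x).mean())  # what a log transform implicitly averages: about 3.80

print(arithmetic_mean, geometric_mean)
```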
23. Exponentiating (i.e., anti-log) undoes the log transform …
But note that:
e^1.0497 ≈ 2.86
e^0.9303 ≈ 2.54
These are the geometric means! There's no easy way to get from the geometric mean back to the arithmetic mean now!
Here’s an independent t-test on untransformed vs. log transformed data
24. How GLiM Fixes This
• If we instead work the log transformation into the left side of the
linear regression equation … we will be able to actually do our
hypothesis test AND get the estimates of means in the right metric!
• Really, much of this part of GLiM is just a fancy log transformation!
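As a sketch of the idea (using the dummy-coded one-way design from the earlier example):

$$\ln(\hat{\mu}_i) = b_0 + b_1\,\text{Condition1}_i + b_2\,\text{Condition2}_i$$

Exponentiating a predicted value, $\hat{\mu}_i = e^{b_0 + b_1\text{Condition1}_i + b_2\text{Condition2}_i}$, then returns a mean on the original scale, so the estimated group means stay in the arithmetic metric.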
25. Example of same data analyzed using GLiM
• Assumes Negative Binomial Distribution w. a log link
The conclusions of this and the log-transformed analysis are the same, BUT the means and confidence intervals now properly reflect the arithmetic mean (and are more interpretable!).
26. Dealing with heterogeneous variances:
Huber-White Sandwich Estimator
• Homogeneity of variance (ANOVA) and homoscedasticity (regression) are
important assumptions. One common correction is to use a “robust”
estimate of standard errors that is not much affected by this violation.
• There are at least 5 different algorithms in use. GLiM in SPSS uses the
Huber-White Sandwich Estimator (HC0).
• Unfortunately, this is the most biased of the 5 available algorithms (though
still better than doing nothing). In particular, it has a higher Type I error
rate when samples are small. Hopefully updates will fix this.
Hayes, A. F., & Cai, L. (2007). Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation. Behavior Research Methods, 39(4), 709-722.
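For readers who want to see the idea outside SPSS, here is a rough sketch using Python's statsmodels, with hypothetical data and variable names (HC0 is the same estimator named above):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data with unequal variances across three groups
rng = np.random.default_rng(1)
df = pd.DataFrame({"group": np.repeat(["a", "b", "c"], 50)})
df["y"] = rng.normal(loc=df["group"].map({"a": 1.0, "b": 2.0, "c": 3.0}),
                     scale=df["group"].map({"a": 1.0, "b": 2.0, "c": 4.0}))

# cov_type="HC0" requests Huber-White sandwich standard errors
fit = smf.glm("y ~ C(group)", data=df, family=sm.families.Gaussian()).fit(cov_type="HC0")
print(fit.summary())
```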
27. Did I pick the right distribution / link function?
Screening Residuals
• How can you tell if you selected the right distribution and link
function?
• If the residuals of the analysis are normally distributed, then you’ve
corrected the problem and can proceed to interpretation.
• Residuals can be calculated in numerous ways. Generally, it is better to save the "Deviance Residuals," as these are favored at the moment.
Will return to this in the worked-out example
28. Did I pick the right distribution / link function?
Model Comparison
To compare different models with the same variables (e.g., normal vs. gamma distribution)
Calculate BIC in both models. The model with lowest BIC is the best fitting.
Changes of 6 or more would usually be substantive
(A "log-likelihood ratio test" is also possible if you want a p-value, but for this course, rely on BIC for less arithmetic.)
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111-164.
Will return to this in the lab portion.
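In the meantime, a sketch of such a comparison outside SPSS (Python's statsmodels, with made-up data and variable names; BIC computed directly from the log-likelihood):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical positive, skewed outcome across two groups
rng = np.random.default_rng(2)
df = pd.DataFrame({"group": np.repeat(["a", "b"], 100)})
df["y"] = rng.gamma(shape=2.0, scale=np.where(df["group"] == "a", 1.0, 2.0))

# Same variables, two candidate distributions
normal_fit = smf.glm("y ~ C(group)", data=df, family=sm.families.Gaussian()).fit()
gamma_fit = smf.glm("y ~ C(group)", data=df,
                    family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# BIC = -2*LL + k*ln(n); lower is better, and a difference of 6+ is usually
# substantive (the parameter count k here ignores the scale parameter,
# which is the same for both models anyway)
n = len(df)
for name, fit in [("normal", normal_fit), ("gamma", gamma_fit)]:
    print(name, -2 * fit.llf + (fit.df_model + 1) * np.log(n))
```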
29. Calculating Goodness of Fit (R²)
• In regression, R² = SSM / SST
• An analogous "pseudo-R²" value in GLiM is McFadden's pseudo R²:
• Run a model with just the intercept and get its log-likelihood.
• Run a model with your predictors and get its log-likelihood.
• Then pseudo R² = 1 − (log-likelihood of the full model / log-likelihood of the intercept-only model).
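A minimal sketch of that calculation, assuming a Poisson model and made-up data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical count outcome across two groups
rng = np.random.default_rng(3)
df = pd.DataFrame({"group": np.repeat(["a", "b"], 100)})
df["y"] = rng.poisson(np.where(df["group"] == "a", 2.0, 5.0))

null_fit = smf.glm("y ~ 1", data=df, family=sm.families.Poisson()).fit()         # intercept only
full_fit = smf.glm("y ~ C(group)", data=df, family=sm.families.Poisson()).fit()  # with predictors

# McFadden's pseudo R-squared: 1 - (LL_full / LL_intercept)
print(1 - full_fit.llf / null_fit.llf)
```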
30. A Worked-Out Example in SPSS
• Does Vehicle Age affect the number of insurance claims?
• Predictor: Vehicle Age (4 groups)
• 0-3 Years Old
• 4-7 Years Old
• 8-9 Years Old
• 10+ Years Old
• Outcome: Number of insurance claims
• I have reason to believe that a negative binomial distribution would be suitable
(i.e., these are rare count events)
Taken from SPSS example data: car_insurance_claims.sav
31. Exploring the data: Normality
Looking at the data overall, a negative binomial distribution seems appropriate.
Note, though, that I should also have reason to believe that this is what the distribution looks like in the population!
(Probably true if using a sufficiently large random sample)
32. Exploring the Data: Homogeneity of Variance
Looking at box plots split by group, it seems pretty clear that there are unequal variances in the groups.
I could do a formal test, like Levene's test, but these boxplots are enough justification to use robust SEs (the Huber-White Sandwich Estimator).
Robust SEs will generally be my default choice in an analysis, regardless.
33. For the type of model, we will select "negative binomial with a log link".
If you wanted something not on this list (e.g., gamma with an inverse link), click on "custom".
35. Put all your categorical variables under "Factors".
Put all your continuous variables (i.e., covariates) under "Covariates".
You can have more than one of each.
If you're interested in interactions, that is in the next step.
36. Here is where you build your model.
At the bottom, make sure "include intercept in model" is selected. If that is the ONLY thing selected, you will be running the "baseline" or "intercept only" model, which you could use for the pseudo R².
In this example, we just drag our single predictor over to "Model" to get the effect of vehicle age.
With multiple predictors, you can add main effects and/or interaction effects, as you please.
37. Usually you can leave most of this alone, except to click on "robust estimator." Clicking this enables the Huber-White Sandwich Estimator, which addresses violations of homogeneity of variance.
The only time you would usually fuss with the other options is if your model "fails to converge," meaning that at the last iteration the log likelihoods were still changing; in that case you might increase the number of iterations.
Non-convergence could happen because the model is very complex, because there are bad variables in the model, or for lots of other reasons. But in general, if the model doesn't converge, do not interpret the output!
38. There are various elements a person might want to change in how statistics are calculated, or in what output comes out.
At a beginner level, though, the defaults are usually just fine for what you'll need.
39. Here is where you get your means, post-hoc tests, and planned contrasts.
"Compute means for response" back-transforms into the original unit of measurement, and is usually best.
"Pairwise" contrasts are every possible comparison. Other sorts of planned contrasts are possible.
You can adjust for multiple comparisons. LSD is no adjustment for familywise error rate.
For post-hoc tests, I usually prefer the sequential Bonferroni method, out of the available options, to control alpha rates.
40. Saving the deviance residuals will save the residuals of the analysis as a new variable.
This will be useful to explore, to ensure that the residuals are (roughly) normally distributed.
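Putting the walkthrough together, here is a rough non-SPSS analogue of the whole analysis (a sketch in Python's statsmodels with simulated stand-in data; variable names are hypothetical, and unlike SPSS, statsmodels' GLM fixes the negative binomial's ancillary parameter rather than estimating it):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

# Simulated stand-in for car_insurance_claims.sav (variable names are made up)
rng = np.random.default_rng(4)
ages = ["0-3", "4-7", "8-9", "10+"]
df = pd.DataFrame({"vehicle_age": np.repeat(ages, 75)})
mu = df["vehicle_age"].map({"0-3": 8.0, "4-7": 7.0, "8-9": 4.0, "10+": 2.0})
df["claims"] = rng.poisson(rng.gamma(shape=1.5, scale=mu / 1.5))

# Negative binomial with a log link, Huber-White (HC0) robust SEs; note that
# statsmodels' GLM holds the NB ancillary parameter fixed (alpha=1 by default),
# whereas SPSS can estimate it
fit = smf.glm("claims ~ C(vehicle_age)", data=df,
              family=sm.families.NegativeBinomial()).fit(cov_type="HC0")
print(fit.summary())

# Deviance residuals, for screening: they should look roughly normal
print(stats.skew(fit.resid_deviance), stats.kurtosis(fit.resid_deviance))

# Back-transformed group means (the analogue of "compute means for response")
grid = pd.DataFrame({"vehicle_age": ages})
print(grid.assign(mean_claims=fit.predict(grid)))
```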
41. Screening the residuals, it looks like they are normally distributed!
Skewness = 0.01, Kurtosis = -0.50
The negative binomial distribution was likely a good choice
42. Annotated Output
• These are like your overall F-tests (except that in GLiM they are chi-square tests, not F-tests).
• The first test is a test of whether your overall model with predictors is better than an intercept-only model.
• The second will break down effects separately (e.g., if there were two predictors, you'd have predictor 1, predictor 2, and the intercept).
• There's a significant effect of vehicle age!
43. Annotated Output
These are the parameter estimates from the linear regression equation; so, the intercept + the slopes for the 3 dummy-coded variables here.
They are more useful when you have continuous variables predicting other continuous variables.
In ANOVA designs, you'll usually find the "estimated marginal means" of more utility.
44. Annotated Output
• These are the means, SEs, and confidence intervals! Report these in most papers.
• Useful to generate plots with error bars.
• The means will be the same as in a regular ANOVA ... but the SEs (and thus confidence intervals) are different.
45. Annotated Output
These are all the possible pairwise comparisons (with a sequential Bonferroni correction).
Basically the equivalent of post hoc tests in ANOVA.
All but one comparison (0-3 years vs. 4-7 years) are statistically significant here.
46. Graphing Results
Boxplots are a useful general way to plot non-parametric data
Easily done in SPSS
You can also do bar/line charts w. error bars
Using SPSS output, can graph in Excel
[Bar chart from the SPSS output (graphed in Excel): mean claims with error bars for the four vehicle age groups (0-3, 4-7, 8-9, 10+ years); y-axis from 0 to 200]
47. Conclusion
The number of insurance claims differs depending on how old the car is.
Specifically, Ms and SDs show that, as a vehicle ages, fewer insurance
claims are made.
Post-hoc tests using a sequential Bonferroni method show that there is
not much decrease in claims from 0-7 years, but that claims tend to
decrease dramatically as the car gets older than that.
Editor's Notes
The negative binomial model is a hybrid of the Poisson and Gamma distributions. The Poisson parameter is itself a random variable, distributed according to a Gamma distribution. Thus the negative binomial distribution is sometimes known as a Poisson-Gamma mixture.