A brief introduction on how to conduct growth curve statistical analyses using SPSS software, including some sample syntax. Originally presented at IWK Statistics Seminar Series at the IWK Health Center, Halifax, NS, May 1, 2013.
2. When to use a growth curve
Growth curves measure patterns of change over time
Specifically, mean-level changes over time
Patterns can be linear, quadratic, cubic, etc.
         Time 1   Time 2   Time 3
John       10       7        5
Mary        8       5        4
Zoe         7       9        9
Sarah       5       2        1
Bill        2       4        3
MEAN        6.4     5.4      4.4
Mean-Level Change
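As a worked check of the MEAN row, the arithmetic can be written out as follows. The symbol \bar{y}_t (the mean at time point t) is notation assumed here, not taken from the slides.

```latex
\[
\bar{y}_{1} = \frac{10 + 8 + 7 + 5 + 2}{5} = 6.4, \qquad
\bar{y}_{2} = \frac{7 + 5 + 9 + 2 + 4}{5} = 5.4, \qquad
\bar{y}_{3} = \frac{5 + 4 + 9 + 1 + 3}{5} = 4.4
\]
\[
\frac{\bar{y}_{3} - \bar{y}_{1}}{3 - 1} = \frac{4.4 - 6.4}{2} = -1.0 \ \text{per time point}
\]
```

The mean falls by 1.0 per time point even though Zoe's scores rise (7, 9, 9), which is why this is change at the mean level rather than for every individual.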
3. Limitations of RM-ANOVA
Requires a balanced design (i.e., no missing data)
Requires equal spacing between time points
Requires independence of observations (not often
possible in longitudinal data)
Requires homogeneity of variance
4. Growth Curves overcome these limitations
Accounts for missing data using a full information
maximum likelihood (FIML) approach
Does not require equal spacing between time points
(can specify unequal time points, e.g., 1, 2, 5, 7, 10)
Does not require independence of observations (can
model different types of correlated error structures)
Is robust to violations of homogeneity of variance
assumptions required by RM-ANOVA
5. So… what are growth curves?
Growth curves are a type of mixed (or multilevel)
model
Simply put, multilevel models are a way of dealing
with clustered data
For example…
7. Growth Curves are Multilevel Models
All multilevel models (MLMs) partition variance into
its appropriate levels
E.g., students nested within schools
Multilevel models also use maximum likelihood
estimation, which copes better with missing data
and is more flexible when dealing with real data
Growth curves are a specific type of MLM where:
The lowest level of observation is repeated measures
The predictor variable is TIME
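One common way to write a linear growth curve as a two-level model is sketched below. The notation (y, \pi, \beta, u, e and the subscripts) is an assumption chosen for illustration; the slides do not give an equation.

```latex
% Level 1: repeated measures (time point t) within person i
\[
y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{time}_{ti} + e_{ti}
\]
% Level 2: each person's intercept and slope
\[
\pi_{0i} = \beta_{00} + u_{0i}, \qquad
\pi_{1i} = \beta_{10} + u_{1i}
\]
```

Here \beta_{00} and \beta_{10} are the average intercept and the average slope for time, u_{0i} and u_{1i} are person-specific deviations from those averages, and e_{ti} is the residual at each time point.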
8. Application to a clinical context
The RCT is a common design
Growth curves can be used instead of ANOVA
The time*interv interaction is most important
Leiter et al., 2012
9. How do you do this in SPSS?
First, you need to convert your data from “WIDE”
format to “LONG” format
Wide Format
10. Long Format
(Use the syntax provided in the handout to get this):
Long Format
11. Coding the Time Variable is Important
The choices you make for your time variable will
influence your analyses!
For a linear relationship with equally spaced measurements,
use equidistant codes
1, 2, 3 OR -1, 0, 1, etc.
If you are expecting a quadratic relationship, you also need to
calculate time-squared
1, 4, 9 OR 1, 0, 1
For unequally spaced time points, code the actual spacing
1 month, 3 month, 12 month
1, 3, 12
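The coding choices above could be set up in SPSS roughly as follows. This is a minimal sketch: it assumes a long-format file with a numeric variable named time coded 1, 2, 3, and the new variable names (time_c, quadtime_c, months) are hypothetical.

```spss
* Sketch only: assumes long-format data with time coded 1, 2, 3.
* time_c, quadtime_c and months are hypothetical variable names.

* Centre time at the second time point: 1, 2, 3 becomes -1, 0, 1.
COMPUTE time_c = time - 2.

* Squared term for a quadratic trend: -1, 0, 1 becomes 1, 0, 1.
COMPUTE quadtime_c = time_c * time_c.

* Unequal spacing: replace the codes with the actual months (1, 3, 12).
RECODE time (1=1) (2=3) (3=12) INTO months.
EXECUTE.
```

With time_c as the predictor, the intercept refers to the second time point rather than to a hypothetical time zero, which is one reason the coding choice changes the results.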
12. Decision 1: ML vs REML
Maximum Likelihood Estimation (ML)
vs
Restricted Maximum Likelihood Estimation (REML)
REML is generally preferred because it provides
less biased estimates of the variance components
ML is preferred if you need to compare nested models that
differ in their fixed effects, as REML likelihoods are
not comparable across such models
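A comparison of two nested models that differ in their fixed effects might look like the sketch below. Variable names follow the rest of the slides (ASItotal, time, interv, id); the choice of which term to drop is only an illustration.

```spss
* Sketch only: both models use ML so their -2 log likelihoods are comparable.

* Model A: main effects only.
MIXED ASItotal WITH time interv
  /METHOD = ML
  /FIXED = time interv | SSTYPE(3)
  /RANDOM = INTERCEPT | SUBJECT(id) COVTYPE(UN)
  /PRINT = SOLUTION.

* Model B: adds the time*interv interaction.
MIXED ASItotal WITH time interv
  /METHOD = ML
  /FIXED = time interv time*interv | SSTYPE(3)
  /RANDOM = INTERCEPT | SUBJECT(id) COVTYPE(UN)
  /PRINT = SOLUTION.
```

The difference between the two -2 log likelihood values can be referred to a chi-square distribution with 1 degree of freedom, because Model B adds one parameter.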
13. Decision 2: Fixed vs Random
Random vs. Fixed Slopes & Intercepts
Random (varying): Allow to vary across people
Fixed (constant): Force them to be equal across people
Random vs. Fixed has no single, agreed-upon
definition (Gelman, 2005); I’m presenting a practical
conceptualization
Fixed (constant) intercepts and slopes are more
parsimonious and less computationally intensive, but
may not be as good a fit to the data. Select the most
parsimonious model that fits the data best.
14. Random (varying) Intercepts
Random (varying) Slopes
http://www.spss.ch/upload/1126184451_Linear%20Mixed%20Effects%20Modeling%20in%20SPSS.pdf
15. Random (varying) Intercepts
Fixed (constant) Slopes
16. Fixed (constant) Intercepts
Random (varying) Slopes
17. Decision 3: Linear, Quadratic, or Cubic?
If slopes are allowed to be random (varying), then
you need at least:
3 time points for linear
4 time points for quadratic
Add time*time as a predictor
5 time points for cubic
Add time*time and time*time*time as predictors
One fewer time point is needed if using fixed slopes
Today, I’m focusing on LINEAR relationships
18. Decision 4: Covariance Structure
Is there a predictable pattern to the errors?
If you are unsure, specify an “unstructured” matrix
Less parsimonious, because every variance and covariance is
estimated freely
AR(1) correlated error structure is also fairly common
Autoregressive correlated errors, getting smaller as
timepoints get more distant
You can test multiple models with different plausible
structures, and choose the one that fits the data best
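An AR(1) error structure could be requested with the REPEATED subcommand, roughly as sketched below. Variable names follow the slides; occasion is a hypothetical copy of time introduced here only to index the repeated measurements, and whether this model suits a given dataset is an open question.

```spss
* Sketch only: occasion is a hypothetical copy of time, used to index
  the repeated measurements within each person.
COMPUTE occasion = time.
EXECUTE.

* Linear growth model with AR(1) correlated residuals within person.
MIXED ASItotal WITH time interv
  /METHOD = REML
  /FIXED = time interv time*interv | SSTYPE(3)
  /REPEATED = occasion | SUBJECT(id) COVTYPE(AR1)
  /PRINT = SOLUTION TESTCOV.
```

Swapping COVTYPE(AR1) for COVTYPE(UN) gives the unstructured alternative. Both versions have the same fixed effects, so their information criteria can be compared under REML.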
19. Annotated Syntax
MIXED ASItotal WITH time interv
/METHOD = REML
/FIXED = time interv time*interv | SSTYPE(3)
/RANDOM = INTERCEPT time interv |
SUBJECT(id) COVTYPE(UN)
/PRINT = SOLUTION TESTCOV HISTORY.
*MIXED: Mixed model, dependent variable predicted by time and intervention.
*METHOD: Restricted Maximum Likelihood Estimation (usually better than ML).
*FIXED: Put all predictors after FIXED. Indicate interactions by Var1*Var2.
*RANDOM: The intercept, and the slopes for time and interv are random. The
 slope for the interaction is fixed because I omitted it from this part.
*COVTYPE: "UN" specifies an unstructured covariance matrix (other types are
 possible, but require thought).
20. Annotated Output: Model Comparison
Use the BIC values to compare nested models (e.g.,
random slopes vs fixed slopes)
Lower values are better (SPSS reports BIC in
smaller-is-better form); look for ∆BIC > 4
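The comparison can be written out as below. This is a sketch using the general form of the BIC; the symbols \ell (log likelihood), k (number of estimated parameters) and n (sample-size term) are assumed notation, and how k and n are counted under ML versus REML depends on the software.

```latex
\[
\mathrm{BIC} = -2\ell + k \ln(n)
\]
\[
\Delta\mathrm{BIC} = \mathrm{BIC}_{\text{fixed slopes}} - \mathrm{BIC}_{\text{random slopes}}
\]
```

If \Delta\mathrm{BIC} exceeds 4, the random-slopes model has the meaningfully lower BIC and is preferred. Otherwise the more parsimonious fixed-slopes model is retained.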
21. Annotated Output: Covariance Parameters
UN(1,1) = Variance of the Intercept. Significant, so
random intercepts are important to include.
UN(2,2) = Variance of the slope for time. Non-significant,
which suggests that a more parsimonious model with
fixed slopes for time would fit the data better.
22. Annotated Output
Interpret like ANOVA; parameters adjusted for clustering
Time -> Main effect for time (linear, in this case)
Interv -> Main effect for intervention
Time * interv -> 2-way Interaction
Graphing the interaction is usually important to understand
Dummy coding (0, 1) intervention helps a LOT
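To see why dummy coding helps, the fixed part of the model can be written out for each group. The coefficient names b_0 to b_3 are assumed notation for the intercept, time, interv, and time*interv estimates in the output.

```latex
\[
\hat{y} = b_0 + b_1\,\mathrm{time} + b_2\,\mathrm{interv} + b_3\,(\mathrm{time} \times \mathrm{interv})
\]
\[
\mathrm{interv} = 0: \quad \hat{y} = b_0 + b_1\,\mathrm{time}
\]
\[
\mathrm{interv} = 1: \quad \hat{y} = (b_0 + b_2) + (b_1 + b_3)\,\mathrm{time}
\]
```

With 0/1 coding, b_1 is the slope for the group coded 0, and b_3 is the difference in slopes between the two groups, which is the quantity of interest in an RCT.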
23. Graphing the interaction
Can graph the interaction using tools meant for
moderation in linear regression with this kind of model
The parameters in the output are interpreted the same way,
they’re just adjusted so that you’re accounting for the
clustering due to repeated measurement and missing data
http://www.jeremydawson.co.uk/slopes.htm
24. A few closing points
Other software can implement this (e.g., SAS,
Mplus, HLM)
Non-normal data may be better modeled with
different distributional assumptions (e.g., Poisson)
Modeling of covariance structures may be important,
but can be challenging to figure out
Some programs (e.g., Mplus) may use a latent
variable approach
25. Questions? Comments?
Thank you!
P.S. In the handout I provided, there is some syntax
and instructions which may be helpful!
Email me if you want an electronic copy of the
presentation:
mackinnon.sean@dal.ca
26. Appendix: Syntax
*Convert data from LONG to WIDE format.
SORT CASES BY id time.
CASESTOVARS
/ID=id
/INDEX=time
/GROUPBY=VARIABLE.
*Convert data from WIDE to LONG format.
VARSTOCASES
/MAKE ASItotal FROM ASItotal.0 ASItotal.1 ASItotal.2
/INDEX=time(3)
/KEEP=id interv
/NULL=KEEP.
27. Appendix: Syntax
*Linear Growth Curve with Intervention Group as
 Moderator (Random Intercept, Random Slopes).
MIXED ASItotal WITH time interv
/METHOD = REML
/FIXED = time interv time*interv | SSTYPE(3)
/RANDOM = INTERCEPT time interv time*interv |
SUBJECT(id) COVTYPE(UN)
/PRINT = SOLUTION TESTCOV HISTORY.
28. Appendix: Syntax
*Linear Growth Curve with Intervention Group as
 Moderator (Random Intercept, Fixed Slopes).
MIXED ASItotal WITH time interv
/METHOD = REML
/FIXED = time interv time*interv | SSTYPE(3)
/RANDOM = INTERCEPT | SUBJECT(id)
COVTYPE(UN)
/PRINT = SOLUTION TESTCOV HISTORY.
29. Appendix: Syntax
*Linear Growth Curve with Intervention Group as
 Moderator (Fixed Intercept, Random Slopes).
MIXED ASItotal WITH time interv
/METHOD = REML
/FIXED = time interv time*interv | SSTYPE(3)
/RANDOM = time interv time*interv | SUBJECT(id)
COVTYPE(UN)
/PRINT = SOLUTION TESTCOV HISTORY.
30. Appendix: Syntax
*Quadratic Growth Curve with Intervention Group as
 Moderator (Random Intercept, Fixed Slopes).
*The squared time term must be computed first and listed as a covariate.
COMPUTE quadtime = time*time.
EXECUTE.
MIXED ASItotal WITH time quadtime interv
/METHOD = REML
/FIXED = time quadtime interv time*interv quadtime*interv |
SSTYPE(3)
/RANDOM = INTERCEPT | SUBJECT(id) COVTYPE(UN)
/PRINT = SOLUTION TESTCOV HISTORY.