2. Why Regression
1. Descriptive – to assess the strength of the
association between the outcome and
factors of interest (also possible with
correlation)
2. Adjustment – for
covariates/confounders
3. Predictors - to determine important
risk factors affecting the outcome
4. Prediction – to predict the outcome in new cases
3. What is linear regression
• It is the prediction of an interval-scale
outcome variable from one or more predictor
variables.
• Dur Anaes = a + b*Dur Sx
where a = intercept and b = slope; the vertical distance of
each point from the fitted line is its residual.
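• For example, with hypothetical values a = 20 min and b = 1.1, a
surgery lasting 60 min gives a predicted anaesthesia duration of
20 + 1.1 × 60 = 86 min.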
4. Univariate Linear Reg in SPSS
• The predictor and outcome variables should be in
separate columns.
• Go to Analyze > Regression > Linear.
• Fill in the Dependent and Independent variable boxes.
5. • On the Statistics tab, tick the options used in the
results slides: Estimates, Model fit, Durbin-Watson and
Casewise diagnostics.
6. • On the Plots tab, put *ZRESID on the Y axis and *ZPRED
on the X axis, and tick Histogram and Normal probability
plot.
7. • If you want the residuals saved for further plotting and
testing, tick Predicted values and Residuals (unstandardized
and standardized) on the Save tab.
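• The same analysis can also be run as pasted syntax (the Paste
button in the dialog generates it for your own variable names). A
minimal sketch, assuming the variables are named DurSx and
DurAnaes:

  * Univariate linear regression with residual diagnostics.
  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA
    /DEPENDENT DurAnaes
    /METHOD=ENTER DurSx
    /SCATTERPLOT=(*ZRESID ,*ZPRED)
    /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID)
    /CASEWISE PLOT(ZRESID) OUTLIERS(3)
    /SAVE PRED ZPRED RESID ZRESID.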
8. Results
• R – the same as the bivariate correlation coefficient.
• R-squared – quantifies the variability explained by the model.
• Durbin-Watson – a test of independence of the cases; it ranges
from 0 to 4 and should be around 2.
9. • The ANOVA table gives the P-value for model fit, i.e. how well
the model explains the outcome variable.
• In the Coefficients table, the regression coefficient (B) of the
predictor is the slope b, the Constant is the intercept a, and
each has its own P-value.
• The casewise diagnostics tell us that case 18 is an outlier,
with a standardized residual of more than ±3.
10. • As part of the case-wise diagnostics, look at the Std. Residual
row of the residuals statistics – its minimum and maximum should
lie between ±3; any case beyond ±3 is an outlier.
• The histogram and normal probability plot check normality of the
standardized residuals; the scatterplot checks that the
standardized residuals are independent of the predicted values.
11. • The Save options add columns of calculated predicted values,
standardized predicted values, residuals and standardized
residuals to the data set.
• These can be used to conduct a formal
statistical test of normality on the
residuals.
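• For example, a formal normality test (Kolmogorov-Smirnov and
Shapiro-Wilk) on the saved residuals can be obtained with EXAMINE;
a sketch assuming the default saved-variable name RES_1 (check the
actual name in your Data View):

  * Formal normality test on the saved unstandardized residuals.
  EXAMINE VARIABLES=RES_1
    /PLOT HISTOGRAM NPPLOT
    /STATISTICS DESCRIPTIVES.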
12. Assumptions
1. No outliers (standardized residuals between -3 and +3) – checked
by the casewise diagnostics and the plot of standardized residuals.
2. The data points must be independent – checked by the Durbin-Watson
test; DW should be around 2.
3. The residuals should be normally distributed (check the histogram
of the standardized residuals/residuals and a formal statistical
test of normality on the residuals) and have constant variance
(visually inspect the scatterplot of standardized predicted values
against standardized residuals – the points should be randomly
scattered, with no relationship between the residuals and the
predicted values).
13. Interpreting Coefficients
• Unstandardized coeff: Tells us the change in the dependent
variable, in its original units, per unit change in the independent
variable – the slope.
• Standardized coeff: Tells us how many standard deviations the
dependent variable will change per standard deviation increase in
the predictor variable – useful for comparing the magnitude of
effect of independent variables measured in different units/on
different scales – useful in multivariate regression.
• Coeff of dichotomous predictors: Variable coding is important –
use either 1-0 or 1-2 coding. The coeff gives the change in the
dependent variable at one level of the predictor compared with the
other level.
• Coeff of nominal predictors: The variable should be dummy coded,
then entered into the model and interpreted as for dichotomous
predictors (see the dummy-coding sketch after this list).
• Coeff of ordinal predictors: Very controversial – treat the
variable either as semicontinuous or as nominal.
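• A minimal dummy-coding sketch, assuming a hypothetical 3-level
nominal variable SxType coded 1/2/3, with level 1 as the reference
(only the two dummy variables are entered into the model):

  * Dummy variables for a 3-level nominal predictor (level 1 = reference).
  COMPUTE SxType2 = (SxType = 2).
  COMPUTE SxType3 = (SxType = 3).
  EXECUTE.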
15. Multiple Linear Regression
• Provides coefficients for each predictor independent of the
influence of the other predictors.
• Multicollinearity: Intercorrelations between predictor variables
can lead to unstable coefficients and P-values. How to diagnose:
1) Bivariate correlations between predictors before regression.
2) A severely reduced R-Sq.
3) The Collinearity diagnostics check box on the Statistics tab of
the Linear Regression dialog.
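• As a sketch, collinearity diagnostics can be requested in syntax
with the COLLIN and TOL keywords, assuming hypothetical predictors
DurSx, Age and BMI (Tolerance/VIF appear in the Coefficients table,
plus a separate Collinearity Diagnostics table):

  * Multiple linear regression with collinearity diagnostics.
  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
    /DEPENDENT DurAnaes
    /METHOD=ENTER DurSx Age BMI.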
16. • How to remove multicollinearity and
make the model stable:
1) Combine the correlated variables in
meaningful ways, or
2) Remove them one by one to see which
removal brings the tolerance values closer to 1.
17. Model Selection
• Mainly four methods:
1) Enter – en-masse entry of all variables – best if you know what
you are looking for.
2) Forward – variables are entered one by one based on significant
coefficient statistics – the P-value for entry can be set – best
if you don’t know what you are looking for.
3) Backward – variables are entered en masse and removed one by one
based on a threshold P-value for removal.
4) Stepwise – a combination of forward and backward (SPSS also
offers Remove, which drops a specified block of variables in a
single step).
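• As a sketch, the selection method and its entry/removal P-values
are set on the /METHOD and /CRITERIA subcommands; for example, a
backward elimination with the same hypothetical predictors:

  * Backward elimination: drop predictors with P > 0.10 one at a time.
  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA
    /CRITERIA=PIN(.05) POUT(.10)
    /DEPENDENT DurAnaes
    /METHOD=BACKWARD DurSx Age BMI.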