Presentation given for the Boston Predictive Analytics group by Daniel Gerlanc of Enplus Advisors Inc on July 25, 2012 at the CIC. Visit us at enplusadvisors.com
Intermediate Regression Topics including variable transformations and simulation for constructing confidence intervals.
6. Lattice Plots
> xyplot(jitter(rings) ~ shell.wt | sex, abalone, grid=TRUE, pch=".",
         subset=volume < 0.2,
         panel=function(x, y, ...) {
           panel.lmline(x, y, ...)
           panel.xyplot(x, y, ...)
         },
         ylab="rings")
ggplot2 is a newer package that can be used to create similar plots.
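As a rough illustration of that point, the lattice call above might be approximated in ggplot2 as follows. This is only a sketch: the data frame abalone.sim is simulated here as a hypothetical stand-in for the abalone data (same column names as on the slide, made-up values and made-up sex levels), and the jitter amount is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in for the abalone data; values and sex levels are made up.
set.seed(1)
n <- 300
abalone.sim <- data.frame(
  sex      = sample(c("F", "I", "M"), n, replace = TRUE),
  shell.wt = runif(n, 0, 0.6),
  volume   = runif(n, 0, 0.3)
)
abalone.sim$rings <- round(6 + 13 * abalone.sim$shell.wt + rnorm(n, sd = 2))

p <- ggplot(subset(abalone.sim, volume < 0.2), aes(x = shell.wt, y = rings)) +
  geom_jitter(width = 0, height = 0.3, shape = ".") +        # jitter the integer ring counts vertically
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # least-squares line in each panel
  facet_wrap(~ sex) +                                        # one panel per level of sex
  ylab("rings")
print(p)
```

Here facet_wrap plays the role of the "| sex" conditioning in the lattice formula, and geom_smooth with method = "lm" replaces panel.lmline.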
9. Simple Model
> fit.1 <- lm(rings ~ sex + shell.wt, abalone)
> summary(fit.1)
Call:
lm(formula = rings ~ sex + shell.wt, data = abalone)

Residuals:
   Min     1Q Median     3Q    Max
-5.750 -1.592 -0.535  0.886 15.736

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.2423     0.0799   78.08   <2e-16 ***
sex           0.9142     0.0984    9.29   <2e-16 ***
shell.wt     12.8581     0.3300   38.96   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.5 on 4174 degrees of freedom
10. Centering with z-scores
Subtract the mean from each input and divide by 1 or 2 standard deviations
Dummy/proxy variables may be centered as well
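To make the effect of this rescaling concrete, here is a small sketch on simulated data (not the abalone set; the variable names x, y, fit.raw and fit.z are illustrative assumptions). After centering and dividing by 2 standard deviations, the intercept becomes the mean of the outcome, and the slope becomes the raw slope multiplied by 2 * sd(x).

```r
# Simulated data for illustration only; the numbers are arbitrary.
set.seed(42)
x <- rnorm(200, mean = 0.25, sd = 0.1)
y <- 6 + 13 * x + rnorm(200, sd = 2.5)

# Center the input and divide by 2 standard deviations.
x.z <- (x - mean(x)) / (2 * sd(x))

fit.raw <- lm(y ~ x)    # original scale
fit.z   <- lm(y ~ x.z)  # rescaled input

# Intercept of the rescaled fit equals mean(y);
# slope equals the raw slope times 2 * sd(x).
coef(fit.z)
c(mean(y), coef(fit.raw)[["x"]] * 2 * sd(x))
```

The fitted values are identical in both models; only the units of the coefficients change, so the slope now reads as the expected change in the outcome for a 2 standard deviation change in the input.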
11. Center Values
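> # outcome and predictors are vectors of column names (not shown on this slide)
> abalone.adj <- abalone[, c(outcome, predictors)]
> # Center each predictor and divide by 2 standard deviations
> for (i in predictors) {
    abalone.adj[[i]] <-
      (abalone.adj[[i]] - mean(abalone.adj[[i]])) / (2 * sd(abalone.adj[[i]]))
  }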
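Also look into the ‘scale’ function, which centers the columns of a matrix or data frame and, by default, divides them by 1 standard deviation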
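A minimal sketch of how scale could reproduce the loop above: by default it divides by one standard deviation, so the 2 standard deviation variant needs an explicit scale argument. The data frame df and its values are hypothetical placeholders, not the abalone data.

```r
set.seed(7)
# Hypothetical numeric predictors standing in for the abalone columns.
df <- data.frame(shell.wt = runif(50), volume = runif(50))

# Manual version: center each column and divide by 2 standard deviations.
manual <- as.data.frame(lapply(df, function(v) (v - mean(v)) / (2 * sd(v))))

# scale() version: pass 2 * sd per column as the scale argument.
sds    <- sapply(df, sd)
scaled <- scale(df, center = TRUE, scale = 2 * sds)

# Largest absolute difference between the two results (numerically zero).
max(abs(as.matrix(manual) - scaled))
```

Note that scale returns a matrix with the centering and scaling values stored as attributes, so it may need to be converted back with as.data.frame before being passed to lm.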