DATA 503 – Applied Regression Analysis
Lecture 9: Linear Model, Inference, and Prediction Highlights
By Dr. Ellie Small
Overview
Topics:
• Initial Data Analysis
• Linear Model
• Identifiability and Orthogonality
• Compare Two Models
• Hypothesis Tests for Parameters
• Permutation Tests
• Confidence Intervals and Regions
• Bootstrap Confidence Intervals
• Predictions
Initial Data Analysis
We first check the data for errors (often data entry errors):
• Run summary(data) in R and look for:
₋ Unreasonable ranges – correct the offending minimum/maximum values
₋ Codes used for missing values – set them to NA
₋ Variables that should have been designated as factors (few distinct values) – convert with factor(var)
• Check graphs for unusual behavior/effects:
₋ Histogram for single variable – hist(var)
₋ Plot the density of a single variable - plot(density(var))
₋ Scatterplot for 2 variables – plot(var1~var2)
₋ Grouped boxplot for 2 variables where var2 is a factor – plot(var1~var2)
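The checks above can be sketched on a small made-up data frame. Everything in it is an assumption for illustration: the column names, the values, and the use of -99 as the code for a missing weight.

```r
# Hypothetical toy data: -99 is assumed to be the code for a missing weight.
dat <- data.frame(
  weight = c(61.2, 70.5, -99, 58.9, 80.1, 75.3),
  height = c(165, 178, 171, 160, 185, 180),
  group  = c(1, 2, 1, 2, 1, 2)   # numeric codes that really denote categories
)

summary(dat)                          # the minimum of weight (-99) is unreasonable

dat$weight[dat$weight == -99] <- NA   # recode the missing-value code as NA
dat$group <- factor(dat$group)        # few distinct values: treat as a factor

summary(dat)                          # weight now reports one NA; group shows counts

hist(dat$height)                      # histogram of a single variable
plot(density(dat$height))             # density of a single variable
plot(weight ~ height, data = dat)     # scatterplot of two numeric variables
plot(weight ~ group, data = dat)      # grouped boxplot, because group is a factor
```

The same plot(var1~var2) call produces a scatterplot or a grouped boxplot depending only on whether var2 is a factor, which is why the factor conversion comes before the plots.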
Linear Model
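For one case:
\[ Y = \boldsymbol{x}'\boldsymbol{\beta} + \varepsilon, \qquad \boldsymbol{x} = \begin{pmatrix} 1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} \in \mathbb{R}^p \quad \text{(the predictors)} \]
For a data set of size \( n \):
\[ \boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
where
\[ X_{n \times p} = \begin{pmatrix} \boldsymbol{x}_1' \\ \vdots \\ \boldsymbol{x}_n' \end{pmatrix} = \begin{pmatrix} 1 & x_{12} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n2} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} \boldsymbol{1}_n & \boldsymbol{x}_{(2)} & \cdots & \boldsymbol{x}_{(p)} \end{pmatrix} \]
\( \boldsymbol{x}_i \) is the ith set of predictors (for case i), while \( \boldsymbol{x}_{(j)} \) contains all values for the jth predictor (or variable).
\[ \boldsymbol{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \qquad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \qquad \text{and} \qquad \boldsymbol{\beta} = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} \]
• \( \boldsymbol{Y} \): response vector
• \( \boldsymbol{\varepsilon} \): error vector
• \( \boldsymbol{\beta} \): parameter vector
• \( X \): model matrix
Assumptions: \( E(\boldsymbol{\varepsilon}) = \boldsymbol{0} \), \( Var(\boldsymbol{\varepsilon}) = \sigma^2 I_n \)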
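To make the model matrix concrete, here is a small sketch that builds X for a model with two predictors. The built-in mtcars data and the choice of mpg, wt and hp are stand-ins chosen for illustration, not part of the lecture.

```r
# mtcars is only a stand-in data set; mpg, wt and hp are illustrative choices.
X <- model.matrix(mpg ~ wt + hp, data = mtcars)   # model matrix X (n x p)
Y <- mtcars$mpg                                   # response vector Y

dim(X)       # n rows (cases) by p columns (intercept plus two predictors)
head(X, 3)   # the first column is the column of ones, 1_n
X[1, ]       # x_1: the predictor values for case 1
X[, "wt"]    # x_(2): all n values of one predictor
```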
Linear Model - 2
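Estimation:
\[ \boldsymbol{Y} = X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\varepsilon}} = \hat{\boldsymbol{Y}} + \hat{\boldsymbol{\varepsilon}} \]
where
\[ \hat{\boldsymbol{Y}} = X\hat{\boldsymbol{\beta}} = P_X \boldsymbol{Y}, \qquad RSS = \lVert \hat{\boldsymbol{\varepsilon}} \rVert^2, \qquad \hat{\sigma}^2 = \frac{RSS}{n-p}, \qquad df = n - p \]
• \( \hat{\boldsymbol{\beta}} \): regression coefficient vector
• \( \hat{\boldsymbol{\varepsilon}} \): residual vector
• \( \hat{\boldsymbol{Y}} \): fitted value vector
This is the least squares estimate and minimizes the RSS compared to all other linear combinations of the column vectors of X.
\( SS_{tc} \) (Sum of Squares Total Corrected) is the RSS for the model without predictors. \( R^2 \) is the proportion of variance explained by the model, or the improvement of the model compared to the model without predictors:
\[ R^2 = \frac{SS_{tc} - RSS}{SS_{tc}} \]
Normal Equations: \( X'X\hat{\boldsymbol{\beta}} = X'\boldsymbol{Y} \). If \( rank(X_{n \times p}) = p \), then
\[ \hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\boldsymbol{Y} \]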
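As a check on the formulas above, the following sketch solves the normal equations directly and compares the result with lm. The mtcars data and the chosen variables are stand-ins for illustration only.

```r
# Stand-in data: mtcars, with mpg as response and wt, hp as predictors.
X <- model.matrix(mpg ~ wt + hp, data = mtcars)
Y <- mtcars$mpg
n <- nrow(X)
p <- ncol(X)

# Solve the normal equations X'X beta = X'Y (valid because rank(X) = p here)
beta_hat <- solve(crossprod(X), crossprod(X, Y))

Y_hat      <- X %*% beta_hat          # fitted value vector
e_hat      <- Y - Y_hat               # residual vector
RSS        <- sum(e_hat^2)            # squared length of the residual vector
sigma2_hat <- RSS / (n - p)           # estimate of the error variance
SStc       <- sum((Y - mean(Y))^2)    # RSS of the model without predictors
R2         <- (SStc - RSS) / SStc

# The same quantities from lm
lmod <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(as.vector(beta_hat), unname(coef(lmod)))
all.equal(R2, summary(lmod)$r.squared)
```

lm itself does not invert X'X; it uses a QR decomposition, which is numerically more stable, so the explicit solve here is for illustration only.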
Linear Model - 3
In R:
lmod=lm(response~vars,data=data)
summary(lmod) gives all the summary information
\( \hat{\boldsymbol{Y}} \) = fitted(lmod)
\( \hat{\boldsymbol{\varepsilon}} \) = residuals(lmod)
RSS = deviance(lmod)
\( \hat{\boldsymbol{\beta}} \) = coef(lmod)
df = n-p = df.residual(lmod)
rank(X) = lmod$rank
\( \hat{\sigma} \) = summary(lmod)$sigma
\( R^2 \) = summary(lmod)$r.squared
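A short sketch of how these accessors relate to one another, again using mtcars purely as a stand-in data set:

```r
# Stand-in model on the built-in mtcars data.
lmod <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)
p <- length(coef(lmod))

RSS <- deviance(lmod)

all.equal(RSS, sum(residuals(lmod)^2))                          # RSS is the sum of squared residuals
all.equal(df.residual(lmod), n - p)                             # df = n - p
all.equal(summary(lmod)$sigma, sqrt(RSS / df.residual(lmod)))   # sigma-hat = sqrt(RSS / df)
lmod$rank == p                                                  # full rank: rank(X) = p
all.equal(unname(fitted(lmod) + residuals(lmod)), mtcars$mpg)   # Y = Y-hat + e-hat
```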
Identifiability and Orthogonality
Identifiability
Normal Equations: \( X'X\hat{\boldsymbol{\beta}} = X'\boldsymbol{Y} \)
If \( rank(X) \neq p \), then at least one of the variables (columns in X) is a linear combination of the others. This means that \( X'X \) is not invertible and there are many solutions for \( \hat{\boldsymbol{\beta}} \) in the system of linear equations given by the normal equations.
summary(lmod) gives all the summary information. Check to see if any of the \( \hat{\beta}_i \) are set (by R) to NA, or if \( rank(X) \neq p \) (via lmod$rank), in which case one or more of the variables (columns in X) are a linear combination of the others. Check the relationships and remove the appropriate variable(s).
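A minimal sketch of what a non-identifiable model looks like in R. The redundant column wt_plus_hp is a hypothetical variable constructed for this example, and mtcars is a stand-in data set.

```r
# Construct a deliberately redundant predictor (hypothetical, for illustration).
dat <- mtcars[, c("mpg", "wt", "hp")]
dat$wt_plus_hp <- dat$wt + dat$hp       # exact linear combination of wt and hp

lmod <- lm(mpg ~ wt + hp + wt_plus_hp, data = dat)

coef(lmod)                      # the coefficient of wt_plus_hp is set to NA
p <- ncol(model.matrix(lmod))   # number of columns in X
lmod$rank                       # smaller than p, so X'X is not invertible
alias(lmod)                     # reports which column depends on which others

# Remove the offending variable and refit
lmod_fixed <- lm(mpg ~ wt + hp, data = dat)
coef(lmod_fixed)
```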
Orthogonality
If the columns of the model matrix (the variables) are orthogonal, then any model with a subset of those variables will have the same estimates for the parameters of those variables, i.e. their regression coefficients \( \hat{\beta}_i \) are equal between the models. Note, however, that the estimate of the error variance will be different between the models, which will affect the CI of each \( \beta_i \).
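The orthogonality property can be illustrated with a small simulated design. The design, the coefficients used to simulate y, and the seed are all assumptions made for this sketch.

```r
set.seed(1)

# Hypothetical balanced design: x1 and x2 are orthogonal to each other
# and to the column of ones (both sum to zero).
x1 <- rep(c(-1, -1, 1, 1), times = 5)
x2 <- rep(c(-1, 1, -1, 1), times = 5)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(20)

c(sum(x1), sum(x2), sum(x1 * x2))   # all zero: the columns of X are orthogonal

full <- lm(y ~ x1 + x2)
sub  <- lm(y ~ x1)

coef(full)[c("(Intercept)", "x1")]  # same estimates ...
coef(sub)                           # ... as in the smaller model

summary(full)$sigma                 # but the error variance estimates differ,
summary(sub)$sigma                  # and so do the standard errors and CIs
```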
Compare Two Models
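Assume we have normality, i.e. \( \boldsymbol{\varepsilon} \sim N_n(\boldsymbol{0}, \sigma^2 I) \).
Model \( \Omega \): \( \boldsymbol{Y} = X_{n \times p}\,\boldsymbol{\beta} + \boldsymbol{\varepsilon} \) vs. Model \( \omega \): \( \boldsymbol{Y} = X_{\omega,\, n \times q}\,\boldsymbol{\beta}_\omega + \boldsymbol{\varepsilon}_\omega \), where \( X_\omega \in \mathcal{M}(X) \)
\[ H_0: \boldsymbol{Y} = X_\omega \boldsymbol{\beta}_\omega + \boldsymbol{\varepsilon}_\omega \qquad H_1: \boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
If \( X_\omega \) contains the first q columns of X, then this is equivalent to:
\[ H_0: \boldsymbol{\beta}_r = \boldsymbol{0} \qquad H_1: \boldsymbol{\beta}_r \neq \boldsymbol{0} \qquad \text{where } \boldsymbol{\beta}_r = \begin{pmatrix} \beta_{q+1} \\ \vdots \\ \beta_p \end{pmatrix} \]
If \( H_0 \) holds, i.e. the predictors associated with \( \boldsymbol{\beta}_r \) have no relationship with \( \boldsymbol{Y} \), then the difference between \( RSS_\omega \) and \( RSS \) is random (note that \( RSS_\omega \geq RSS \)), and
\[ F = \frac{(RSS_\omega - RSS)/(p-q)}{RSS/(n-p)} \]
follows an \( F_{p-q,\,n-p} \) distribution. If \( F \) is much larger than expected, then that is evidence against \( H_0 \) (note that \( df = n - p \) and \( df_\omega = n - q \)).
We reject \( H_0 \) if \( P(F_{p-q,\,n-p} > F) < \alpha \), and fail to reject otherwise.
In R: lmod=lm(response~vars,data=data); lmodo=lm(response~varso,data=data); anova(lmodo, lmod)
The p-value is the Pr(>F) entry of the anova output.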
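The F-statistic for comparing two nested models can be computed by hand and checked against anova. The models below, fitted to the stand-in mtcars data, are illustrative choices only.

```r
# Stand-in nested models on mtcars.
lmod  <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # larger model Omega
lmodo <- lm(mpg ~ wt, data = mtcars)               # smaller model omega

n <- nrow(mtcars)
p <- length(coef(lmod))     # number of parameters in Omega
q <- length(coef(lmodo))    # number of parameters in omega

RSS  <- deviance(lmod)
RSSo <- deviance(lmodo)     # RSS_omega >= RSS

Fstat <- ((RSSo - RSS) / (p - q)) / (RSS / (n - p))
pval  <- pf(Fstat, p - q, n - p, lower.tail = FALSE)   # P(F_{p-q, n-p} > F)

tab <- anova(lmodo, lmod)   # the same test as produced by R
tab
c(Fstat, pval)
```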
Compare Two Models - 2
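Special Case (also under normality):
Model \( \Omega \): \( \boldsymbol{Y} = X_{n \times p}\,\boldsymbol{\beta} + \boldsymbol{\varepsilon} \) vs. Model \( \omega \): \( \boldsymbol{Y} = \boldsymbol{1}\beta_\omega + \boldsymbol{\varepsilon}_\omega \), i.e. no predictors.
\[ H_0: \boldsymbol{\beta}_r = \boldsymbol{0} \qquad H_1: \boldsymbol{\beta}_r \neq \boldsymbol{0} \qquad \text{where } \boldsymbol{\beta}_r = \begin{pmatrix} \beta_2 \\ \vdots \\ \beta_p \end{pmatrix} \]
For this case we have \( RSS_\omega = SS_{tc} \). We define \( SS_{reg} = SS_{tc} - RSS \), and so we have
\[ F = \frac{SS_{reg}/(p-1)}{RSS/(n-p)} \]
which follows an \( F_{p-1,\,n-p} \) distribution under \( H_0 \). If \( F \) is much larger than expected, then that is evidence against \( H_0 \).
We reject \( H_0 \) if \( P(F_{p-1,\,n-p} > F) < \alpha \), and fail to reject otherwise.
In R: lmod=lm(response~vars,data=data)
R will perform this special case automatically when you run a linear model; both the F-statistic and the p-value are displayed at the bottom of the summary output obtained via summary(lmod).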
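A sketch of the overall F-test computed from its definition and compared with what summary(lmod) stores. The model and the mtcars data are stand-ins.

```r
# Stand-in model on mtcars.
lmod <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)
p <- length(coef(lmod))

SStc  <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # RSS of the model without predictors
RSS   <- deviance(lmod)
SSreg <- SStc - RSS

Fstat <- (SSreg / (p - 1)) / (RSS / (n - p))
pval  <- pf(Fstat, p - 1, n - p, lower.tail = FALSE)

# summary() stores the statistic and its degrees of freedom;
# the p-value itself only appears in the printed output.
fs <- summary(lmod)$fstatistic    # named vector: value, numdf, dendf
fs
c(Fstat, pval)
```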
Hypothesis Tests for Parameters
Under Normality:
\[ H_0: \beta_i = c \text{ in } \boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \qquad H_1: \beta_i \neq c \text{ in } \boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
We can perform a t-test for this case:
\[ t = \frac{\hat{\beta}_i - c}{se(\hat{\beta}_i)} \]
which follows a \( t_{n-p} \) distribution under \( H_0 \). If \( |t| \) is much larger than expected, then that is evidence against \( H_0 \).
\( se(\hat{\beta}_i) \) is found by taking the square root of the ith diagonal element of \( \hat{\sigma}^2 (X'X)^{-1} \). In R, it is found next to the appropriate regression coefficient in the summary of the linear model (summary(lmod)).
We reject \( H_0 \) if \( P(t_{n-p} < -|t| \text{ or } t_{n-p} > |t|) < \alpha \), and fail to reject otherwise.
In R: lmod=lm(response~vars,data=data)
Calculate t using the above formula (t=(coef(summary(lmod))[i,1]-c)/coef(summary(lmod))[i,2]); then 2*(1-pt(abs(t),n-p)) will give the p-value.
For the special case where \( c = 0 \), the t-statistic and the p-value are displayed in the summary of the linear model (summary(lmod)) next to \( se(\hat{\beta}_i) \).
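The t-test can be sketched as follows. The model, the coefficient index i and the hypothesized value c0 are illustrative assumptions; the names c0 and tstat are used instead of c and t only to avoid masking the R functions of the same names.

```r
# Stand-in model on mtcars; test H0: beta_i = c0 for the wt coefficient.
lmod <- lm(mpg ~ wt + hp, data = mtcars)
tab  <- coef(summary(lmod))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
i    <- 2                     # row of the wt coefficient
c0   <- -5                    # hypothesized value (arbitrary illustration)
df   <- df.residual(lmod)     # n - p

# Standard errors from sigma-hat^2 (X'X)^{-1}
X <- model.matrix(lmod)
se_manual <- sqrt(diag(summary(lmod)$sigma^2 * solve(crossprod(X))))

tstat <- (tab[i, 1] - c0) / tab[i, 2]
pval  <- 2 * (1 - pt(abs(tstat), df))   # two-sided p-value
c(tstat, pval)

# Special case c = 0: reproduces the t value reported by summary()
t0 <- tab[i, 1] / tab[i, 2]
all.equal(t0, tab[i, 3])
```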
Permutation Tests
1) Assuming normality does NOT hold, we want to test two models with \( X_\omega \in \mathcal{M}(X) \):
\[ H_0: \boldsymbol{Y} = X_\omega \boldsymbol{\beta}_\omega + \boldsymbol{\varepsilon}_\omega \qquad H_1: \boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
We still calculate
\[ F = \frac{(RSS_\omega - RSS)/(p-q)}{RSS/(n-p)} \]
(in R: anova(lmod2,lmod)[2,5], where lmod2 is the fit of the smaller model \( \omega \)), but \( F \) doesn't follow an \( F_{p-q,\,n-p} \) distribution under \( H_0 \). Instead we build a reference distribution to compare \( F \) to.
If \( \omega \) is the model without predictors (intercept only), randomly permute the responses, refit the linear model for each permutation (in R: lmodp=update(lmod,sample(response)~.)), and calculate the \( F \) for the permuted model (in R: summary(lmodp)$fstatistic[1]). We do this many times. The p-value then equals the proportion of permuted \( F \)s that are larger than the original \( F \).
Otherwise, we can permute the variables not in \( \omega \), and calculate the \( F \)-statistic for the comparison of the two models. Do this many times so we have a distribution of \( F \)-statistics. The p-value, once again, equals the proportion of permuted \( F \)s that are larger than the original \( F \) (in R: mean(permuted fs>original f)).
2) Assuming normality does NOT hold, we want to test whether one of the parameter values equals 0.
\[ H_0: \beta_i = 0 \text{ in } \boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \qquad H_1: \beta_i \neq 0 \text{ in } \boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
For this case first we calculate the usual
\[ t = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)} \]
(in R: coef(summary(lmod))[i,1]/coef(summary(lmod))[i,2]). Then we permute the values of \( \boldsymbol{x}_{(i)} \) and calculate the \( t \)-statistic; do this many times to get a distribution for those \( t \)-statistics. Because the alternative is two-sided, the p-value equals the proportion of permuted \( |t| \)s that are larger than the original \( |t| \) (in R: mean(abs(permuted ts)>abs(original t))).
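Both permutation tests can be sketched as below. The data set (mtcars), the model, the seed and the number of permutations are assumptions for illustration. Instead of update, each permuted model is refitted with lm on a copy of the data, which has the same effect and does not depend on where the code is run.

```r
set.seed(123)

# Stand-in data and model.
dat   <- mtcars[, c("mpg", "wt", "qsec")]
lmod  <- lm(mpg ~ wt + qsec, data = dat)
nperm <- 1000                     # number of permutations (arbitrary choice)

# 1) Overall F-test: permute the response
F_obs  <- summary(lmod)$fstatistic[1]
F_perm <- replicate(nperm, {
  permdat <- dat
  permdat$mpg <- sample(dat$mpg)  # random permutation of the responses
  summary(lm(mpg ~ wt + qsec, data = permdat))$fstatistic[1]
})
p_F <- mean(F_perm > F_obs)       # proportion of permuted Fs above the original F

# 2) Test of a single coefficient (qsec): permute that predictor
t_obs  <- coef(summary(lmod))["qsec", "t value"]
t_perm <- replicate(nperm, {
  permdat <- dat
  permdat$qsec <- sample(dat$qsec)
  coef(summary(lm(mpg ~ wt + qsec, data = permdat)))["qsec", "t value"]
})
p_t <- mean(abs(t_perm) > abs(t_obs))   # two-sided comparison

c(p_F = p_F, p_t = p_t)
```

With a finite number of permutations the smallest p-value that can be resolved is 1/nperm, so a reported value of 0 only means "smaller than 1/nperm".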
Confidence Intervals and Regions
Confidence Intervals:
If normality holds, i.e. \( \boldsymbol{\varepsilon} \sim N_n(\boldsymbol{0}, \sigma^2 I) \), and \( rank(X) = p \), then the confidence interval (CI) for any \( \beta_i \) is:
\[ \hat{\beta}_i \pm t_{n-p,\,\alpha/2} \cdot se(\hat{\beta}_i) \]
(in R: confint(lmod)[i,]).
Confidence Regions:
If normality holds, i.e. \( \boldsymbol{\varepsilon} \sim N_n(\boldsymbol{0}, \sigma^2 I) \), and \( rank(X) = p \), the confidence region for \( \beta_i \) and \( \beta_j \) simultaneously is an ellipse.
In R (the ellipse function comes from the ellipse package): plot(ellipse(lmod,c(i,j)),type="l").
To add the center: points(coef(lmod)[i], coef(lmod)[j]).
To add the individual CIs: abline(v=confint(lmod)[i,]); abline(h=confint(lmod)[j,])
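The interval formula can be checked against confint. The model and the level are illustrative, and mtcars is a stand-in data set. The ellipse plot is not reproduced here because it needs the separate ellipse package.

```r
# Stand-in model on mtcars.
lmod  <- lm(mpg ~ wt + hp, data = mtcars)
alpha <- 0.05

est   <- coef(lmod)
se    <- coef(summary(lmod))[, "Std. Error"]
tcrit <- qt(1 - alpha / 2, df.residual(lmod))   # upper alpha/2 point of t_{n-p}

ci_manual <- cbind(lower = est - tcrit * se,
                   upper = est + tcrit * se)
ci_manual

confint(lmod, level = 1 - alpha)                # the same intervals from R
```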
Bootstrap Confidence Intervals
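If normality does NOT hold, we create bootstrap confidence intervals. First we estimate \( \boldsymbol{Y} = X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\varepsilon}} \) for the model \( \boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \) the usual way. Then we create an empirical distribution for \( \hat{\boldsymbol{\beta}} \) as follows:
1. Generate \( \boldsymbol{\varepsilon}^* \) by sampling with replacement from \( \hat{\boldsymbol{\varepsilon}} \) (in R: boote=sample(residuals(lmod),replace=TRUE)).
2. Form \( \boldsymbol{Y}^* = \hat{\boldsymbol{Y}} + \boldsymbol{\varepsilon}^* \) (in R: bootY=fitted(lmod)+boote).
3. Calculate \( \hat{\boldsymbol{\beta}}^* \) by fitting the model to \( \boldsymbol{Y}^* \) and X (in R: bootlmod=update(lmod,bootY~.), then bootbeta=coef(bootlmod)).
We do this many times until we have a distribution of bootstrap betas. We can obtain variances, standard errors, and CIs from this distribution (CI for the ith coefficient in R, with the bootstrap estimates stored one replicate per row in bootbetas: quantile(bootbetas[,i],c(alpha/2,1-alpha/2))).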
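The three steps can be put together in a short loop. The data set (mtcars), the model, the seed and the number of bootstrap replicates are assumptions for this sketch. The refit uses lm on a copy of the data rather than update, which gives the same estimates.

```r
set.seed(42)

# Stand-in data and model.
dat   <- mtcars[, c("mpg", "wt", "hp")]
lmod  <- lm(mpg ~ wt + hp, data = dat)
nboot <- 1000                  # number of bootstrap replicates (arbitrary choice)
alpha <- 0.05

bootbetas <- t(replicate(nboot, {
  boote   <- sample(residuals(lmod), replace = TRUE)  # step 1: resample residuals
  bootdat <- dat
  bootdat$mpg <- fitted(lmod) + boote                 # step 2: Y* = Y-hat + e*
  coef(lm(mpg ~ wt + hp, data = bootdat))             # step 3: refit, keep beta*
}))                            # one row per replicate, one column per coefficient

apply(bootbetas, 2, sd)        # bootstrap standard errors
boot_ci <- apply(bootbetas, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2))
boot_ci                        # percentile confidence intervals, one column per coefficient
```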
Predictions
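We found an estimated model \( \boldsymbol{Y} = X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\varepsilon}} \), which for one case with predictors \( \boldsymbol{x} \) equals:
\[ Y = \boldsymbol{x}'\hat{\boldsymbol{\beta}} + \hat{\varepsilon} \]
For a new set of predictors
\[ \boldsymbol{x}_0 = \begin{pmatrix} 1 \\ x_{02} \\ \vdots \\ x_{0p} \end{pmatrix} \]
we can now estimate the response: \( \hat{Y}_0 = \boldsymbol{x}_0'\hat{\boldsymbol{\beta}} \).
In R: y0=crossprod(x0,coef(lmod)) or predict(lmod,newdata=data.frame(t(x0))), where in the latter case the vector x0 must have the correct variable names.
NOTE: Since \( Var(\hat{\boldsymbol{\beta}}) = \sigma^2 (X'X)^{-1} \), we have \( Var(\hat{Y}_0) = \sigma^2\, \boldsymbol{x}_0'(X'X)^{-1}\boldsymbol{x}_0 \).
• Prediction Interval (PI) for the prediction of a future observation:
\[ \hat{Y}_0 \pm t_{n-p,\,\alpha/2} \cdot \hat{\sigma} \sqrt{1 + \boldsymbol{x}_0'(X'X)^{-1}\boldsymbol{x}_0} \]
(in R: predict(lmod,newdata=data.frame(t(x0)),interval="prediction"); bear in mind the vector x0 must have the correct variable names)
• Confidence Interval (CI) for the prediction of a future mean response:
\[ \hat{Y}_0 \pm t_{n-p,\,\alpha/2} \cdot \hat{\sigma} \sqrt{\boldsymbol{x}_0'(X'X)^{-1}\boldsymbol{x}_0} \]
(in R: predict(lmod,newdata=data.frame(t(x0)),interval="confidence"); bear in mind the vector x0 must have the correct variable names)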
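Both intervals can be computed from the formulas and compared with predict. The model, the new predictor values and the level are illustrative assumptions, and mtcars is a stand-in data set.

```r
# Stand-in model on mtcars.
lmod  <- lm(mpg ~ wt + hp, data = mtcars)
X     <- model.matrix(lmod)
alpha <- 0.05

x0 <- c(1, 3, 150)   # hypothetical new case: leading 1 for the intercept, wt = 3, hp = 150

y0        <- drop(crossprod(x0, coef(lmod)))             # point prediction x0' beta-hat
sigma_hat <- summary(lmod)$sigma
h0        <- drop(t(x0) %*% solve(crossprod(X)) %*% x0)  # x0' (X'X)^{-1} x0
tcrit     <- qt(1 - alpha / 2, df.residual(lmod))

pi_manual <- y0 + c(-1, 1) * tcrit * sigma_hat * sqrt(1 + h0)  # future observation
ci_manual <- y0 + c(-1, 1) * tcrit * sigma_hat * sqrt(h0)      # future mean response

# The same results from predict(); newdata needs the predictor names, not the leading 1
new <- data.frame(wt = 3, hp = 150)
pred_pi <- predict(lmod, newdata = new, interval = "prediction", level = 1 - alpha)
pred_ci <- predict(lmod, newdata = new, interval = "confidence", level = 1 - alpha)
pred_pi
pred_ci
```

The prediction interval is always the wider of the two, because it adds the variance of a new error term (the 1 under the square root) to the uncertainty in the estimated mean.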

Mais conteúdo relacionado

Mais procurados

Differential calculus
Differential calculusDifferential calculus
Differential calculusShubham .
 
Maths ppt partial diffrentian eqn
Maths ppt partial diffrentian eqnMaths ppt partial diffrentian eqn
Maths ppt partial diffrentian eqnDheerendraKumar43
 
Histroy of partial differential equation
Histroy of partial differential equationHistroy of partial differential equation
Histroy of partial differential equationamanullahkakar2
 
ROOTS OF EQUATIONS
ROOTS OF EQUATIONSROOTS OF EQUATIONS
ROOTS OF EQUATIONSKt Silva
 
The False-Position Method
The False-Position MethodThe False-Position Method
The False-Position MethodTayyaba Abbas
 
B02110105012
B02110105012B02110105012
B02110105012theijes
 
Secent method
Secent methodSecent method
Secent methodritu1806
 
Infinite series 8.3
Infinite series 8.3 Infinite series 8.3
Infinite series 8.3 Mohsin Ramay
 
Chapter 3: Linear Systems and Matrices - Part 3/Slides
Chapter 3: Linear Systems and Matrices - Part 3/SlidesChapter 3: Linear Systems and Matrices - Part 3/Slides
Chapter 3: Linear Systems and Matrices - Part 3/SlidesChaimae Baroudi
 
Chapter 3: Linear Systems and Matrices - Part 2/Slides
Chapter 3: Linear Systems and Matrices - Part 2/SlidesChapter 3: Linear Systems and Matrices - Part 2/Slides
Chapter 3: Linear Systems and Matrices - Part 2/SlidesChaimae Baroudi
 
A New Approach on the Log - Convex Orderings and Integral inequalities of the...
A New Approach on the Log - Convex Orderings and Integral inequalities of the...A New Approach on the Log - Convex Orderings and Integral inequalities of the...
A New Approach on the Log - Convex Orderings and Integral inequalities of the...inventionjournals
 
Chapter 3: Linear Systems and Matrices - Part 1/Slides
Chapter 3: Linear Systems and Matrices - Part 1/SlidesChapter 3: Linear Systems and Matrices - Part 1/Slides
Chapter 3: Linear Systems and Matrices - Part 1/SlidesChaimae Baroudi
 
Wk 6 part 2 non linearites and non linearization april 05
Wk 6 part 2 non linearites and non linearization april 05Wk 6 part 2 non linearites and non linearization april 05
Wk 6 part 2 non linearites and non linearization april 05Charlton Inao
 
Roots of equations
Roots of equationsRoots of equations
Roots of equationsMileacre
 

Mais procurados (20)

Differential calculus
Differential calculusDifferential calculus
Differential calculus
 
Maths ppt partial diffrentian eqn
Maths ppt partial diffrentian eqnMaths ppt partial diffrentian eqn
Maths ppt partial diffrentian eqn
 
Histroy of partial differential equation
Histroy of partial differential equationHistroy of partial differential equation
Histroy of partial differential equation
 
Logic DM
Logic DMLogic DM
Logic DM
 
ROOTS OF EQUATIONS
ROOTS OF EQUATIONSROOTS OF EQUATIONS
ROOTS OF EQUATIONS
 
The False-Position Method
The False-Position MethodThe False-Position Method
The False-Position Method
 
Dicrete structure
Dicrete structureDicrete structure
Dicrete structure
 
B02110105012
B02110105012B02110105012
B02110105012
 
Dec 14 - R2
Dec 14 - R2Dec 14 - R2
Dec 14 - R2
 
Secent method
Secent methodSecent method
Secent method
 
OPERATIONS RESEARCH
OPERATIONS RESEARCHOPERATIONS RESEARCH
OPERATIONS RESEARCH
 
Infinite series 8.3
Infinite series 8.3 Infinite series 8.3
Infinite series 8.3
 
Chapter 3: Linear Systems and Matrices - Part 3/Slides
Chapter 3: Linear Systems and Matrices - Part 3/SlidesChapter 3: Linear Systems and Matrices - Part 3/Slides
Chapter 3: Linear Systems and Matrices - Part 3/Slides
 
Chapter 3: Linear Systems and Matrices - Part 2/Slides
Chapter 3: Linear Systems and Matrices - Part 2/SlidesChapter 3: Linear Systems and Matrices - Part 2/Slides
Chapter 3: Linear Systems and Matrices - Part 2/Slides
 
A New Approach on the Log - Convex Orderings and Integral inequalities of the...
A New Approach on the Log - Convex Orderings and Integral inequalities of the...A New Approach on the Log - Convex Orderings and Integral inequalities of the...
A New Approach on the Log - Convex Orderings and Integral inequalities of the...
 
Chapter 3: Linear Systems and Matrices - Part 1/Slides
Chapter 3: Linear Systems and Matrices - Part 1/SlidesChapter 3: Linear Systems and Matrices - Part 1/Slides
Chapter 3: Linear Systems and Matrices - Part 1/Slides
 
Wk 6 part 2 non linearites and non linearization april 05
Wk 6 part 2 non linearites and non linearization april 05Wk 6 part 2 non linearites and non linearization april 05
Wk 6 part 2 non linearites and non linearization april 05
 
Cs419 lec11 bottom-up parsing
Cs419 lec11   bottom-up parsingCs419 lec11   bottom-up parsing
Cs419 lec11 bottom-up parsing
 
Roots of equations
Roots of equationsRoots of equations
Roots of equations
 
Cs419 lec10 left recursion and left factoring
Cs419 lec10   left recursion and left factoringCs419 lec10   left recursion and left factoring
Cs419 lec10 left recursion and left factoring
 

Semelhante a 1. linear model, inference, prediction

DSP_FOEHU - MATLAB 03 - The z-Transform
DSP_FOEHU - MATLAB 03 - The z-TransformDSP_FOEHU - MATLAB 03 - The z-Transform
DSP_FOEHU - MATLAB 03 - The z-TransformAmr E. Mohamed
 
Multinomial Model Simulations
Multinomial Model SimulationsMultinomial Model Simulations
Multinomial Model Simulationstim_hare
 
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...mathsjournal
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data AnalysisNBER
 
Memorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsMemorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsPrincessNorberte
 
Solution to the practice test ch 10 correlation reg ch 11 gof ch12 anova
Solution to the practice test ch 10 correlation reg ch 11 gof ch12 anovaSolution to the practice test ch 10 correlation reg ch 11 gof ch12 anova
Solution to the practice test ch 10 correlation reg ch 11 gof ch12 anovaLong Beach City College
 
2 random variables notes 2p3
2 random variables notes 2p32 random variables notes 2p3
2 random variables notes 2p3MuhannadSaleh
 
Design and analysis of ra sort
Design and analysis of ra sortDesign and analysis of ra sort
Design and analysis of ra sortijfcstjournal
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxakashayosha
 
Linear Regression
Linear Regression Linear Regression
Linear Regression Rupak Roy
 
Machine learning ppt and presentation code
Machine learning ppt and presentation codeMachine learning ppt and presentation code
Machine learning ppt and presentation codesharma239172
 
Cramer row inequality
Cramer row inequality Cramer row inequality
Cramer row inequality VashuGupta8
 
Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...AmirParnianifard1
 
Pearson product moment correlation
Pearson product moment correlationPearson product moment correlation
Pearson product moment correlationSharlaine Ruth
 
Chi-squared Goodness of Fit Test Project Overview and.docx
Chi-squared Goodness of Fit Test Project  Overview and.docxChi-squared Goodness of Fit Test Project  Overview and.docx
Chi-squared Goodness of Fit Test Project Overview and.docxbissacr
 
Chi-squared Goodness of Fit Test Project Overview and.docx
Chi-squared Goodness of Fit Test Project  Overview and.docxChi-squared Goodness of Fit Test Project  Overview and.docx
Chi-squared Goodness of Fit Test Project Overview and.docxmccormicknadine86
 

Semelhante a 1. linear model, inference, prediction (20)

2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data 2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data
 
working with python
working with pythonworking with python
working with python
 
DSP_FOEHU - MATLAB 03 - The z-Transform
DSP_FOEHU - MATLAB 03 - The z-TransformDSP_FOEHU - MATLAB 03 - The z-Transform
DSP_FOEHU - MATLAB 03 - The z-Transform
 
Multinomial Model Simulations
Multinomial Model SimulationsMultinomial Model Simulations
Multinomial Model Simulations
 
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data Analysis
 
Correlation
CorrelationCorrelation
Correlation
 
Memorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsMemorization of Various Calculator shortcuts
Memorization of Various Calculator shortcuts
 
Solution to the practice test ch 10 correlation reg ch 11 gof ch12 anova
Solution to the practice test ch 10 correlation reg ch 11 gof ch12 anovaSolution to the practice test ch 10 correlation reg ch 11 gof ch12 anova
Solution to the practice test ch 10 correlation reg ch 11 gof ch12 anova
 
2 random variables notes 2p3
2 random variables notes 2p32 random variables notes 2p3
2 random variables notes 2p3
 
Design and analysis of ra sort
Design and analysis of ra sortDesign and analysis of ra sort
Design and analysis of ra sort
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptx
 
Linear Regression
Linear Regression Linear Regression
Linear Regression
 
Machine learning ppt and presentation code
Machine learning ppt and presentation codeMachine learning ppt and presentation code
Machine learning ppt and presentation code
 
Cramer row inequality
Cramer row inequality Cramer row inequality
Cramer row inequality
 
Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...Computational Intelligence Assisted Engineering Design Optimization (using MA...
Computational Intelligence Assisted Engineering Design Optimization (using MA...
 
Pearson product moment correlation
Pearson product moment correlationPearson product moment correlation
Pearson product moment correlation
 
Statistical parameters
Statistical parametersStatistical parameters
Statistical parameters
 
Chi-squared Goodness of Fit Test Project Overview and.docx
Chi-squared Goodness of Fit Test Project  Overview and.docxChi-squared Goodness of Fit Test Project  Overview and.docx
Chi-squared Goodness of Fit Test Project Overview and.docx
 
Chi-squared Goodness of Fit Test Project Overview and.docx
Chi-squared Goodness of Fit Test Project  Overview and.docxChi-squared Goodness of Fit Test Project  Overview and.docx
Chi-squared Goodness of Fit Test Project Overview and.docx
 

Último

Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxVivek487417
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 

Último (20)

Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 

1. linear model, inference, prediction

  • 1. DATA 503 – Applied Regression Analysis Lecture 9: Linear Model, Inference, and Prediction Highlights By Dr. Ellie Small
  • 2. OverviewTopics: • Initial Data Analysis • Linear Model • Identifiability and Orthogonality • Compare Two Models • Hypothesis Tests for Parameters • Permutation Tests • Confidence Intervals and Regions • Bootstrap Confidence Intervals • Predictions 2
  • 3. Initial Data Analysis We first check the data for errors (often data entry errors): • summary(data) in R. Look for: ₋ Unreasonable range – Change the minimum/maximum values ₋ Look for coding of missing values – Set them to NA ₋ Look for variables that should have been designated as factors (few distinct values) - factor(var) in R • Check graphs for unusual behavior/effects: ₋ Histogram for single variable – hist(var) ₋ Plot the density of a single variable - plot(density(var)) ₋ Scatterplot for 2 variables – plot(var1~var2) ₋ Grouped boxplot for 2 variables where var2 is a factor – plot(var1~var2) 3
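  • 4. Linear Model
      For one case: $Y = \boldsymbol{x}'\boldsymbol{\beta} + \varepsilon$, where $\boldsymbol{x} = (1, x_2, \ldots, x_p)' \in \mathbb{R}^p$ contains the predictors.
      For a data set of size $n$: $\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where the model matrix is
      $$X_{n \times p} = \begin{pmatrix} \boldsymbol{x}_1' \\ \vdots \\ \boldsymbol{x}_n' \end{pmatrix} = \begin{pmatrix} 1 & x_{12} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n2} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} \mathbf{1}_n & \boldsymbol{x}_{(2)} & \cdots & \boldsymbol{x}_{(p)} \end{pmatrix}.$$
      $\boldsymbol{x}_i$ is the $i$th set of predictors (for case $i$), while $\boldsymbol{x}_{(j)}$ contains all values for the $j$th predictor (or variable).
      $\boldsymbol{Y} = (Y_1, \ldots, Y_n)'$ is the response vector, $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)'$ is the error vector, and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)'$ is the parameter vector.
      Assumptions: $E(\boldsymbol{\varepsilon}) = \boldsymbol{0}$, $Var(\boldsymbol{\varepsilon}) = \sigma^2 I_n$.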
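  • 5. Linear Model - 2
      Estimation: $\boldsymbol{Y} = X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\varepsilon}} = \hat{\boldsymbol{Y}} + \hat{\boldsymbol{\varepsilon}}$, where $\hat{\boldsymbol{Y}} = X\hat{\boldsymbol{\beta}} = P_X \boldsymbol{Y}$, $RSS = \|\hat{\boldsymbol{\varepsilon}}\|^2$, $\hat{\sigma}^2 = \frac{RSS}{n-p}$, and $df = n - p$.
      Here $\hat{\boldsymbol{\beta}}$ is the regression coefficient vector, $\hat{\boldsymbol{\varepsilon}}$ the residual vector, and $\hat{\boldsymbol{Y}}$ the fitted value vector.
      This is the least squares estimate: $\hat{\boldsymbol{Y}}$ minimizes the RSS compared to all other linear combinations of the column vectors of $X$.
      $SS_{tc}$ (Sum of Squares Total Corrected) is the RSS for the model without predictors. $R^2$ is the proportion of variance explained by the model, or the improvement of the model compared to the model without predictors: $R^2 = \frac{SS_{tc} - RSS}{SS_{tc}}$.
      Normal Equations: $X'X\hat{\boldsymbol{\beta}} = X'\boldsymbol{Y}$. If $rank(X_{n \times p}) = p$, then $\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\boldsymbol{Y}$.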
  • 6. Linear Model - 3
      In R: lmod=lm(response~vars,data=data)
      summary(lmod) gives all the summary information.
      ₋ $\hat{\boldsymbol{Y}}$ = fitted(lmod)
      ₋ $\hat{\boldsymbol{\varepsilon}}$ = residuals(lmod)
      ₋ $RSS$ = deviance(lmod)
      ₋ $\hat{\boldsymbol{\beta}}$ = coef(lmod)
      ₋ $df = n - p$ = df.residual(lmod)
      ₋ $rank(X)$ = lmod$rank
      ₋ $\hat{\sigma}$ = summary(lmod)$sigma
      ₋ $R^2$ = summary(lmod)$r.squared
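As a cross-check, the estimation formulas can be computed by hand and compared with the lm() accessors. This is only a sketch: the built-in mtcars data and the model mpg ~ wt + hp are assumptions chosen for illustration, not part of the lecture.

```r
# Illustrative model on the built-in mtcars data (an assumption, not from the slides)
lmod <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(lmod)          # n x p model matrix, first column all ones
Y <- mtcars$mpg
n <- nrow(X)
p <- ncol(X)

# Solve the normal equations X'X beta = X'Y (valid because rank(X) = p here)
beta_hat <- solve(crossprod(X), crossprod(X, Y))

Y_hat      <- X %*% beta_hat     # fitted values
res        <- Y - Y_hat          # residuals
RSS        <- sum(res^2)
sigma2_hat <- RSS / (n - p)

SStc <- sum((Y - mean(Y))^2)     # RSS of the model without predictors
R2   <- (SStc - RSS) / SStc

# Each comparison should report TRUE
all.equal(as.numeric(beta_hat), unname(coef(lmod)))
all.equal(RSS, deviance(lmod))
all.equal(sqrt(sigma2_hat), summary(lmod)$sigma)
all.equal(R2, summary(lmod)$r.squared)
```

Solving the linear system with solve(A, b) rather than inverting X'X explicitly is a numerical choice made here; lm() itself uses a QR decomposition.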
  • 7. Identifiability and Orthogonality
      Identifiability
      Normal Equations: $X'X\hat{\boldsymbol{\beta}} = X'\boldsymbol{Y}$. If $rank(X) < p$, then at least one of the variables (columns in $X$) is a linear combination of the others. This means that $X'X$ is not invertible and the normal equations have infinitely many solutions for $\hat{\boldsymbol{\beta}}$.
      summary(lmod) gives all the summary information. Check whether any of the $\hat{\beta}_i$ are set (by R) to NA, or whether $rank(X) < p$ (via lmod$rank); in either case one or more of the variables (columns in $X$) are a linear combination of the others. Check the relationships and remove the appropriate variable(s).
      Orthogonality
      If the columns of the model matrix (the variables) are orthogonal, then any model with a subset of those variables will have the same estimates for the parameters of those variables, i.e. their regression coefficients $\hat{\beta}_i$ are equal between the models. Note, however, that the estimate of the error variance will differ between the models, which will affect the CI of each $\beta_i$.
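A minimal sketch of how a lack of identifiability shows up in R. The data (mtcars) and the redundant variable wt_plus_hp are hypothetical, constructed only to force an exact linear dependence.

```r
# Build a predictor that is an exact linear combination of two others
dat <- mtcars[, c("mpg", "wt", "hp")]
dat$wt_plus_hp <- dat$wt + dat$hp      # hypothetical redundant variable

lmod <- lm(mpg ~ wt + hp + wt_plus_hp, data = dat)

coef(lmod)                             # coefficient of wt_plus_hp is NA
p <- ncol(model.matrix(lmod))          # number of columns in X
lmod$rank                              # smaller than p: X is rank deficient
```

Dropping wt_plus_hp (or one of wt and hp) restores a full-rank model matrix.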
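  • 8. Compare Two Models
      Assume we have normality, i.e. $\boldsymbol{\varepsilon} \sim N_n(\boldsymbol{0}, \sigma^2 I)$.
      Model $\Omega$: $\boldsymbol{Y} = X_{n \times p}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ vs. Model $\omega$: $\boldsymbol{Y} = X_{\omega,\, n \times q}\boldsymbol{\beta}_\omega + \boldsymbol{\varepsilon}_\omega$, where the columns of $X_\omega$ lie in $M(X)$, the space spanned by the columns of $X$.
      $H_0$: $\boldsymbol{Y} = X_\omega\boldsymbol{\beta}_\omega + \boldsymbol{\varepsilon}_\omega$
      $H_1$: $\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
      If $X_\omega$ contains the first $q$ columns of $X$, then this is equivalent to $H_0$: $\boldsymbol{\beta}_r = \boldsymbol{0}$ vs. $H_1$: $\boldsymbol{\beta}_r \neq \boldsymbol{0}$, where $\boldsymbol{\beta}_r = (\beta_{q+1}, \ldots, \beta_p)'$.
      If $H_0$ holds, i.e. there is no relationship between $\boldsymbol{Y}$ and the predictors belonging to $\boldsymbol{\beta}_r$, then the difference between $RSS_\omega$ and $RSS$ is random (note that $RSS_\omega \geq RSS$), and
      $$F = \frac{(RSS_\omega - RSS)/(p - q)}{RSS/(n - p)}$$
      follows an $F_{p-q,\,n-p}$ distribution. If $F$ is much larger than expected, then that is evidence against $H_0$ (note that $df = n - p$ and $df_\omega = n - q$).
      We reject $H_0$ if the p-value $P(F_{p-q,\,n-p} > F) < \alpha$, and fail to reject otherwise.
      In R: lmod=lm(response~vars,data=data); lmodo=lm(response~varso,data=data); anova(lmodo, lmod)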
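The F statistic and its p-value can be reproduced by hand and compared with anova(). A sketch only: mtcars and the particular choice of the two nested models are assumptions for illustration.

```r
# Illustrative nested models (an assumption, not from the slides)
lmod  <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # model Omega, p parameters
lmodo <- lm(mpg ~ wt, data = mtcars)               # model omega, q parameters

n <- nrow(mtcars)
p <- length(coef(lmod))
q <- length(coef(lmodo))

RSS  <- deviance(lmod)
RSSo <- deviance(lmodo)

Fstat <- ((RSSo - RSS) / (p - q)) / (RSS / (n - p))
pval  <- pf(Fstat, p - q, n - p, lower.tail = FALSE)   # P(F_{p-q,n-p} > F)

tab <- anova(lmodo, lmod)   # row 2 holds the comparison
tab
all.equal(Fstat, tab[2, 5])  # column 5 is F
all.equal(pval,  tab[2, 6])  # column 6 is Pr(>F)
```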
  • 9. Compare Two Models - 2
      Special Case (also under normality):
      Model $\Omega$: $\boldsymbol{Y} = X_{n \times p}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ vs. Model $\omega$: $\boldsymbol{Y} = \boldsymbol{1}\beta_\omega + \boldsymbol{\varepsilon}_\omega$, i.e. no predictors.
      $H_0$: $\boldsymbol{\beta}_r = \boldsymbol{0}$ vs. $H_1$: $\boldsymbol{\beta}_r \neq \boldsymbol{0}$, where $\boldsymbol{\beta}_r = (\beta_2, \ldots, \beta_p)'$.
      For this case we have $RSS_\omega = SS_{tc}$. We define $SS_{reg} = SS_{tc} - RSS$, and so we have
      $$F = \frac{SS_{reg}/(p - 1)}{RSS/(n - p)},$$
      which follows an $F_{p-1,\,n-p}$ distribution under $H_0$. If $F$ is much larger than expected, then that is evidence against $H_0$.
      We reject $H_0$ if the p-value $P(F_{p-1,\,n-p} > F) < \alpha$, and fail to reject otherwise.
      In R: lmod=lm(response~vars,data=data)
      R performs this special case automatically when you run a linear model; both the F-score and the p-value are displayed at the bottom of the summary output obtained via summary(lmod).
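For the special case, a short sketch that rebuilds the overall F statistic from $SS_{tc}$ and $RSS$ and compares it with the value stored in the summary object. The data set and model are again illustrative assumptions.

```r
lmod <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model (assumption)

n <- nrow(mtcars)
p <- length(coef(lmod))

SStc  <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # RSS of the intercept-only model
RSS   <- deviance(lmod)
SSreg <- SStc - RSS

Fstat <- (SSreg / (p - 1)) / (RSS / (n - p))
pval  <- pf(Fstat, p - 1, n - p, lower.tail = FALSE)

fs <- summary(lmod)$fstatistic   # named vector: value, numdf, dendf
all.equal(Fstat, unname(fs["value"]))
```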
  • 10. Hypothesis Tests for Parameters
      Under Normality:
      $H_0$: $\beta_i = c$ in $\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
      $H_1$: $\beta_i \neq c$ in $\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
      We can perform a t-test for this case: $t = \frac{\hat{\beta}_i - c}{se(\hat{\beta}_i)}$, which follows a $t_{n-p}$ distribution under $H_0$. If $|t|$ is much larger than expected, then that is evidence against $H_0$.
      $se(\hat{\beta}_i)$ is found by taking the square root of the $i$th diagonal element of $\hat{\sigma}^2 (X'X)^{-1}$. In R, it is found next to the appropriate regression coefficient in the summary of the linear model (summary(lmod)).
      We reject $H_0$ if the p-value $P(t_{n-p} < -|t| \text{ or } t_{n-p} > |t|) < \alpha$, and fail to reject otherwise.
      In R: lmod=lm(response~vars,data=data). Calculate t using the above formula (t=(coef(summary(lmod))[i,1]-c)/coef(summary(lmod))[i,2]); then 2*(1-pt(abs(t),n-p)) gives the p-value.
      For the special case where $c = 0$, the t-score and the p-value are displayed in the summary of the linear model (summary(lmod)) next to $se(\hat{\beta}_i)$.
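A minimal sketch of the t-test for $H_0$: $\beta_i = c$. The model, the index i, and the null value (named c0 here to avoid masking R's c() function) are all hypothetical choices for illustration.

```r
lmod <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model (assumption)

i  <- 2      # row of the coefficient table to test (here: wt)
c0 <- -5     # hypothetical null value for beta_i

est  <- coef(summary(lmod))                 # columns: Estimate, Std. Error, t value, Pr(>|t|)
tval <- (est[i, 1] - c0) / est[i, 2]
pval <- 2 * (1 - pt(abs(tval), df.residual(lmod)))   # two-sided p-value

# Standard errors by hand: sqrt of the diagonal of sigma_hat^2 (X'X)^{-1}
X  <- model.matrix(lmod)
se <- sqrt(diag(summary(lmod)$sigma^2 * solve(crossprod(X))))
all.equal(unname(se), unname(est[, 2]))

# With c = 0 the statistic matches the t value reported by summary()
all.equal(est[i, 1] / est[i, 2], est[i, 3])
```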
  • 11. Permutation Tests
      1) Assuming normality does NOT hold, we want to test two models where the columns of $X_\omega$ lie in $M(X)$:
      $H_0$: $\boldsymbol{Y} = X_\omega\boldsymbol{\beta}_\omega + \boldsymbol{\varepsilon}_\omega$
      $H_1$: $\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
      We still calculate $F = \frac{(RSS_\omega - RSS)/(p - q)}{RSS/(n - p)}$ (in R: anova(lmod2,lmod)[2,5], where lmod2 is the fit of model $\omega$), but $F$ doesn't follow an $F_{p-q,\,n-p}$ distribution under $H_0$. Instead we construct a reference distribution to compare $F$ to.
      If $\omega$ is the model without predictors (intercept only), randomly permute the responses, run a linear model for each permutation (in R: pmod=update(lmod,sample(y)~.)), and calculate the $F$ for the permuted model (in R: summary(pmod)$fstatistic[1]). We do this many times. The p-value then equals the proportion of permuted $F$s that are larger than the original $F$.
      Otherwise, we can permute the variables not in $\omega$, and calculate the $F$-score for the comparison of the two models. Do this many times so we have a distribution of $F$-scores. The p-value, once again, equals the proportion of permuted $F$s that are larger than the original $F$ (in R: mean(permuted fs>original f)).
      2) Assuming normality does NOT hold, we want to test whether one of the parameter values equals 0.
      $H_0$: $\beta_i = 0$ in $\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
      $H_1$: $\beta_i \neq 0$ in $\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
      For this case we first calculate the usual $t = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)}$ (in R: coef(summary(lmod))[i,1]/coef(summary(lmod))[i,2]). Then we permute the values of $\boldsymbol{x}_{(i)}$ and calculate the $t$-score; do this many times to get a distribution for those t-scores. Because the alternative is two-sided, the p-value equals the proportion of permuted $t$s that are larger in absolute value than the original $t$ (in R: mean(abs(permuted ts)>abs(original t))).
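Both permutation tests can be sketched as follows. The data (mtcars), the model, the number of permutations, and the seed are assumptions chosen for illustration; the variable names f_perm, t_perm, and pdat are hypothetical.

```r
set.seed(1)                                   # for reproducibility (arbitrary seed)
lmod  <- lm(mpg ~ wt + qsec, data = mtcars)   # illustrative model (assumption)
nperm <- 1000                                 # number of permutations (assumption)

# (1) omega = intercept-only model: permute the response
f_obs  <- summary(lmod)$fstatistic[1]
f_perm <- replicate(nperm, {
  pmod <- update(lmod, sample(mpg) ~ .)       # refit with shuffled response
  summary(pmod)$fstatistic[1]
})
p_F <- mean(f_perm > f_obs)                   # permutation p-value for the F test

# (2) test beta_i = 0 for one predictor (qsec): permute that predictor only
t_obs  <- coef(summary(lmod))["qsec", 3]
t_perm <- replicate(nperm, {
  pdat <- mtcars
  pdat$qsec <- sample(pdat$qsec)              # shuffle the tested column
  coef(summary(lm(mpg ~ wt + qsec, data = pdat)))["qsec", 3]
})
p_t <- mean(abs(t_perm) > abs(t_obs))         # two-sided permutation p-value

c(p_F = p_F, p_t = p_t)
```

With a finite number of permutations the p-values are themselves estimates; more permutations give a finer resolution.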
  • 12. Confidence Intervals and Regions
      Confidence Intervals: If normality holds, i.e. $\boldsymbol{\varepsilon} \sim N_n(\boldsymbol{0}, \sigma^2 I)$, and $rank(X) = p$, then the confidence interval (CI) for any $\beta_i$ is $\hat{\beta}_i \pm t_{n-p,\,\alpha/2} \cdot se(\hat{\beta}_i)$ (in R: confint(lmod)[i,]).
      Confidence Regions: If normality holds, i.e. $\boldsymbol{\varepsilon} \sim N_n(\boldsymbol{0}, \sigma^2 I)$, and $rank(X) = p$, the confidence region for $\beta_i$ and $\beta_j$ simultaneously is an ellipse. In R (the ellipse function comes from the ellipse package): plot(ellipse(lmod,c(i,j)),type="l").
      To add the center: points(coef(lmod)[i], coef(lmod)[j]).
      To add the individual CIs: abline(v=confint(lmod)[i,]); abline(h=confint(lmod)[j,])
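The interval formula can be checked against confint() in a few lines. The model and the level $\alpha = 0.05$ are illustrative assumptions.

```r
lmod  <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model (assumption)
alpha <- 0.05

est   <- coef(summary(lmod))                 # estimates and standard errors
tcrit <- qt(1 - alpha / 2, df.residual(lmod))

# beta_hat_i +/- t_{n-p, alpha/2} * se(beta_hat_i), one row per coefficient
ci <- cbind(lower = est[, 1] - tcrit * est[, 2],
            upper = est[, 1] + tcrit * est[, 2])
ci
all.equal(unname(ci), unname(confint(lmod, level = 1 - alpha)))
```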
  • 13. Bootstrap Confidence Intervals
      If normality does NOT hold, we create bootstrap confidence intervals. First we estimate $\boldsymbol{Y} = X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\varepsilon}}$ for the model $\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ the usual way. Then we create a bootstrap distribution for $\hat{\boldsymbol{\beta}}$ as follows:
      1. Generate $\boldsymbol{\varepsilon}^*$ by sampling with replacement from $\hat{\boldsymbol{\varepsilon}}$ (in R: boote=sample(residuals(lmod),rep=T)).
      2. Form $\boldsymbol{Y}^* = \hat{\boldsymbol{Y}} + \boldsymbol{\varepsilon}^*$ (in R: bootY=fitted(lmod)+boote).
      3. Calculate $\hat{\boldsymbol{\beta}}^*$ by fitting $\boldsymbol{Y}^* = X\boldsymbol{\beta}^* + \boldsymbol{\varepsilon}^*$ (in R: bootlmod=update(lmod,bootY~.), then bootbeta=coef(bootlmod)).
      We do this many times until we have a distribution of bootstrap betas. We can obtain variances, standard errors, and CIs from this distribution (CIs in R: quantile(bootbetas,c(alpha/2,1-alpha/2))).
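The three steps above can be sketched as a loop. The data set, model, number of bootstrap samples, and seed are assumptions for illustration; the sketch refits with lm() directly instead of update(), which is a choice made here so that bootY is found when the code runs inside a function.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
lmod  <- lm(mpg ~ wt + hp, data = mtcars)     # illustrative model (assumption)
nboot <- 1000                                 # number of bootstrap samples (assumption)
alpha <- 0.05

res <- residuals(lmod)
fit <- fitted(lmod)

# One row of bootstrap coefficients per replicate
bootbetas <- t(replicate(nboot, {
  boote <- sample(res, replace = TRUE)        # step 1: resample residuals
  bootY <- fit + boote                        # step 2: form Y*
  coef(lm(bootY ~ wt + hp, data = mtcars))    # step 3: refit and keep beta*
}))

# Percentile CIs and standard errors, one column per coefficient
boot_ci <- apply(bootbetas, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2))
boot_se <- apply(bootbetas, 2, sd)
boot_ci
```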
  • 14. Predictions
      We found an estimated model $\boldsymbol{Y} = X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\varepsilon}}$, which for one case with predictors $\boldsymbol{x}$ equals $Y = \boldsymbol{x}'\hat{\boldsymbol{\beta}} + \hat{\varepsilon}$.
      For a new set of predictors $\boldsymbol{x}_0 = (1, x_{02}, \ldots, x_{0p})'$, we can now estimate the response: $\hat{Y}_0 = \boldsymbol{x}_0'\hat{\boldsymbol{\beta}}$.
      In R: y0=crossprod(x0,coef(lmod)) or predict(lmod,newdata=data.frame(t(x0))), where in the latter case the vector x0 must have the correct variable names.
      NOTE: Since $Var(\hat{\boldsymbol{\beta}}) = \sigma^2 (X'X)^{-1}$, we have $Var(\hat{Y}_0) = \sigma^2 \boldsymbol{x}_0'(X'X)^{-1}\boldsymbol{x}_0$.
      • Prediction Interval (PI) for the prediction of a future observation: $\hat{Y}_0 \pm t_{n-p,\,\alpha/2} \cdot \hat{\sigma}\sqrt{1 + \boldsymbol{x}_0'(X'X)^{-1}\boldsymbol{x}_0}$ (in R: predict(lmod,newdata=data.frame(t(x0)),interval="prediction"); bear in mind the vector x0 must have the correct variable names)
      • Confidence Interval (CI) for the prediction of a future mean response: $\hat{Y}_0 \pm t_{n-p,\,\alpha/2} \cdot \hat{\sigma}\sqrt{\boldsymbol{x}_0'(X'X)^{-1}\boldsymbol{x}_0}$ (in R: predict(lmod,newdata=data.frame(t(x0)),interval="confidence"); bear in mind the vector x0 must have the correct variable names)
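A sketch that computes the point prediction and both intervals by hand and compares them with predict(). The model and the new case (wt = 3, hp = 120) are hypothetical values chosen for illustration, and the level is fixed at 95%.

```r
lmod <- lm(mpg ~ wt + hp, data = mtcars)      # illustrative model (assumption)

x0 <- c(1, wt = 3, hp = 120)                  # hypothetical new case; leading 1 for the intercept
y0 <- drop(crossprod(x0, coef(lmod)))         # point prediction x0' beta_hat

X         <- model.matrix(lmod)
sigma_hat <- summary(lmod)$sigma
tcrit     <- qt(0.975, df.residual(lmod))     # alpha = 0.05
h0        <- drop(t(x0) %*% solve(crossprod(X)) %*% x0)   # x0'(X'X)^{-1} x0

pi_hand <- y0 + c(-1, 1) * tcrit * sigma_hat * sqrt(1 + h0)  # future observation
ci_hand <- y0 + c(-1, 1) * tcrit * sigma_hat * sqrt(h0)      # mean response

# predict() takes the predictors by name, without the intercept column
new  <- data.frame(wt = 3, hp = 120)
pi_R <- predict(lmod, newdata = new, interval = "prediction")
ci_R <- predict(lmod, newdata = new, interval = "confidence")

all.equal(unname(pi_R[1, c("lwr", "upr")]), pi_hand)
all.equal(unname(ci_R[1, c("lwr", "upr")]), ci_hand)
```

The prediction interval is always wider than the confidence interval at the same point, because it adds the variance of a new error term to the variance of the estimated mean.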