Lecture 7
Survey Research & Design in Psychology
James Neill, 2017
Creative Commons Attribution 4.0
Image source:http://commons.wikimedia.org/wiki/File:Vidrarias_de_Laboratorio.jpg
Multiple Linear Regression I
2
Overview
1. Readings
2. Correlation (Review)
3. Simple linear regression
4. Multiple linear regression
5. Summary
6. MLR I Quiz - Practice questions
3
1. Howitt & Cramer (2014):
– Regression: Prediction with precision
[Ch 9] [Textbook/eReserve]
– Multiple regression & multiple correlation
[Ch 32] [Textbook/eReserve]
2. StatSoft (2016). How to find relationship
between variables, multiple regression. StatSoft
Electronic Statistics Handbook. [Online]
3. Tabachnick & Fidell (2013).
Multiple regression
(includes example write-ups) [eReserve]
Readings
Correlation (Review)
Linear relation between
two variables
5
Purposes of
correlational statistics
Explanatory - Regression
e.g., cross-sectional study
(all data collected at same
time)
Predictive - Regression
e.g., longitudinal study
(predictors collected prior
to outcome measures)
6
Linear correlation
● Linear relations between interval or ratio
variables
● Best fitting straight-line on a scatterplot
7
Correlation – Key points
• Covariance = sum of cross-products
(unstandardised)
• Correlation = sum of cross-products
(standardised), ranging from -1 to 1
(sign indicates direction, value indicates size)
• Coefficient of determination (r²) indicates % of shared variance
• Correlation does not necessarily
equal causality
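These quantities are easy to compute directly; a minimal Python sketch using hypothetical paired scores (not data from the lecture):

```python
import numpy as np

# Hypothetical paired scores for two variables (illustrative only)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0])

# Covariance: mean cross-product of deviations (unstandardised)
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Correlation: covariance standardised by the two SDs; ranges from -1 to 1
r = np.corrcoef(x, y)[0, 1]

# Coefficient of determination: proportion of shared variance
r_squared = r ** 2

print(cov_xy, r, r_squared)
```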
8
Correlation is shared variance
Venn diagrams are helpful for depicting
relations between variables.
[Venn diagram: r² = .32 shared variance; .68 unique variance in each variable]
Simple linear
regression
Explains and predicts a Dependent Variable
(DV) based on a linear relation with an
Independent Variable (IV)
10
What is simple linear regression?
• An extension of correlation
• Best-fitting straight line for a scatterplot
between two variables:
• predictor (X) – also called an independent
variable (IV)
• outcome (Y) - also called a
dependent variable (DV) or criterion variable
• LR uses an IV to explain/predict a DV
• Helps to understand relationships and possible causal effects of one variable on another.
11
Least squares criterion
The line of best fit minimises the total sum of squares of the vertical deviations for each case.
a = point at which line of best fit crosses the Y-axis.
b = slope of the line of best fit
residuals = vertical (Y) distance between line of best fit and each observation (unexplained variance)
12
Linear Regression - Example:
Cigarettes & coronary heart disease
IV = Cigarette
consumption
DV = Coronary
Heart Disease
Landwehr & Watkins (1987, cited in Howell, 2004, pp. 216-218)
13
Linear regression - Example:
Cigarettes & coronary heart disease
(Howell, 2004)
Research question:
How fast does CHD mortality rise
with a one unit increase in smoking?
• IV = Av. # of cigs per adult per day
• DV = CHD mortality rate (deaths per
10,000 per year due to CHD)
• Unit of analysis = Country
14
Linear regression - Data:
Cigarettes & coronary heart disease
(Howell, 2004)
15
[Scatterplot: Cigarette Consumption per Adult per Day (X-axis) vs. CHD Mortality per 10,000 (Y-axis)]
Linear regression - Example:
Scatterplot with Line of Best Fit
16
Linear regression equation
(without error)
Ŷ = bX + a
Ŷ = predicted values of Y
a = Y-intercept = level of Y when X is 0
b = slope = rate of predicted ↑/↓ for Y scores for each unit increase in X
17
Y = bX + a + e
X = IV values
Y = DV values
a = Y-axis intercept
b = slope of line of best fit
(regression coefficient)
e = error
Linear regression equation
(with error)
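A minimal Python sketch of fitting this equation by least squares, using hypothetical X and Y scores (np.polyfit returns the slope b and intercept a):

```python
import numpy as np

# Hypothetical IV (X) and DV (Y) scores (illustrative only)
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
Y = np.array([5.0, 9.0, 14.0, 16.0, 22.0, 25.0])

# Least squares estimates of the slope (b) and Y-intercept (a)
b, a = np.polyfit(X, Y, deg=1)

# Predicted values and residuals (e = Y - Y_hat)
Y_hat = b * X + a
e = Y - Y_hat

print(f"b = {b:.2f}, a = {a:.2f}")
print("residuals:", np.round(e, 2))
```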
18
Linear regression – Example:
Equation
Variables:
• Ŷ (DV) = predicted rate of CHD mortality
• X (IV) = mean # of cigarettes per adult
per day per country
Regression co-efficients:
• b = rate of ↑/↓ of CHD mortality for each
extra cigarette smoked per day
• a = baseline level of CHD (i.e., CHD
when no cigarettes are smoked)
19
Linear regression – Example:
Explained variance
• r = .71
• r² = .71² = .51
• p < .05
• Approximately 50% of the variability in CHD mortality is associated with variability in countries' smoking rates.
20
Linear regression – Example:
Test for overall significance
ANOVA
                Sum of Squares    df    Mean Square       F     Sig.
Regression             454.482     1        454.48     19.59    .00
Residual                440.757    19        23.198
Total                   895.238    20
Predictors: (Constant), Cigarette Consumption per Adult per Day
Dependent Variable: CHD Mortality per 10,000
● r = .71, r² = .51, p < .05
21
Linear regression – Example:
Regression coefficients - SPSS
Coefficients
                                                 B     Std. Error    Beta      t     Sig.
(Constant)  [a]                                2.37       2.941               .80    .43
Cigarette Consumption per Adult per Day  [b]   2.04        .461      .713     4.4    .00
Dependent Variable: CHD Mortality per 10,000
(B = Unstandardized Coefficients; Beta = Standardized Coefficients)
22
Linear regression - Example:
Making a prediction
● What if we want to predict CHD mortality
when cigarette consumption is 6?
● We predict that 14.61 / 10,000 people in a
country with an average cigarette
consumption of 6 per person will die of
CHD per annum.
Ŷ = bX + a = 2.04X + 2.37
Ŷ = 2.04*6 + 2.37 = 14.61
23
Linear regression - Example:
Accuracy of prediction - Residual
• Finnish smokers smoke 6
cigarettes/adult/day
• We predict 14.61 deaths /10,000
• But Finland actually has 23
deaths / 10,000
• Therefore, the error (“residual”)
for this case is 23 - 14.61 = 8.39
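The prediction and residual in this example can be reproduced directly from the reported coefficients (b = 2.04, a = 2.37); a minimal sketch:

```python
# Regression coefficients reported in the lecture example (Howell, 2004)
b, a = 2.04, 2.37

# Predicted CHD mortality for a country averaging 6 cigarettes/adult/day
x = 6
y_hat = b * x + a            # 2.04 * 6 + 2.37 = 14.61

# Residual for Finland: actual (23) minus predicted (14.61)
y_actual = 23
residual = y_actual - y_hat  # 8.39

print(round(y_hat, 2), round(residual, 2))
```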
24
[Scatterplot: Cigarette Consumption per Adult per Day (X-axis) vs. CHD Mortality per 10,000 (Y-axis), showing the prediction (line of best fit) and the residual for one observation]
25
Hypothesis testing
Null hypotheses (H0
):
• a (Y-intercept) = 0
Unless the DV is ratio (meaningful 0), we are not
usually very interested in the a value (starting
value of Y when X is 0).
• b (slope of line of best fit) = 0
26
Linear regression – Example:
Testing slope and intercept
Coefficients
                                                 B     Std. Error    Beta      t     Sig.
(Constant)  [a]                                2.37       2.941               .80    .43
Cigarette Consumption per Adult per Day  [b]   2.04        .461      .713     4.4    .00
Dependent Variable: CHD Mortality per 10,000
a is not significant - baseline CHD may be negligible.
b is significant (+ve) - smoking is positively associated with CHD.
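For readers working outside SPSS, a comparable coefficient table (B, t, and p for the intercept and slope) can be produced with statsmodels; the data below are hypothetical, not the Howell (2004) data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical IV (cigarette consumption) and DV (CHD mortality) scores
x = np.array([4, 6, 8, 10, 12, 5, 7, 9, 11, 3], dtype=float)
y = np.array([10, 14, 18, 23, 27, 12, 16, 19, 25, 9], dtype=float)

# Add a constant so the model estimates the Y-intercept (a) as well as b
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()

# B (unstandardised coefficients), t values, and p values,
# analogous to the SPSS Coefficients table
print(model.params)   # [a, b]
print(model.tvalues)
print(model.pvalues)
```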
27
Linear regression - Example
Does a tendency to
‘ignore problems’ (IV)
predict
‘psychological distress’ (DV)?
28
[Scatterplot: Ignore the Problem (X-axis) vs. Psychological Distress (Y-axis), with line of best fit; Rsq = 0.1058]
Line of best fit
seeks to minimise
sum of squared
residuals
PD is
measured
in the
direction of
mental
health – i.e.,
high scores
mean less
distress.
Higher IP scores indicate
greater frequency of ignoring
problems as a way of coping.
29
Model Summary
Model 1:  R = .325   R Square = .106   Adjusted R Square = .102   Std. Error of the Estimate = 19.4851
Predictors: (Constant), IGNO2 ACS Time 2 - 11. Ignore
Ignoring Problems accounts for ~10% of the
variation in Psychological Distress
Linear regression - Example
R = .32, R² = .11, Adjusted R² = .10
The predictor (Ignore the Problem) explains
approximately 10% of the variance in the
dependent variable (Psychological Distress).
30
ANOVA (Model 1)
                Sum of Squares    df    Mean Square        F     Sig.
Regression            9789.888     1      9789.888     25.785   .000
Residual             82767.884   218       379.669
Total                92557.772   219
Predictors: (Constant), IGNO2 ACS Time 2 - 11. Ignore
Dependent Variable: GWB2NEG
The population relationship between Ignoring
Problems and Psychological Distress is
unlikely to be 0% because p = .000
(i.e., reject the null hypothesis that there is no
relationship)
Linear regression - Example
31
Coefficients (Model 1)
                                     B     Std. Error    Beta        t      Sig.
(Constant)                       118.897      4.351                27.327   .000
IGNO2 ACS Time 2 - 11. Ignore     -9.505      1.872     -.325      -5.078   .000
Dependent Variable: GWB2NEG
PD = 119 - 9.5*IP
There is a sig. a or constant (Y-intercept) - this
is the baseline level of Psychological Distress.
In addition, Ignore Problems (IP) is a
significant predictor of Psychological Distress
(PD).
Linear regression - Example
32
[Scatterplot: Ignore the Problem (X-axis) vs. Psychological Distress (Y-axis), with line of best fit; Rsq = 0.1058]
a = 119
b = -9.5
e = error
PD = 119 - 9.5*IP
33
Linear regression summary
• Linear regression is for
explaining or predicting the
linear relationship between two
variables
• Y = bX + a + e
• Ŷ = bX + a
(b is the slope; a is the Y-intercept)
Multiple Linear
Regression
Linear relations between two
or more IVs and a single DV
35
Linear Regression
X Y
Multiple Linear Regression
X1
X2
X3 Y
X4
X5
What is multiple linear regression (MLR)?
Visual model
Single predictor
Multiple
predictors
36
What is MLR?
• Use of several IVs to predict a DV
• Weights each predictor (IV)
according to the strength of its
linear relationship with the DV
• Makes adjustments for inter-
relationships among predictors
• Provides a measure of overall fit (R)
37
Correlation
Regression
Correlation
Partial correlation
Multiple linear regression
Y
YX
X1
X2
What is MLR?
38
What is MLR?
A 3-way scatterplot can depict the correlational
relationship between 3 variables.
However, it is difficult to graph/visualise relationships among 4+ variables via scatterplot.
39
General steps
1. Develop a diagrammatic model
and express a research
question and/or hypotheses
2. Check assumptions
3. Choose type of MLR
4. Interpret output
5. Develop a regression equation
(if needed)
40
• ~50% of the variance in CHD
mortality could be explained by
cigarette smoking (using LR)
• Strong effect - but what about the
other 50% (‘unexplained’
variance)?
• What about other predictors?
–e.g., exercise and cholesterol?
LR → MLR example:
Cigarettes & coronary heart disease
41
MLR – Example
Research question 1
How well do these three IVs:
• # of cigarettes / day (IV1)
• exercise (IV2) and
• cholesterol (IV3)
predict
• CHD mortality (DV)?
Cigarettes
Exercise CHD Mortality
Cholesterol
42
MLR – Example
Research question 2
To what extent do personality factors
(IVs) predict annual income (DV)?
Extraversion
Neuroticism Income
Psychoticism
43
MLR - Example
Research question 3
“Does the # of years of formal study
of psychology (IV1) and the no. of
years of experience as a
psychologist (IV2) predict clinical
psychologists’ effectiveness in
treating mental illness (DV)?”
Study
Experience Effectiveness
44
MLR - Example
Your example
Generate your own MLR research
question
(e.g., based on some of the following variables):
• Gender & Age
• Enrolment Type
• Hours
• Stress
• Time management
– Planning
– Procrastination
– Effective actions
• Time perspective
– Past-Negative
– Past-Positive
– Present-Hedonistic
– Present-Fatalistic
– Future-Positive
– Future-Negative
45
Assumptions
• Levels of measurement
• Sample size
• Normality (univariate, bivariate, and multivariate)
• Linearity: Linear relations between IVs & DVs
• Homoscedasticity
• Multicollinearity
– IVs are not overly correlated with one another
(e.g., not over .7)
• Residuals are normally distributed
46
Levels of measurement
• DV = Continuous
(Interval or Ratio)
• IV = Continuous or Dichotomous
(if neither, may need to recode
into a dichotomous variable
or create dummy variables)
47
Dummy coding
• “Dummy coding” converts a more
complex variable into a series of
dichotomous variables
(i.e., 0 or 1)
• Several dummy variables can be
created from a variable with a
higher level of measurement.
48
Dummy coding - Example
• Religion
(1 = Christian; 2 = Muslim; 3 = Atheist)
in this format, can't be an IV in regression
(a linear correlation with a categorical variable doesn't
make sense)
• However, it can be dummy coded into
dichotomous variables:
– Christian (0 = no; 1 = yes)
– Muslim (0 = no; 1 = yes)
– Atheist (0 = no; 1 = yes) (redundant)
• These variables can then be used as IVs.
• More information (Dummy variable (statistics), Wikiversity)
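A minimal sketch of dummy coding with pandas, using a hypothetical religion variable; get_dummies creates one 0/1 column per category, and dropping the first (alphabetically, Atheist) leaves it as the redundant reference category:

```python
import pandas as pd

# Hypothetical religion variable (1 = Christian, 2 = Muslim, 3 = Atheist)
religion = pd.Series([1, 2, 3, 1, 2, 1]).map(
    {1: "Christian", 2: "Muslim", 3: "Atheist"}
)

# Dummy code into dichotomous (0/1) variables; drop one redundant category
dummies = pd.get_dummies(religion, drop_first=True).astype(int)
print(dummies)
```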
49
Sample size:
Rules of thumb
• Enough data is needed to provide reliable estimates
of the correlations.
• N >= 50 cases and N >= 10 to 20 cases x no. of
IVs, otherwise the estimates of the regression line are probably
unstable and are unlikely to replicate if the study is repeated.
• Green (1991) and Tabachnick & Fidell (2013)
suggest:
– 50 + 8(k) for testing an overall regression model and
– 104 + k when testing individual predictors (where k is the
number of IVs)
– Based on detecting a medium effect size (β >= .20), with
critical α <= .05, with power of 80%.
50
Sample size:
Rules of thumb
Q: Should a researcher conduct an MLR
with 4 predictors with 200 cases?
A: Yes; satisfies all rules of thumb:
• N > 50 cases
• N > 20 cases x 4 = 80 cases
• N > 50 + 8 x 4 = 82 cases
• N > 104 + 4 = 108 cases
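A small helper that applies these rules of thumb; the function name and layout are illustrative only:

```python
def required_n(k: int) -> dict:
    """Rule-of-thumb sample sizes for k predictors (Green, 1991;
    Tabachnick & Fidell, 2013), as summarised in the lecture."""
    return {
        "minimum cases": 50,
        "20 cases x k": 20 * k,
        "overall model (50 + 8k)": 50 + 8 * k,
        "individual predictors (104 + k)": 104 + k,
    }

# Example from the slide: 4 predictors, 200 cases available
rules = required_n(4)
print(rules)
print("N = 200 satisfies all rules:", all(200 >= v for v in rules.values()))
```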
51
Dealing with outliers
Extreme cases should be deleted or
modified if they are overly influential.
• Univariate outliers -
detect via initial data screening
(e.g., min. and max.)
• Bivariate outliers -
detect via scatterplots
• Multivariate outliers -
unusual combination of predictors – detect via
Mahalanobis' distance
52
Multivariate outliers
• A case may be within normal range for
each variable individually, but be a
multivariate outlier based on an unusual
combination of responses which unduly
influences multivariate test results.
• e.g., a person who:
–Is 18 years old
–Has 3 children
–Has a post-graduate degree
53
Multivariate outliers
• Identify & check unusual
cases
• Use Mahalanobis' distance or
Cook’s D as a MVO screening
procedure
54
Multivariate outliers
• Mahalanobis' distance (MD)
– Distributed as χ² with df equal to the number of predictors (with critical α = .001)
– Cases with a MD greater than the critical value
are multivariate outliers.
• Cook’s D
– Cases with CD values > 1 are multivariate
outliers.
• Use either MD or CD
• Examine cases with extreme MD or CD
scores - if in doubt, remove & re-run.
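A minimal sketch of Mahalanobis distance screening with simulated data (the data, seed, and planted outlier are illustrative only, not from any study):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical predictor data (100 cases x 3 IVs); one case is given an
# unusual *combination* of values to act as a multivariate outlier
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
X[0] = [4.5, -4.0, 5.0]

# Squared Mahalanobis distance of each case from the predictor centroid
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
md_sq = np.sum(diff @ inv_cov * diff, axis=1)

# Critical value: chi-square with df = number of predictors, alpha = .001
critical = chi2.ppf(1 - 0.001, df=X.shape[1])
print("Critical value:", round(critical, 2))            # ~16.27 for 3 IVs
print("Flagged cases:", np.where(md_sq > critical)[0])  # likely includes case 0
```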
55
Normality &
homoscedasticity
Normality
• If variables are non-normal,
this will create
heteroscedasticity
Homoscedasticity
• Variance around the
regression line should be
the same throughout the
distribution
• Even spread in residual
plots
56
Multicollinearity
• Multicollinearity – IVs shouldn't be
overly correlated (e.g., over .7) – if so,
consider combining them into a single
variable or removing one.
• Singularity - perfect correlations among
IVs.
• Leads to unstable regression
coefficients.
57
Multicollinearity
Detect via:
 Correlation matrix - are there
large correlations among IVs?
 Tolerance statistics - if < .3 then
exclude that variable.
 Variance Inflation Factor (VIF) –
if > 3, then exclude that variable.
 VIF is the reciprocal of Tolerance
(so use one or the other – not both)
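A minimal sketch of computing VIF and Tolerance with statsmodels, using simulated predictors in which x2 is deliberately correlated with x1:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors; x2 is built to correlate strongly with x1
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=200)
x3 = rng.normal(size=200)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF for each IV (skip the constant in column 0); Tolerance = 1 / VIF
for i in range(1, X.shape[1]):
    vif = variance_inflation_factor(X, i)
    print(f"IV{i}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")
```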
58
Causality
• Like correlation, regression does
not tell us about the causal
relationship between variables.
• In many analyses, the IVs and DVs
could be swapped around –
therefore, it is important to:
–Take a theoretical position
–Acknowledge alternative explanations
59
Multiple correlation coefficient
(R)
• “Big R” (capitalised)
• Equivalent of r, but takes into
account that there are multiple
predictors (IVs)
• Always positive, between 0 and 1
• Interpretation is similar to that for r
(correlation coefficient)
60
Coefficient of determination (R²)
• “Big R squared”
• Squared multiple correlation
coefficient
• Always include R2
• Indicates the % of variance in
DV explained by combined
effects of the IVs
• Analogous to r2
61
Rule of thumb for
interpretation of R2
• .00 = no linear relationship
• .10 = small (R ~ .3)
• .25 = moderate (R ~ .5)
• .50 = strong (R ~ .7)
• 1.00 = perfect linear relationship
R² > .30 is “good” in social sciences
62
Adjusted R²
• R² is explained variance in a sample.
• Adjusted R² is used for estimating explained variance in a population.
• Report R² and adjusted R².
• Particularly for small N and where results are to be generalised, take more note of adjusted R².
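Adjusted R² can be computed from R², the sample size (n), and the number of predictors (k); a minimal sketch (the example values are chosen to mirror the two-predictor example later in the lecture, where R² = .30 with N = 220 gives adjusted R² ≈ .29):

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n = sample size and k = number of predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# The shrinkage is small with a large N, larger with a small N
print(round(adjusted_r_squared(0.30, n=220, k=2), 3))   # ~ .294
print(round(adjusted_r_squared(0.30, n=30, k=2), 3))    # ~ .248
```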
63
Multiple linear regression –
Test for overall significance
• Shows if there is a significant
linear relationship between the X
variables taken together and Y
• Examine F and p in the ANOVA
table to determine the likelihood
that the explained variance in Y
could have occurred by chance
64
Regression coefficients
• Y-intercept (a)
• Slopes (b):
–Unstandardised
–Standardised
• Slopes are the weighted loading of
each IV on the DV, adjusted for the
other IVs in the model.
65
Unstandardised
regression coefficients
• B = unstandardised regression
coefficient
• Used for regression equations
• Used for predicting Y scores
• But can’t be compared with other Bs
unless all IVs are measured on the
same scale
66
Standardised
regression coefficients
• Beta (β) = standardised regression
coefficient
• Useful for comparing the relative
strength of predictors
• In LR, β = r; in MLR, β = r only when the IVs are uncorrelated.
67
Test for significance:
Independent variables
Indicates the likelihood of a linear
relationship between each IV (Xi)
and Y occurring by chance.
Hypotheses:
H0: βi = 0 (No linear relationship)
H1: βi ≠ 0 (Linear relationship
between Xi and Y)
68
Relative importance of IVs
• Which IVs are the most important?
• To answer this, compare the
standardised regression
coefficients (βs)
69
Y = b1x1 + b2x2 +.....+ bixi + a + e
• Y = observed DV scores
• bi = unstandardised regression
coefficients (the Bs in SPSS) -
slopes
• x1 to xi = IV scores
• a = Y axis intercept
• e = error (residual)
Regression equation
70
Multiple linear regression -
Example
“Does ‘ignoring problems’ (IV1)
and ‘worrying’ (IV2)
predict ‘psychological distress’
(DV)”
71
72
[Venn diagram: X1 (Ignore the Problem), X2 (Worry), and Y (Psychological Distress), with correlations of .32, .52, and .35]
73
Multiple linear regression -
Example
Together, Ignoring Problems and Worrying explain 30% of the variance in Psychological Distress in the Australian adolescent population (R² = .30, Adjusted R² = .29).
74
Multiple linear regression -
Example
The explained variance in the population is
unlikely to be 0 (p = .00).
75
Coefficients (Model 1)
                          B     Std. Error    Beta        t      Sig.
(Constant)            138.932      4.680                29.687   .000
Worry                 -11.511      1.510     -.464      -7.625   .000
Ignore the Problem     -4.735      1.780     -.162      -2.660   .008
Dependent Variable: Psychological Distress
Multiple linear regression -
Example
Worry predicts about three times as much variance in Psychological Distress as Ignoring the Problem, although both are significant, negative predictors of mental health.
76
Linear Regression
PD (hat) = 119 - 9.50*Ignore
R² = .11
Multiple Linear Regression
PD (hat) = 139 - 4.7*Ignore - 11.5*Worry
R² = .30
Multiple linear regression -
Example – Prediction equations
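A minimal sketch comparing the two reported prediction equations; the respondent scores in the example calls are hypothetical:

```python
# Prediction equations reported in the lecture example
def predict_pd_lr(ignore):
    # Simple linear regression: one predictor
    return 119 - 9.50 * ignore

def predict_pd_mlr(ignore, worry):
    # Multiple linear regression: two predictors, each weighted
    # after adjusting for the other
    return 139 - 4.7 * ignore - 11.5 * worry

# e.g., a respondent scoring 3 on Ignore the Problem and 2 on Worry
print(predict_pd_lr(3))        # 90.5
print(predict_pd_mlr(3, 2))    # 101.9
```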
77
Confidence interval for the slope
Mental Health (PD) is reduced by between 8.5 and 14.5 units per unit increase in Worry.
Mental Health (PD) is reduced by between 1.2 and 8.2 units per unit increase in Ignore the Problem.
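These intervals follow from b ± t(critical) × SE using the coefficients reported earlier (with df = N − k − 1 = 217 here); a minimal sketch:

```python
from scipy.stats import t

def slope_ci(b, se, df, alpha=0.05):
    """95% confidence interval for a regression slope: b +/- t_crit * SE."""
    t_crit = t.ppf(1 - alpha / 2, df)
    return b - t_crit * se, b + t_crit * se

# Coefficients from the lecture example (N = 220, 2 predictors -> df = 217)
print(slope_ci(-11.511, 1.510, df=217))   # approx (-14.5, -8.5) for Worry
print(slope_ci(-4.735, 1.780, df=217))    # approx (-8.2, -1.2) for Ignore the Problem
```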
78
Multiple linear regression - Example
Effect of violence, stress, social support
on internalising behaviour problems
Kliewer, Lepore, Oskin, & Johnson, (1998)
Image source: http://cloudking.com/artists/noa-terliuc/family-violence.php
79
Multiple linear regression –
Example – Violence study
• Participants were children:
– 8 - 12 years
– Lived in high-violence areas, USA
• Hypotheses:
– Stress → ↑ internalising behaviour
– Violence → ↑ internalising behaviour
– Social support → ↓ internalising behaviour
80
Multiple linear regression –
Example - Variables
• Predictors
–Degree of witnessing violence
–Measure of life stress
–Measure of social support
• Outcome
–Internalising behaviour
(e.g., depression, anxiety, withdrawal
symptoms) – measured using the
Child Behavior Checklist (CBCL)
81
Correlations (Pearson)
                                  Amount violence   Current    Social
                                  witnessed         stress     support
Current stress                         .050
Social support                         .080          -.080
Internalizing symptoms on CBCL         .200*          .270**    -.170
* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
(The upper rows show correlations amongst the IVs; the bottom row shows correlations between the IVs and the DV.)
82
Model Summary
R = .37   R Square = .135   Adjusted R Square = .108   Std. Error of the Estimate = 2.2198
Predictors: (Constant), Social support, Current stress, Amount violence witnessed
R² = .135: 13.5% of the variance in children's internalising symptoms can be explained by the 3 predictors.
83
Regression coefficients
Coefficients
                                B     Std. Error    Beta      t     Sig.
(Constant)                    .477      1.289               .37     .712
Amount violence witnessed     .038       .018      .201     2.1     .039
Current stress                .273       .106      .247     2.6     .012
Social support               -.074       .043     -.166      -2     .087
Dependent Variable: Internalizing symptoms on CBCL
2 predictors
have
p < .05
84
Regression equation
• A separate coefficient or slope for
each variable
• An intercept (here it's called b0)
Ŷ = b1X1 + b2X2 + b3X3 + b0
Ŷ = 0.038·Wit + 0.273·Stress - 0.074·SocSupp + 0.477
85
Interpretation
• Slopes for Witness and Stress are +ve;
slope for Social Support is -ve.
• Holding Stress and Social Support constant, a one-unit increase in Witness would produce a .038 unit increase in Internalising symptoms.
Ŷ = b1X1 + b2X2 + b3X3 + b0
Ŷ = 0.038·Wit + 0.273·Stress - 0.074·SocSupp + 0.477
86
Predictions
Q: If Witness = 20, Stress = 5, and SocSupp = 35, what would we predict internalising symptoms to be?
A: .012
Ŷ = 0.038·Wit + 0.273·Stress - 0.074·SocSupp + 0.477
  = 0.038(20) + 0.273(5) - 0.074(35) + 0.477
  = .012
87
Multiple linear regression - Example
The role of human, social, built, and natural
capital in explaining life satisfaction at the
country level:
Towards a National Well-Being Index (NWI)
Vemuri & Costanza (2006)
88
Variables
• IVs:
–Human & Built Capital
(Human Development Index)
–Natural Capital
(Ecosystem services per km²)
–Social Capital
(Press Freedom)
• DV = Life satisfaction
• Units of analysis: Countries
(N = 57; mostly developed countries, e.g., in Europe
and America)
89
● There are moderately strong positive and
statistically significant linear relations between
the IVs and the DV
● The IVs have small to moderate positive
inter-correlations.
90
● R² = .35
● Two sig. IVs (not Social Capital - dropped)
91
6 outliers
92
● R² = .72 (after dropping 6 outliers)
93
Types of MLR
• Standard or direct (simultaneous)
• Hierarchical or sequential
• Stepwise (forward & backward)
Image source: https://commons.wikimedia.org/wiki/File:IStumbler.png
94
• All predictor variables are entered
together (simultaneously)
• Allows assessment of the relationship
between all predictor variables and the
outcome (Y) variable if there is good
theoretical reason for doing so.
• Manual technique & commonly used.
• If you're not sure what type of MLR to
use, start with this approach.
Direct or Standard
95
• IVs are entered in blocks or stages.
–Researcher defines order of entry for the
variables, based on theory.
–May enter ‘nuisance’ variables first to
‘control’ for them, then test ‘purer’ effect of
next block of important variables.
• R² change - additional variance in Y explained at each stage of the regression.
– F test of R² change.
Hierarchical (Sequential)
96
• Example
– Drug A is a cheap, well-proven drug which reduces
AIDS symptoms
– Drug B is an expensive, experimental drug which
could help to cure AIDS
– Hierarchical linear regression:
• Step 1: Drug A (IV1)
• Step 2: Drug B (IV2)
• DV = AIDS symptoms
• Research question: To what extent does Drug B
reduce AIDS symptoms above and beyond the effect
of Drug A?
• Examine the change in R² between Step 1 & Step 2
Hierarchical (Sequential)
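The R² change can be tested with an F statistic: F = (ΔR² / m) / ((1 − R²_full) / (n − k_full − 1)), where m predictors are added at the step. A minimal sketch with hypothetical values for the Drug A / Drug B example:

```python
from scipy.stats import f

def r2_change_test(r2_reduced, r2_full, n, k_full, m):
    """F test of the R^2 change when m predictors are added to a model
    that then has k_full predictors in total (n = sample size)."""
    df1, df2 = m, n - k_full - 1
    F = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    p = f.sf(F, df1, df2)
    return F, p

# Hypothetical values: Drug A alone explains 20% of symptom variance;
# adding Drug B lifts this to 26% with N = 100
print(r2_change_test(r2_reduced=0.20, r2_full=0.26, n=100, k_full=2, m=1))
```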
97
• Computer-driven – controversial.
• Starts with 0 predictors, then the
strongest predictor is entered into
the model, then the next strongest
etc., if each meets a criterion (e.g., p < .05)
Forward selection
98
• Computer-driven – controversial.
• All predictor variables are
entered, then the weakest
predictors are removed, one by
one, if they meet a criteria (e.g., p
> .05)
Backward elimination
99
• Computer-driven – controversial.
• Combines forward & backward.
• At each step, variables may be
entered or removed if they meet
certain criteria.
• Useful for developing the best
prediction equation from a large
number of variables.
• Redundant predictors are removed.
Stepwise
100
Which method?
• Standard: To assess impact of
all IVs simultaneously
• Hierarchical: To test IVs in a
specific order (based on
hypotheses derived from theory)
• Stepwise: If the goal is accurate
statistical prediction from a large
# of variables - computer driven
101
Summary
102
Summary: General steps
1. Develop model and hypotheses
2. Check assumptions
3. Choose type
4. Interpret output
5. Develop a regression equation
(if needed)
103
Summary: Linear regression
1. Best-fitting straight line for a
scatterplot of two variables
2. Y = bX + a + e
1. Predictor (X; IV)
2. Outcome (Y; DV)
3. Least squares criterion
4. Residuals are the vertical
distance between actual and
predicted values
104
Summary:
MLR assumptions
1. Level of measurement
2. Sample size
3. Normality
4. Linearity
5. Homoscedasticity
6. Collinearity
7. Multivariate outliers
8. Residuals should be normally
distributed
105
Summary:
Level of measurement and
dummy coding
1. Levels of measurement
1. DV = Interval or ratio
2. IV = Interval or ratio or dichotomous
2. Dummy coding
1. Convert complex variables into series of
dichotomous IVs
106
Summary:
MLR types
1. Standard
2. Hierarchical
3. Stepwise / Forward / Backward
107
Summary:
MLR output
1. Overall fit
1. R, R², Adjusted R²
2. F, p
2. Coefficients
1. Relation between each IV and the DV,
adjusted for the other IVs
2. B, β, t, p, and rp
3. Regression equation (if useful)
Y = b1x1 + b2x2 +.....+ bixi + a + e
108
Practice quiz
109
MLR I Quiz –
Practice question 1
A linear regression analysis produces the
equation Y = 0.4X + 3. This indicates
that:
(a) When Y = 0.4, X = 3
(b) When Y = 0, X = 3
(c) When X = 3, Y = 0.4
(d) When X = 0, Y = 3
(e) None of the above
110
MLR I Quiz –
Practice question 2
Multiple linear regression is a
________ type of statistical analysis.
(a) univariate
(b) bivariate
(c) multivariate
111
MLR I Quiz –
Practice question 3
Multiple linear regression is a
________ type of statistical analysis.
(a) univariate
(b) bivariate
(c) multivariate
112
MLR I Quiz –
Practice question 4
The following types of data can be used in
MLR (choose all that apply):
(a) Interval or higher DV
(b) Interval or higher IVs
(c) Dichotomous IVs
(d) All of the above
(e) None of the above
113
MLR I Quiz –
Practice question 5
In MLR, the square of the multiple
correlation coefficient, R², is called the:
(a) Coefficient of determination
(b) Variance
(c) Covariance
(d) Cross-product
(e) Big R
114
MLR I Quiz –
Practice question 6
In MLR, a residual is the difference
between the predicted Y and actual Y
values.
(a) True
(b) False
115
Next lecture
• Review of MLR I
• Semi-partial correlations
• Residual analysis
• Interactions
• Analysis of change
116
References
Howell, D. C. (2004). Chapter 9: Regression. In D. C. Howell, Fundamental statistics for the behavioral sciences (5th ed., pp. 203-235). Belmont, CA: Wadsworth.
Howitt, D. & Cramer, D. (2011). Introduction to statistics in psychology
(5th ed.). Harlow, UK: Pearson.
Kliewer, W., Lepore, S.J., Oskin, D., & Johnson, P.D. (1998). The role of
social and cognitive processes in children’s adjustment to community
violence. Journal of Consulting and Clinical Psychology, 66, 199-209.
Landwehr, J.M. & Watkins, A.E. (1987) Exploring data: Teacher’s
edition. Palo Alto, CA: Dale Seymour Publications.
Tabachnick, B. G., & Fidell, L. S. (2013). Multiple regression [includes example write-ups]. In Using multivariate statistics (6th ed., International ed., pp. 117-170). Boston, MA: Allyn and Bacon.
Vemuri, A. W., & Costanza, R. (2006). The role of human, social, built,
and natural capital in explaining life satisfaction at the country level:
Toward a National Well-Being Index (NWI). Ecological Economics,
58(1), 119-133.
117
Open Office Impress
● This presentation was made using
Open Office Impress.
● Free and open source software.
● http://www.openoffice.org/product/impress.html
Mais conteúdo relacionado

Mais procurados

Non Linear Equation
Non Linear EquationNon Linear Equation
Non Linear EquationMdAlAmin187
 
Regression analysis.
Regression analysis.Regression analysis.
Regression analysis.sonia gupta
 
Logistic regression with SPSS
Logistic regression with SPSSLogistic regression with SPSS
Logistic regression with SPSSLNIPE
 
Statistics-Regression analysis
Statistics-Regression analysisStatistics-Regression analysis
Statistics-Regression analysisRabin BK
 
Multiple regression presentation
Multiple regression presentationMultiple regression presentation
Multiple regression presentationCarlo Magno
 
Linear regression
Linear regressionLinear regression
Linear regressionTech_MX
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionsaba khan
 
Regression analysis
Regression analysisRegression analysis
Regression analysissaba khan
 
Applications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipApplications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipRithish Kumar
 
Regression analysis
Regression analysisRegression analysis
Regression analysisRavi shankar
 
Correlation and regression analysis
Correlation and regression analysisCorrelation and regression analysis
Correlation and regression analysis_pem
 
Time Series
Time SeriesTime Series
Time Seriesyush313
 
Correlation & Regression Analysis using SPSS
Correlation & Regression Analysis  using SPSSCorrelation & Regression Analysis  using SPSS
Correlation & Regression Analysis using SPSSParag Shah
 
Multiple linear regression II
Multiple linear regression IIMultiple linear regression II
Multiple linear regression IIJames Neill
 

Mais procurados (20)

Non Linear Equation
Non Linear EquationNon Linear Equation
Non Linear Equation
 
Regression analysis.
Regression analysis.Regression analysis.
Regression analysis.
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Logistic regression with SPSS
Logistic regression with SPSSLogistic regression with SPSS
Logistic regression with SPSS
 
Statistics-Regression analysis
Statistics-Regression analysisStatistics-Regression analysis
Statistics-Regression analysis
 
Multiple regression presentation
Multiple regression presentationMultiple regression presentation
Multiple regression presentation
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Applications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipApplications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationship
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Correlation and regression analysis
Correlation and regression analysisCorrelation and regression analysis
Correlation and regression analysis
 
Multiple regression
Multiple regressionMultiple regression
Multiple regression
 
Correlation and Regression
Correlation and RegressionCorrelation and Regression
Correlation and Regression
 
Time Series
Time SeriesTime Series
Time Series
 
Correlation & Regression Analysis using SPSS
Correlation & Regression Analysis  using SPSSCorrelation & Regression Analysis  using SPSS
Correlation & Regression Analysis using SPSS
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Multiple linear regression II
Multiple linear regression IIMultiple linear regression II
Multiple linear regression II
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
 

Destaque

Linear Regression Ordinary Least Squares Distributed Calculation Example
Linear Regression Ordinary Least Squares Distributed Calculation ExampleLinear Regression Ordinary Least Squares Distributed Calculation Example
Linear Regression Ordinary Least Squares Distributed Calculation ExampleMarjan Sterjev
 
Linear regression(probabilistic interpretation)
Linear regression(probabilistic interpretation)Linear regression(probabilistic interpretation)
Linear regression(probabilistic interpretation)hitesh saini
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear RegressionIndus University
 
Partial and multiple correlation and regression
Partial and multiple correlation and regressionPartial and multiple correlation and regression
Partial and multiple correlation and regressionKritika Jain
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysisnadiazaheer
 
Linear regression without tears
Linear regression without tearsLinear regression without tears
Linear regression without tearsAnkit Sharma
 
Multiple regression in spss
Multiple regression in spssMultiple regression in spss
Multiple regression in spssDr. Ravneet Kaur
 
Multiple Regression Analysis
Multiple Regression AnalysisMultiple Regression Analysis
Multiple Regression AnalysisMinha Hwang
 
Linear Regression Using SPSS
Linear Regression Using SPSSLinear Regression Using SPSS
Linear Regression Using SPSSDr Athar Khan
 
Chapter 4 - multiple regression
Chapter 4  - multiple regressionChapter 4  - multiple regression
Chapter 4 - multiple regressionTauseef khan
 
Income & expenditure ppt
Income & expenditure pptIncome & expenditure ppt
Income & expenditure pptMsOBrien
 
Eco Basic 1 8
Eco Basic 1 8Eco Basic 1 8
Eco Basic 1 8kit11229
 
Статистикийн үндсэн аргууд түүний хэрэглээ
Статистикийн үндсэн аргууд түүний хэрэглээСтатистикийн үндсэн аргууд түүний хэрэглээ
Статистикийн үндсэн аргууд түүний хэрэглээTuul Tuul
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionKhalid Aziz
 

Destaque (20)

Linear Regression Ordinary Least Squares Distributed Calculation Example
Linear Regression Ordinary Least Squares Distributed Calculation ExampleLinear Regression Ordinary Least Squares Distributed Calculation Example
Linear Regression Ordinary Least Squares Distributed Calculation Example
 
Econometrics chapter 8
Econometrics chapter 8Econometrics chapter 8
Econometrics chapter 8
 
Linear regression(probabilistic interpretation)
Linear regression(probabilistic interpretation)Linear regression(probabilistic interpretation)
Linear regression(probabilistic interpretation)
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear Regression
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Partial and multiple correlation and regression
Partial and multiple correlation and regressionPartial and multiple correlation and regression
Partial and multiple correlation and regression
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Linear regression without tears
Linear regression without tearsLinear regression without tears
Linear regression without tears
 
Chapter 15
Chapter 15 Chapter 15
Chapter 15
 
Linear regression
Linear regression Linear regression
Linear regression
 
Multiple regression in spss
Multiple regression in spssMultiple regression in spss
Multiple regression in spss
 
Multiple Regression Analysis
Multiple Regression AnalysisMultiple Regression Analysis
Multiple Regression Analysis
 
Chapter 14
Chapter 14 Chapter 14
Chapter 14
 
Linear Regression Using SPSS
Linear Regression Using SPSSLinear Regression Using SPSS
Linear Regression Using SPSS
 
Chapter 4 - multiple regression
Chapter 4  - multiple regressionChapter 4  - multiple regression
Chapter 4 - multiple regression
 
Income & expenditure ppt
Income & expenditure pptIncome & expenditure ppt
Income & expenditure ppt
 
Eco Basic 1 8
Eco Basic 1 8Eco Basic 1 8
Eco Basic 1 8
 
Regression
RegressionRegression
Regression
 
Статистикийн үндсэн аргууд түүний хэрэглээ
Статистикийн үндсэн аргууд түүний хэрэглээСтатистикийн үндсэн аргууд түүний хэрэглээ
Статистикийн үндсэн аргууд түүний хэрэглээ
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 

Semelhante a Multiple linear regression

3010l8.pdf
3010l8.pdf3010l8.pdf
3010l8.pdfdawitg2
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.pptTanyaWadhwani4
 
epiet-22- Logistic regression 2006-1.ppt
epiet-22- Logistic regression 2006-1.pptepiet-22- Logistic regression 2006-1.ppt
epiet-22- Logistic regression 2006-1.pptssuser031f35
 
Network meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencyNetwork meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencycheweb1
 
Correlation and Regression Analysis.pptx
Correlation and Regression Analysis.pptxCorrelation and Regression Analysis.pptx
Correlation and Regression Analysis.pptxUnfold1
 
Chapter 03 scatterplots and correlation
Chapter 03 scatterplots and correlationChapter 03 scatterplots and correlation
Chapter 03 scatterplots and correlationHamdy F. F. Mahmoud
 
X18125514 ca2-statisticsfor dataanalytics
X18125514 ca2-statisticsfor dataanalyticsX18125514 ca2-statisticsfor dataanalytics
X18125514 ca2-statisticsfor dataanalyticsShantanu Deshpande
 
Research methodology and iostatistics ppt
Research methodology and iostatistics pptResearch methodology and iostatistics ppt
Research methodology and iostatistics pptNikhat Mohammadi
 
Logistic regression1
Logistic regression1Logistic regression1
Logistic regression1LoiLe36
 
ChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdfChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdfDikshathawait
 
correlation.pptx
correlation.pptxcorrelation.pptx
correlation.pptxSmHasiv
 
Approximate ANCOVA
Approximate ANCOVAApproximate ANCOVA
Approximate ANCOVAStephen Senn
 
regression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdfregression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdflisow86669
 
2015.01.07 - HAI poster
2015.01.07 - HAI poster2015.01.07 - HAI poster
2015.01.07 - HAI posterFunan Shi
 

Semelhante a Multiple linear regression (20)

Chapter13
Chapter13Chapter13
Chapter13
 
Types of regression ii
Types of regression iiTypes of regression ii
Types of regression ii
 
3010l8.pdf
3010l8.pdf3010l8.pdf
3010l8.pdf
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.ppt
 
epiet-22- Logistic regression 2006-1.ppt
epiet-22- Logistic regression 2006-1.pptepiet-22- Logistic regression 2006-1.ppt
epiet-22- Logistic regression 2006-1.ppt
 
Network meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencyNetwork meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistency
 
Correlation and Regression Analysis.pptx
Correlation and Regression Analysis.pptxCorrelation and Regression Analysis.pptx
Correlation and Regression Analysis.pptx
 
Chapter 03 scatterplots and correlation
Chapter 03 scatterplots and correlationChapter 03 scatterplots and correlation
Chapter 03 scatterplots and correlation
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
X18125514 ca2-statisticsfor dataanalytics
X18125514 ca2-statisticsfor dataanalyticsX18125514 ca2-statisticsfor dataanalytics
X18125514 ca2-statisticsfor dataanalytics
 
Research methodology and iostatistics ppt
Research methodology and iostatistics pptResearch methodology and iostatistics ppt
Research methodology and iostatistics ppt
 
Logistic regression1
Logistic regression1Logistic regression1
Logistic regression1
 
ChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdfChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdf
 
correlation.pptx
correlation.pptxcorrelation.pptx
correlation.pptx
 
Approximate ANCOVA
Approximate ANCOVAApproximate ANCOVA
Approximate ANCOVA
 
Bio statistics 1
Bio statistics 1Bio statistics 1
Bio statistics 1
 
regression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdfregression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdf
 
Linear and Logistics Regression
Linear and Logistics RegressionLinear and Logistics Regression
Linear and Logistics Regression
 
2015.01.07 - HAI poster
2015.01.07 - HAI poster2015.01.07 - HAI poster
2015.01.07 - HAI poster
 
Biostatistics in Bioequivalence
Biostatistics in BioequivalenceBiostatistics in Bioequivalence
Biostatistics in Bioequivalence
 

Mais de James Neill

Personal control beliefs
Personal control beliefsPersonal control beliefs
Personal control beliefsJames Neill
 
Goal setting and goal striving
Goal setting and goal strivingGoal setting and goal striving
Goal setting and goal strivingJames Neill
 
Implicit motives
Implicit motivesImplicit motives
Implicit motivesJames Neill
 
Psychological needs
Psychological needsPsychological needs
Psychological needsJames Neill
 
Extrinsic motivation
Extrinsic motivationExtrinsic motivation
Extrinsic motivationJames Neill
 
Physiological needs
Physiological needsPhysiological needs
Physiological needsJames Neill
 
Motivated and emotional brain
Motivated and emotional brainMotivated and emotional brain
Motivated and emotional brainJames Neill
 
Motivation in historical perspective
Motivation in historical perspectiveMotivation in historical perspective
Motivation in historical perspectiveJames Neill
 
Motivation and emotion unit outline
Motivation and emotion unit outlineMotivation and emotion unit outline
Motivation and emotion unit outlineJames Neill
 
Development and evaluation of the PCYC Catalyst outdoor adventure interventio...
Development and evaluation of the PCYC Catalyst outdoor adventure interventio...Development and evaluation of the PCYC Catalyst outdoor adventure interventio...
Development and evaluation of the PCYC Catalyst outdoor adventure interventio...James Neill
 
Individual emotions
Individual emotionsIndividual emotions
Individual emotionsJames Neill
 
How and why to edit wikipedia
How and why to edit wikipediaHow and why to edit wikipedia
How and why to edit wikipediaJames Neill
 
Going open (education): What, why, and how?
Going open (education): What, why, and how?Going open (education): What, why, and how?
Going open (education): What, why, and how?James Neill
 
Introduction to motivation and emotion 2013
Introduction to motivation and emotion 2013Introduction to motivation and emotion 2013
Introduction to motivation and emotion 2013James Neill
 
Summary and conclusion - Survey research and design in psychology
Summary and conclusion - Survey research and design in psychologySummary and conclusion - Survey research and design in psychology
Summary and conclusion - Survey research and design in psychologyJames Neill
 
The effects of green exercise on stress, anxiety and mood
The effects of green exercise on stress, anxiety and moodThe effects of green exercise on stress, anxiety and mood
The effects of green exercise on stress, anxiety and moodJames Neill
 
Conclusion and review
Conclusion and reviewConclusion and review
Conclusion and reviewJames Neill
 
Visualiation of quantitative information
Visualiation of quantitative informationVisualiation of quantitative information
Visualiation of quantitative informationJames Neill
 

Mais de James Neill (20)

Self
SelfSelf
Self
 
Personal control beliefs
Personal control beliefsPersonal control beliefs
Personal control beliefs
 
Mindsets
MindsetsMindsets
Mindsets
 
Goal setting and goal striving
Goal setting and goal strivingGoal setting and goal striving
Goal setting and goal striving
 
Implicit motives
Implicit motivesImplicit motives
Implicit motives
 
Psychological needs
Psychological needsPsychological needs
Psychological needs
 
Extrinsic motivation
Extrinsic motivationExtrinsic motivation
Extrinsic motivation
 
Physiological needs
Physiological needsPhysiological needs
Physiological needs
 
Motivated and emotional brain
Motivated and emotional brainMotivated and emotional brain
Motivated and emotional brain
 
Motivation in historical perspective
Motivation in historical perspectiveMotivation in historical perspective
Motivation in historical perspective
 
Motivation and emotion unit outline
Motivation and emotion unit outlineMotivation and emotion unit outline
Motivation and emotion unit outline
 
Development and evaluation of the PCYC Catalyst outdoor adventure interventio...
Development and evaluation of the PCYC Catalyst outdoor adventure interventio...Development and evaluation of the PCYC Catalyst outdoor adventure interventio...
Development and evaluation of the PCYC Catalyst outdoor adventure interventio...
 
Individual emotions
Individual emotionsIndividual emotions
Individual emotions
 
How and why to edit wikipedia
How and why to edit wikipediaHow and why to edit wikipedia
How and why to edit wikipedia
 
Going open (education): What, why, and how?
Going open (education): What, why, and how?Going open (education): What, why, and how?
Going open (education): What, why, and how?
 
Introduction to motivation and emotion 2013
Introduction to motivation and emotion 2013Introduction to motivation and emotion 2013
Introduction to motivation and emotion 2013
 
Summary and conclusion - Survey research and design in psychology
Summary and conclusion - Survey research and design in psychologySummary and conclusion - Survey research and design in psychology
Summary and conclusion - Survey research and design in psychology
 
The effects of green exercise on stress, anxiety and mood
The effects of green exercise on stress, anxiety and moodThe effects of green exercise on stress, anxiety and mood
The effects of green exercise on stress, anxiety and mood
 
Conclusion and review
Conclusion and reviewConclusion and review
Conclusion and review
 
Visualiation of quantitative information
Visualiation of quantitative informationVisualiation of quantitative information
Visualiation of quantitative information
 

Último

Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17Celine George
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 

Último (20)

Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 

Multiple linear regression

  • 1. Lecture 7 Survey Research & Design in Psychology James Neill, 2017 Creative Commons Attribution 4.0 Image source:http://commons.wikimedia.org/wiki/File:Vidrarias_de_Laboratorio.jpg Multiple Linear Regression I
  • 2. 2 Overview 1. Readings 2. Correlation (Review) 3. Simple linear regression 4. Multiple linear regression 5. Summary 6. MLR I Quiz - Practice questions
  • 3. 3 1. Howitt & Cramer (2014): – Regression: Prediction with precision [Ch 9] [Textbook/eReserve] – Multiple regression & multiple correlation [Ch 32] [Textbook/eReserve] 2. StatSoft (2016). How to find relationship between variables, multiple regression. StatSoft Electronic Statistics Handbook. [Online] 3. Tabachnick & Fidell (2013). Multiple regression (includes example write-ups) [eReserve] Readings
  • 4. Correlation (Review) Linear relation between two variables
  • 5. 5 Purposes of correlational statistics Explanatory - Regression e.g., cross-sectional study (all data collected at same time) Predictive - Regression e.g., longitudinal study (predictors collected prior to outcome measures)
  • 6. 6 Linear correlation ● Linear relations between interval or ratio variables ● Best fitting straight-line on a scatterplot
  • 7. 7 Correlation – Key points • Covariance = sum of cross-products (unstandardised) • Correlation = sum of cross-products (standardised), ranging from -1 to 1 (sign indicates direction, value indicates size) • Coefficient of determination (r2 ) indicates % of shared variance • Correlation does not necessarily equal causality
  • 8. 8 Correlation is shared variance Venn diagrams are helpful for depicting relations between variables. .32 .68.68 r2 =
  • 9. Simple linear regression Explains and predicts a Dependent Variable (DV) based on a linear relation with an Independent Variable (IV)
  • 10. 10 What is simple linear regression? • An extension of correlation • Best-fitting straight line for a scatterplot between two variables: • predictor (X) – also called an independent variable (IV) • outcome (Y) - also called a dependent variable (DV) or criterion variable • LR uses an IV to explain/predict a DV • Help to understand relationships and possible causal effects of one variable on another.
  • 11. 11 Least squares criterionThe line of best fit minimises the total sum of squares of the vertical deviations for each case. a = point at which line of best fit crosses the Y-axis. b = slope of the line of best fit Least squares criterion residuals = vertical (Y) distance between line of best fit and each observation (unexplained variance)
  • 12. 12 Linear Regression - Example: Cigarettes & coronary heart disease IV = Cigarette consumption DV = Coronary Heart Disease IV = Cigarette consumption Landwehr & Watkins (1987, cited in Howell, 2004, pp. 216-218)
  • 13. 13 Linear regression - Example: Cigarettes & coronary heart disease (Howell, 2004) Research question: How fast does CHD mortality rise with a one unit increase in smoking? • IV = Av. # of cigs per adult per day • DV = CHD mortality rate (deaths per 10,000 per year due to CHD) • Unit of analysis = Country
  • 14. 14 Linear regression - Data: Cigarettes & coronary heart disease (Howell, 2004)
  • 15. 15Cigarette Consumption per Adult per Day 12108642 CHDMortalityper10,000 30 20 10 0 Linear regression - Example: Scatterplot with Line of Best Fit
  • 16. 16 Linear regression equation (without error) predicted values of Y Y-intercept = level of Y when X is 0 b = slope = rate of predicted ↑/↓ for Y scores for each unit increase in X
  • 17. 17 Y = bX + a + e X = IV values Y = DV values a = Y-axis intercept b = slope of line of best fit (regression coefficient) e = error Linear regression equation (with error)
  • 18. 18 Linear regression – Example: Equation Variables: • (DV) = predicted rate of CHD mortality • X (IV) = mean # of cigarettes per adult per day per country Regression co-efficients: • b = rate of ↑/↓ of CHD mortality for each extra cigarette smoked per day • a = baseline level of CHD (i.e., CHD when no cigarettes are smoked)
  • 19. 19 Linear regression – Example: Explained variance • r = .71 • r² = .71² = .51 • p < .05 • Approximately 50% of the variability in CHD mortality is associated with variability in countries' smoking rates.
  • 20. 20 Linear regression – Example: Test for overall significance. ANOVA: Regression SS = 454.48, df = 1, MS = 454.48, F = 19.59, p = .00; Residual SS = 440.76, df = 19, MS = 23.20; Total SS = 895.24, df = 20. Predictor: (Constant), Cigarette Consumption per Adult per Day. Dependent Variable: CHD Mortality per 10,000. ● r = .71, r2 = .51, p < .05
  • 21. 21 Linear regression – Example: Regression coefficients - SPSS. (Constant): B = 2.37 (a), SE = 2.941, t = 0.80, p = .43. Cigarette Consumption per Adult per Day: B = 2.04 (b), SE = .461, Beta = .713, t = 4.4, p = .00. Dependent Variable: CHD Mortality per 10,000.
  • 22. 22 Linear regression - Example: Making a prediction ● What if we want to predict CHD mortality when cigarette consumption is 6? ● Ŷ = bX + a = 2.04X + 2.37, so Ŷ = 2.04 × 6 + 2.37 = 14.61 ● We predict that 14.61 / 10,000 people in a country with an average cigarette consumption of 6 per person will die of CHD per annum.
  • 23. 23 Linear regression - Example: Accuracy of prediction - Residual • Finnish smokers smoke 6 cigarettes/adult/day • We predict 14.61 deaths /10,000 • But Finland actually has 23 deaths / 10,000 • Therefore, the error (“residual”) for this case is 23 - 14.61 = 8.39
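The prediction and residual arithmetic on the last two slides can be reproduced in a few lines of Python (a minimal sketch; the coefficients are those reported in the SPSS output above, and the variable names are illustrative):

```python
# Simple linear regression prediction and residual (cigarettes & CHD example).
# b and a are the slope and intercept reported on the coefficients slide.
b, a = 2.04, 2.37

x_finland = 6                    # average cigarettes per adult per day
y_hat = b * x_finland + a        # predicted CHD mortality per 10,000
y_obs = 23                       # observed CHD mortality for Finland

residual = y_obs - y_hat         # unexplained part for this case
print(f"Predicted = {y_hat:.2f}, Residual = {residual:.2f}")
# Predicted = 14.61, Residual = 8.39
```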
  • 24. 24Cigarette Consumption per Adult per Day 12108642 CHDMortalityper10,000 30 20 10 0 Residual Prediction
  • 25. 25 Hypothesis testing Null hypotheses (H0 ): • a (Y-intercept) = 0 Unless the DV is ratio (meaningful 0), we are not usually very interested in the a value (starting value of Y when X is 0). • b (slope of line of best fit) = 0
  • 26. 26 Linear regression – Example: Testing slope and intercept. (Constant): B = 2.37 (a), SE = 2.941, t = 0.80, p = .43. Cigarette Consumption per Adult per Day: B = 2.04 (b), SE = .461, Beta = .713, t = 4.4, p = .00. Dependent Variable: CHD Mortality per 10,000. a is not significant - baseline CHD may be negligible. b is significant (+ve) - smoking is +vely associated with CHD
  • 27. 27 Linear regression - Example Does a tendency to ‘ignore problems’ (IV) predict ‘psychological distress’ (DV)?
  • 28. 28 [Scatterplot: Ignore the Problem (X, 0–5) vs Psychological Distress (Y, 20–140), R² = 0.106] Line of best fit seeks to minimise sum of squared residuals. PD is measured in the direction of mental health – i.e., high scores mean less distress. Higher IP scores indicate greater frequency of ignoring problems as a way of coping.
  • 29. 29 Linear regression - Example. Model Summary: R = .325, R² = .106, Adjusted R² = .102, Std. Error of the Estimate = 19.49. Predictor: (Constant), IGNO2 ACS Time 2 - 11. Ignore. The predictor (Ignore the Problem) explains approximately 10% of the variance in Psychological Distress (R = .32, R2 = .11, Adjusted R2 = .10).
  • 30. 30 Linear regression - Example. ANOVA: Regression SS = 9789.89, df = 1, MS = 9789.89, F = 25.79, p = .000; Residual SS = 82767.88, df = 218, MS = 379.67; Total SS = 92557.77, df = 219. Predictor: (Constant), IGNO2 ACS Time 2 - 11. Ignore. Dependent Variable: GWB2NEG. The population relationship between Ignoring Problems and Psychological Distress is unlikely to be 0% because p = .000 (i.e., reject the null hypothesis that there is no relationship).
  • 31. 31 Linear regression - Example. Coefficients: (Constant) B = 118.897, SE = 4.351, t = 27.33, p = .000; IGNO2 ACS Time 2 - 11. Ignore B = -9.505, SE = 1.872, Beta = -.325, t = -5.08, p = .000. Dependent Variable: GWB2NEG. PD = 119 - 9.5*IP. There is a sig. a or constant (Y-intercept) - this is the baseline level of Psychological Distress. In addition, Ignore Problems (IP) is a significant predictor of Psychological Distress (PD).
  • 32. 32 [Scatterplot: Ignore the Problem (X, 0–5) vs Psychological Distress (Y, 20–140), R² = 0.106, with line of best fit] a = 119, b = -9.5, PD = 119 - 9.5*IP, e = error
  • 33. 33 Linear regression summary • Linear regression is for explaining or predicting the linear relationship between two variables • Y = bX + a + e • Ŷ = bX + a (b is the slope; a is the Y-intercept)
  • 34. Multiple Linear Regression Linear relations between two or more IVs and a single DV
  • 35. 35 Linear Regression X Y Multiple Linear Regression X1 X2 X3 Y X4 X5 What is multiple linear regression (MLR)? Visual model Single predictor Multiple predictors
  • 36. 36 What is MLR? • Use of several IVs to predict a DV • Weights each predictor (IV) according to the strength of its linear relationship with the DV • Makes adjustments for inter- relationships among predictors • Provides a measure of overall fit (R)
  • 38. 38 What is MLR? A 3-way scatterplot can depict the correlational relationship between 3 variables. However, it is difficult to graph/visualise relationships among four or more variables via scatterplot.
  • 39. 39 General steps 1. Develop a diagrammatic model and express a research question and/or hypotheses 2. Check assumptions 3. Choose type of MLR 4. Interpret output 5. Develop a regression equation (if needed)
  • 40. 40 • ~50% of the variance in CHD mortality could be explained by cigarette smoking (using LR) • Strong effect - but what about the other 50% (‘unexplained’ variance)? • What about other predictors? –e.g., exercise and cholesterol? LR → MLR example: Cigarettes & coronary heart disease
  • 41. 41 MLR – Example Research question 1 How well do these three IVs: • # of cigarettes / day (IV1) • exercise (IV2) and • cholesterol (IV3) predict • CHD mortality (DV)? Cigarettes Exercise CHD Mortality Cholesterol
  • 42. 42 MLR – Example Research question 2 To what extent do personality factors (IVs) predict annual income (DV)? Extraversion Neuroticism Income Psychoticism
  • 43. 43 MLR - Example Research question 3 “Does the # of years of formal study of psychology (IV1) and the no. of years of experience as a psychologist (IV2) predict clinical psychologists’ effectiveness in treating mental illness (DV)?” Study Experience Effectiveness
  • 44. 44 MLR - Example Your example Generate your own MLR research question (e.g., based on some of the following variables): • Gender & Age • Enrolment Type • Hours • Stress • Time management – Planning – Procrastination – Effective actions • Time perspective – Past-Negative – Past-Positive – Present-Hedonistic – Present-Fatalistic – Future-Positive – Future-Negative
  • 45. 45 Assumptions • Levels of measurement • Sample size • Normality (univariate, bivariate, and multivariate) • Linearity: Linear relations between IVs & DVs • Homoscedasticity • Multicollinearity – IVs are not overly correlated with one another (e.g., not over .7) • Residuals are normally distributed
  • 46. 46 Levels of measurement • DV = Continuous (Interval or Ratio) • IV = Continuous or Dichotomous (if neither, may need to recode into a dichotomous variable or create dummy variables)
  • 47. 47 Dummy coding • “Dummy coding” converts a more complex variable into a series of dichotomous variables (i.e., 0 or 1) • Several dummy variables can be created from a variable with a higher level of measurement.
  • 48. 48 Dummy coding - Example • Religion (1 = Christian; 2 = Muslim; 3 = Atheist) in this format, can't be an IV in regression (a linear correlation with a categorical variable doesn't make sense) • However, it can be dummy coded into dichotomous variables: – Christian (0 = no; 1 = yes) – Muslim (0 = no; 1 = yes) – Atheist (0 = no; 1 = yes) (redundant) • These variables can then be used as IVs. • More information (Dummy variable (statistics), Wikiversity)
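A hypothetical sketch of this dummy coding in Python using pandas (category labels follow the slide's example; `pd.get_dummies` with `drop_first=True` drops the redundant category):

```python
import pandas as pd

# Dummy code a 3-category variable into dichotomous (0/1) predictors.
df = pd.DataFrame({"religion": ["Christian", "Muslim", "Atheist", "Muslim"]})

# drop_first=True drops one (redundant) category - here the first alphabetically
# (Atheist) - leaving k - 1 dummy variables that can be entered as IVs.
dummies = pd.get_dummies(df["religion"], prefix="religion", drop_first=True)
print(dummies.head())
```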
  • 49. 49 Sample size: Rules of thumb • Enough data is needed to provide reliable estimates of the correlations. • N >= 50 cases and N >= 10 to 20 cases x no. of IVs, otherwise the estimates of the regression line are probably unstable and are unlikely to replicate if the study is repeated. • Green (1991) and Tabachnick & Fidell (2013) suggest: – 50 + 8(k) for testing an overall regression model and – 104 + k when testing individual predictors (where k is the number of IVs) – Based on detecting a medium effect size (β >= .20), with critical α <= .05, with power of 80%.
  • 50. 50 Sample size: Rules of thumb Q: Should a researcher conduct an MLR with 4 predictors with 200 cases? A: Yes; satisfies all rules of thumb: • N > 50 cases • N > 20 cases x 4 = 80 cases • N > 50 + 8 x 4 = 82 cases • N > 104 + 4 = 108 cases
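These rules of thumb are easy to automate; a small sketch (the function name is invented for illustration):

```python
def check_mlr_sample_size(n, k):
    """Apply the rule-of-thumb sample size checks from the slides
    (n = number of cases, k = number of IVs)."""
    return {
        "N >= 50": n >= 50,
        "N >= 20 per IV": n >= 20 * k,
        "N >= 50 + 8k (overall model)": n >= 50 + 8 * k,
        "N >= 104 + k (individual predictors)": n >= 104 + k,
    }

print(check_mlr_sample_size(n=200, k=4))   # all True, matching the worked example
```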
  • 51. 51 Dealing with outliers Extreme cases should be deleted or modified if they are overly influential. • Univariate outliers - detect via initial data screening (e.g., min. and max.) • Bivariate outliers - detect via scatterplots • Multivariate outliers - unusual combination of predictors – detect via Mahalanobis' distance
  • 52. 52 Multivariate outliers • A case may be within normal range for each variable individually, but be a multivariate outlier based on an unusual combination of responses which unduly influences multivariate test results. • e.g., a person who: –Is 18 years old –Has 3 children –Has a post-graduate degree
  • 53. 53 Multivariate outliers • Identify & check unusual cases • Use Mahalanobis' distance or Cook’s D as a MVO screening procedure
  • 54. 54 Multivariate outliers • Mahalanobis' distance (MD) – Distributed as χ2 with df equal to the number of predictors (with critical α = .001) – Cases with a MD greater than the critical value are multivariate outliers. • Cook’s D – Cases with CD values > 1 are multivariate outliers. • Use either MD or CD • Examine cases with extreme MD or CD scores - if in doubt, remove & re-run.
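A rough sketch of Mahalanobis distance screening in Python (simulated data; the chi-square critical value at alpha = .001 follows the rule on the slide):

```python
import numpy as np
from scipy.stats import chi2

# Multivariate outlier screening via Mahalanobis distance (sketch, simulated data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 cases, 3 predictors

diff = X - X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
md_sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared distances per case

critical = chi2.ppf(1 - 0.001, df=X.shape[1])           # df = number of predictors
print("Flagged multivariate outliers:", np.where(md_sq > critical)[0])
```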
  • 55. 55 Normality & homoscedasticity Normality • If variables are non-normal, this will create heteroscedasticity Homoscedasticity • Variance around the regression line should be the same throughout the distribution • Even spread in residual plots
  • 56. 56 Multicollinearity • Multicollinearity – IVs shouldn't be overly correlated (e.g., over .7) – if so, consider combining them into a single variable or removing one. • Singularity - perfect correlations among IVs. • Leads to unstable regression coefficients.
  • 57. 57 Multicollinearity Detect via: • Correlation matrix - are there large correlations among IVs? • Tolerance statistics - if < .3 then exclude that variable. • Variance Inflation Factor (VIF) – if > 3, then exclude that variable. • VIF is the reciprocal of Tolerance (so use one or the other – not both)
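Tolerance and VIF can be computed by regressing each IV on the remaining IVs; a minimal numpy sketch (simulated data, invented function name) that flags predictors with Tolerance < .3 (VIF > ~3) per the slide's criteria:

```python
import numpy as np

def tolerance_and_vif(X):
    """For each IV, Tolerance = 1 - R^2 from regressing it on the other IVs;
    VIF = 1 / Tolerance. X is an n x k array of predictors."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - resid.var() / y.var()
        out.append({"Tolerance": 1 - r2, "VIF": 1 / (1 - r2)})
    return out

rng = np.random.default_rng(1)
print(tolerance_and_vif(rng.normal(size=(60, 3))))   # ~1.0 for independent IVs
```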
  • 58. 58 Causality • Like correlation, regression does not tell us about the causal relationship between variables. • In many analyses, the IVs and DVs could be swapped around – therefore, it is important to: –Take a theoretical position –Acknowledge alternative explanations
  • 59. 59 Multiple correlation coefficient (R) • “Big R” (capitalised) • Equivalent of r, but takes into account that there are multiple predictors (IVs) • Always positive, between 0 and 1 • Interpretation is similar to that for r (correlation coefficient)
  • 60. 60 Coefficient of determination (R2 ) • “Big R squared” • Squared multiple correlation coefficient • Always include R2 • Indicates the % of variance in DV explained by combined effects of the IVs • Analogous to r2
  • 61. 61 Rule of thumb for interpretation of R2 • .00 = no linear relationship • .10 = small (R ~ .3) • .25 = moderate (R ~ .5) • .50 = strong (R ~ .7) • 1.00 = perfect linear relationship R2 > .30 is “good” in social sciences
  • 62. 62 Adjusted R2 • R2 is explained variance in a sample. • Adjusted R2 is used for estimating explained variance in a population. • Report R2 and adjusted R2 . • Particularly for small N and where results are to be generalised, take more note of adjusted R2 .
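The standard adjustment shrinks R² according to the number of predictors relative to N; a small sketch that reproduces the value from the earlier simple-regression output:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n = sample size and k = number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example values from the Ignoring Problems model: R^2 = .106, N = 220, k = 1
print(round(adjusted_r2(0.106, 220, 1), 3))   # ~0.102, matching the output above
```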
  • 63. 63 Multiple linear regression – Test for overall significance • Shows if there is a significant linear relationship between the X variables taken together and Y • Examine F and p in the ANOVA table to determine the likelihood that the explained variance in Y could have occurred by chance
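The F statistic in the ANOVA table can also be recovered from R² and the degrees of freedom; a sketch using the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)):

```python
from scipy.stats import f as f_dist

def overall_f(r2, n, k):
    """Overall model F test from R^2; returns F and its p value
    for df = (k, n - k - 1)."""
    df1, df2 = k, n - k - 1
    F = (r2 / df1) / ((1 - r2) / df2)
    return F, f_dist.sf(F, df1, df2)

# Ignoring Problems example: R^2 ~ .1058, N = 220, k = 1
print(overall_f(0.1058, 220, 1))   # F ~ 25.8, p < .001 (cf. the ANOVA table above)
```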
  • 64. 64 Regression coefficients • Y-intercept (a) • Slopes (b): –Unstandardised –Standardised • Slopes are the weighted loading of each IV on the DV, adjusted for the other IVs in the model.
  • 65. 65 Unstandardised regression coefficients • B = unstandardised regression coefficient • Used for regression equations • Used for predicting Y scores • But can’t be compared with other Bs unless all IVs are measured on the same scale
  • 66. 66 Standardised regression coefficients • Beta (β) = standardised regression coefficient • Useful for comparing the relative strength of predictors • In simple LR, β = r; in MLR, β = r only when the IVs are uncorrelated.
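B and β are related through the standard deviations of X and Y; a one-line sketch of the conversion (the data arrays in the comment are hypothetical):

```python
import numpy as np

def beta_from_b(b, x, y):
    """Standardised slope from an unstandardised one: beta = B * SD(X) / SD(Y)."""
    return b * np.std(x, ddof=1) / np.std(y, ddof=1)

# e.g., beta_from_b(-9.5, ignore_scores, distress_scores) would approximate the
# Beta reported for the Ignore Problems predictor (arrays here are hypothetical).
```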
  • 67. 67 Test for significance: Independent variables Indicates the likelihood of a linear relationship between each IV (Xi) and Y occurring by chance. Hypotheses: H0: βi = 0 (No linear relationship) H1: βi ≠ 0 (Linear relationship between Xi and Y)
  • 68. 68 Relative importance of IVs • Which IVs are the most important? • To answer this, compare the standardised regression coefficients (βs)
  • 69. 69 Y = b1x1 + b2x2 +.....+ bixi + a + e • Y = observed DV scores • bi = unstandardised regression coefficients (the Bs in SPSS) - slopes • x1 to xi = IV scores • a = Y axis intercept • e = error (residual) Regression equation
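A minimal MLR fit with statsmodels, as a sketch of how the equation's coefficients are estimated (simulated data; in practice X would hold the IV columns from the survey data set):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                               # two IVs (x1, x2)
y = 3 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=200)

model = sm.OLS(y, sm.add_constant(X)).fit()                 # add_constant supplies 'a'
print(model.params)                                          # a, b1, b2 (unstandardised)
print(model.rsquared, model.rsquared_adj)                    # R^2 and adjusted R^2
```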
  • 70. 70 Multiple linear regression - Example “Do ‘ignoring problems’ (IV1) and ‘worrying’ (IV2) predict ‘psychological distress’ (DV)?”
  • 71. 71
  • 73. 73 Multiple linear regression - Example Together, Ignoring Problems and Worrying explain 30% of the variance in Psychological Distress in the Australian adolescent population (R2 = .30, Adjusted R2 = .29).
  • 74. 74 Multiple linear regression - Example The explained variance in the population is unlikely to be 0 (p = .00).
  • 75. 75 Multiple linear regression - Example. Coefficients: (Constant) B = 138.932, SE = 4.680, t = 29.69, p = .000; Worry B = -11.511, SE = 1.510, Beta = -.464, t = -7.63, p = .000; Ignore the Problem B = -4.735, SE = 1.780, Beta = -.162, t = -2.66, p = .008. Dependent Variable: Psychological Distress. Worry predicts about three times as much variance in Psychological Distress as Ignoring the Problem, although both are significant, negative predictors of mental health.
  • 76. 76 Multiple linear regression - Example – Prediction equations. Linear Regression: PD (hat) = 119 – 9.50*Ignore, R2 = .11. Multiple Linear Regression: PD (hat) = 139 - 4.7*Ignore - 11.5*Worry, R2 = .30.
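Applying the two prediction equations from this slide in code (a sketch; coefficients as reported, rounded):

```python
def pd_lr(ignore):
    """Simple LR from the slides: PD-hat = 119 - 9.5 * Ignore (R^2 = .11)."""
    return 119 - 9.5 * ignore

def pd_mlr(ignore, worry):
    """MLR from the slides: PD-hat = 139 - 4.7 * Ignore - 11.5 * Worry (R^2 = .30)."""
    return 139 - 4.7 * ignore - 11.5 * worry

print(pd_lr(3), pd_mlr(3, 2))   # predicted Psychological Distress scores
```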
  • 77. 77 Confidence interval for the slope: Mental Health (PD) is reduced by between 8.5 and 14.5 units per unit increase in Worry, and by between 1.2 and 8.2 units per unit increase in Ignore the Problem.
  • 78. 78 Multiple linear regression - Example Effect of violence, stress, social support on internalising behaviour problems Kliewer, Lepore, Oskin, & Johnson, (1998) Image source: http://cloudking.com/artists/noa-terliuc/family-violence.php
  • 79. 79 Multiple linear regression – Example – Violence study • Participants were children: – 8 - 12 years – Lived in high-violence areas, USA • Hypotheses: – Stress → ↑ internalising behaviour – Violence → ↑ internalising behaviour – Social support → ↓ internalising behaviour
  • 80. 80 Multiple linear regression – Example - Variables • Predictors –Degree of witnessing violence –Measure of life stress –Measure of social support • Outcome –Internalising behaviour (e.g., depression, anxiety, withdrawal symptoms) – measured using the Child Behavior Checklist (CBCL)
  • 81. 81 Correlations. Correlations amongst the IVs (Amount of violence witnessed, Current stress, Social support) are small: r = .05, .08, -.08. Correlations between the IVs and the DV (Internalizing symptoms on CBCL): violence witnessed r = .20*, current stress r = .27**, social support r = -.17. * Correlation is significant at the 0.05 level (2-tailed). ** Correlation is significant at the 0.01 level (2-tailed).
  • 82. 82 Model Summary: R = .37, R² = .135, Adjusted R² = .108, Std. Error of the Estimate = 2.22. Predictors: (Constant), Social support, Current stress, Amount of violence witnessed. 13.5% of the variance in children's internalising symptoms can be explained by the 3 predictors.
  • 83. 83 Regression coefficients: (Constant) B = .477, SE = 1.289, t = .37, p = .712; Amount of violence witnessed B = .038, SE = .018, Beta = .201, t = 2.1, p = .039; Current stress B = .273, SE = .106, Beta = .247, t = 2.6, p = .012; Social support B = -.074, SE = .043, Beta = -.166, t = -1.7, p = .087. Dependent Variable: Internalizing symptoms on CBCL. 2 predictors have p < .05.
  • 84. 84 Regression equation • A separate coefficient or slope for each variable • An intercept (here it's called b0) • Ŷ = b1X1 + b2X2 + b3X3 + b0 = .038*Wit + .273*Stress - .074*SocSupp + .477
  • 85. 85 Interpretation • Slopes for Witness and Stress are +ve; slope for Social Support is -ve. • Holding Stress and Social Support constant, a one unit increase in Witness would produce a .038 unit increase in Internalising symptoms. • Ŷ = b1X1 + b2X2 + b3X3 + b0 = .038*Wit + .273*Stress - .074*SocSupp + .477
  • 86. 86 Predictions Q: If Witness = 20, Stress = 5, and SocSupp = 35, what would we predict internalising symptoms to be? A: Ŷ = .038*Wit + .273*Stress - .074*SocSupp + .477 = .038(20) + .273(5) - .074(35) + .477 = .012
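The same prediction in code (a sketch using the coefficients reported on the coefficients slide):

```python
# Predicted internalising symptoms for Witness = 20, Stress = 5, SocSupp = 35.
b_wit, b_stress, b_soc, b0 = 0.038, 0.273, -0.074, 0.477
y_hat = b_wit * 20 + b_stress * 5 + b_soc * 35 + b0
print(round(y_hat, 3))   # 0.012
```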
  • 87. 87 Multiple linear regression - Example The role of human, social, built, and natural capital in explaining life satisfaction at the country level: Towards a National Well-Being Index (NWI) Vemuri & Costanza (2006)
  • 88. 88 Variables • IVs: –Human & Built Capital (Human Development Index) –Natural Capital (Ecosystem services per km2 ) –Social Capital (Press Freedom) • DV = Life satisfaction • Units of analysis: Countries (N = 57; mostly developed countries, e.g., in Europe and America)
  • 89. 89 ● There are moderately strong positive and statistically significant linear relations between the IVs and the DV ● The IVs have small to moderate positive inter-correlations.
  • 90. 90 ● R2 = .35 ● Two sig. IVs (not Social Capital - dropped)
  • 92. 92 ● R2 = .72 (after dropping 6 outliers)
  • 93. 93 Types of MLR • Standard or direct (simultaneous) • Hierarchical or sequential • Stepwise (forward & backward) Image source: https://commons.wikimedia.org/wiki/File:IStumbler.png
  • 94. 94 • All predictor variables are entered together (simultaneously) • Allows assessment of the relationship between all predictor variables and the outcome (Y) variable if there is good theoretical reason for doing so. • Manual technique & commonly used. • If you're not sure what type of MLR to use, start with this approach. Direct or Standard
  • 95. 95 • IVs are entered in blocks or stages. –Researcher defines order of entry for the variables, based on theory. –May enter ‘nuisance’ variables first to ‘control’ for them, then test ‘purer’ effect of next block of important variables. • R2 change - additional variance in Y explained at each stage of the regression. – F test of R2 change. Hierarchical (Sequential)
  • 96. 96 • Example – Drug A is a cheap, well-proven drug which reduces AIDS symptoms – Drug B is an expensive, experimental drug which could help to cure AIDS – Hierarchical linear regression: • Step 1: Drug A (IV1) • Step 2: Drug B (IV2) • DV = AIDS symptoms • Research question: To what extent does Drug B reduce AIDS symptoms above and beyond the effect of Drug A? • Examine the change in R2 between Step 1 & Step 2 Hierarchical (Sequential)
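A sketch of the hierarchical comparison described above: fit the Step 1 and Step 2 models, then test the R² change (simulated data; variable names follow the Drug A / Drug B example):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import f as f_dist

rng = np.random.default_rng(7)
n = 150
drug_a = rng.normal(size=n)
drug_b = rng.normal(size=n)
symptoms = 10 - 1.2 * drug_a - 0.6 * drug_b + rng.normal(size=n)

step1 = sm.OLS(symptoms, sm.add_constant(drug_a)).fit()                            # Drug A only
step2 = sm.OLS(symptoms, sm.add_constant(np.column_stack([drug_a, drug_b]))).fit() # add Drug B

r2_change = step2.rsquared - step1.rsquared      # extra variance explained by Drug B
df1, df2 = 1, n - 2 - 1                          # IVs added at Step 2; residual df at Step 2
f_change = (r2_change / df1) / ((1 - step2.rsquared) / df2)
print(r2_change, f_change, f_dist.sf(f_change, df1, df2))
```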
  • 97. 97 • Computer-driven – controversial. • Starts with 0 predictors, then the strongest predictor is entered into the model, then the next strongest etc. if they reach a criteria (e.g., p < .05) Forward selection
  • 98. 98 • Computer-driven – controversial. • All predictor variables are entered, then the weakest predictors are removed, one by one, if they meet a criteria (e.g., p > .05) Backward elimination
  • 99. 99 • Computer-driven – controversial. • Combines forward & backward. • At each step, variables may be entered or removed if they meet certain criteria. • Useful for developing the best prediction equation from a large number of variables. • Redundant predictors are removed. Stepwise
  • 100. 100 Which method? • Standard: To assess impact of all IVs simultaneously • Hierarchical: To test IVs in a specific order (based on hypotheses derived from theory) • Stepwise: If the goal is accurate statistical prediction from a large # of variables - computer driven
  • 102. 102 Summary: General steps 1. Develop model and hypotheses 2. Check assumptions 3. Choose type 4. Interpret output 5. Develop a regression equation (if needed)
  • 103. 103 Summary: Linear regression 1. Best-fitting straight line for a scatterplot of two variables 2. Y = bX + a + e 1. Predictor (X; IV) 2. Outcome (Y; DV) 3. Least squares criterion 4. Residuals are the vertical distance between actual and predicted values
  • 104. 104 Summary: MLR assumptions 1. Level of measurement 2. Sample size 3. Normality 4. Linearity 5. Homoscedasticity 6. Collinearity 7. Multivariate outliers 8. Residuals should be normally distributed
  • 105. 105 Summary: Level of measurement and dummy coding 1. Levels of measurement 1. DV = Interval or ratio 2. IV = Interval or ratio or dichotomous 2. Dummy coding 1. Convert complex variables into series of dichotomous IVs
  • 106. 106 Summary: MLR types 1. Standard 2. Hierarchical 3. Stepwise / Forward / Backward
  • 107. 107 Summary: MLR output 1. Overall fit 1. R, R2 , Adjusted R2 2. F, p 2. Coefficients 1. Relation between each IV and the DV, adjusted for the other IVs 2. B, β, t, p, and rp 3. Regression equation (if useful) Y = b1x1 + b2x2 +.....+ bixi + a + e
  • 109. 109 MLR I Quiz – Practice question 1 A linear regression analysis produces the equation Y = 0.4X + 3. This indicates that: (a) When Y = 0.4, X = 3 (b) When Y = 0, X = 3 (c) When X = 3, Y = 0.4 (d) When X = 0, Y = 3 (e) None of the above
  • 110. 110 MLR I Quiz – Practice question 2 Multiple linear regression is a ________ type of statistical analysis. (a) univariate (b) bivariate (c) multivariate
  • 111. 111 MLR I Quiz – Practice question 3 Multiple linear regression is a ________ type of statistical analysis. (a) univariate (b) bivariate (c) multivariate
  • 112. 112 MLR I Quiz – Practice question 4 The following types of data can be used in MLR (choose all that apply): (a) Interval or higher DV (b) Interval or higher IVs (c) Dichotomous IVs (d) All of the above (e) None of the above
  • 113. 113 MLR I Quiz – Practice question 5 In MLR, the square of the multiple correlation coefficient, R2 , is called the: (a) Coefficient of determination (b) Variance (c) Covariance (d) Cross-product (e) Big R
  • 114. 114 MLR I Quiz – Practice question 6 In MLR, a residual is the difference between the predicted Y and actual Y values. (a) True (b) False
  • 115. 115 Next lecture • Review of MLR I • Semi-partial correlations • Residual analysis • Interactions • Analysis of change
  • 116. 116 References Howell, D. C. (2004). Chapter 9: Regression. In D. C. Howell, Fundamental statistics for the behavioral sciences (5th ed.) (pp. 203-235). Belmont, CA: Wadsworth. Howitt, D., & Cramer, D. (2011). Introduction to statistics in psychology (5th ed.). Harlow, UK: Pearson. Kliewer, W., Lepore, S. J., Oskin, D., & Johnson, P. D. (1998). The role of social and cognitive processes in children’s adjustment to community violence. Journal of Consulting and Clinical Psychology, 66, 199-209. Landwehr, J. M., & Watkins, A. E. (1987). Exploring data: Teacher’s edition. Palo Alto, CA: Dale Seymour Publications. Tabachnick, B. G., & Fidell, L. S. (2013). Multiple regression [includes example write-ups]. In Using multivariate statistics (6th ed., International ed., pp. 117-170). Boston, MA: Allyn and Bacon. Vemuri, A. W., & Costanza, R. (2006). The role of human, social, built, and natural capital in explaining life satisfaction at the country level: Toward a National Well-Being Index (NWI). Ecological Economics, 58(1), 119-133.
  • 117. 117 Open Office Impress ● This presentation was made using Open Office Impress. ● Free and open source software. ● http://www.openoffice.org/product/impress.html

Editor's Notes

  1. 7126/6667 Survey Research & Design in Psychology Semester 1, 2017, University of Canberra, ACT, Australia James T. Neill Home page: http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology Lecture page: http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Lectures/Multiple_linear_regression_I Image source:http://commons.wikimedia.org/wiki/File:Vidrarias_de_Laboratorio.jpg License: Public domain Description: Introduces and explains the use of linear regression and multiple linear regression in the context of psychology. This lecture is accompanied by a computer-lab based tutorial, the notes for which are available here: http://ucspace.canberra.edu.au/display/7126/Tutorial+-+Multiple+linear+regression
  2. Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg License: Public domain Assumed knowledge: A solid understanding of linear correlation.
  3. Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg License: Public domain Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  4. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ Regression tends not to be used for Exploratory or Descriptive purposes.
  5. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  6. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  7. Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg License: Public domain Image name: James Neill License: Public domain
  8. Image source: Unknown To describe the regression line, need the slope of the line of best fit and the point at which it touches the Y-axis. A line of best fit can be applied using any method e.g., by eye/hand. Another way is to use the Method of Least Squares – a formula which minimizes the sum of the vertical deviations. See also: http://www.hrma-agrh.gc.ca/hr-rh/psds-dfps/dafps_basic_stat2_e.asp#D
  9. Image sources: Unknown. Example from Landwehr & Watkins (1987), cited in Howell (2004, pp. 216-218) and accompanying powerpoint lecture notes.
  10. Image source: Unknown
  11. Image source: Howell (2004, pp. 216-218)‏
  12. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  13. Image source: Public domain http://commons.wikimedia.org/wiki/File:Scatterplot_WithLRForumulaIndicated.png
  14. Y = a + bx + e X = predictor value (IV) Y = predicted value (DV) a = Y axis intercept (Y-intercept – i.e., when X is 0) b = unstandardised regression coefficient (i.e. B in SPSS) (regression coefficient - slope – line of best fit – average rate at which Y changes with one unit change in X) e = error
  15. LaTeX: \hat Y=bx+a
  16. R2 needs to be provided first Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  17. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ The intercept is labeled “constant.” Slope is labeled by name of predictor variable.
  18. The variance of these residuals is indicated by the standard error in the regression coefficients table
  19. Image source: Unknown
  20. ρ (rho - population correlation) = 0
  21. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ Significance tests of the slope and intercept are given as t-tests. The t values in the second from right column are tests on slope and intercept. The associated p values are next to them. The slope is significantly different from zero, but not the intercept.
  22. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ Note that high scores indicate good mental health, i.e., absence of distress
  23. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ R = correlation [multiple correlation in MLR] R2 = % of variance explained Adjusted R2 = % of variance, reduced estimate, bigger adjustments for small samples In this case, Ignoring Problems accounts for ~10% of the variation in Psychological Distress
  24. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ The MLR ANOVA table provides a significance test of R. It is NOT a “normal ANOVA” (test of mean differences); it tests whether a significant (non-zero) amount of variance is explained (null hypothesis is zero variance explained). In this case a significant amount of Psychological Distress variance is explained by Ignoring Problems, F(1,218) = 25.78, p < .01
  25. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ Multiple regression coefficient table Analyses the relationship of each IV with the DV For each IV, examine B, Beta, t and sig. B = unstandardised regression coefficient [use in prediction equations] Beta (β) = standardised regression coefficient [use to compare predictors with one another] t-test & sig. shows the statistical likelihood of a DV-IV relationship being caused by chance alone
  26. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  27. Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg License: Public domain Image soruce: http://commons.wikimedia.org/wiki/File:Example_path_diagram_%28conceptual%29.png License: Public domain Use of several IVs to predict a DV
  28. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ In MLR there are: multiple predictor X variables (IVs) and a single predicted Y (DV)
  29. Interrelationships between predictors e.g. IVs = height, gender → DV = weight
  30. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  31. Image source: Figure 11.2 Three-dimensional plot of teaching evaluation data (Howell, 2004, p. 248)
  32. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  33. IVs = metric (interval or ratio) or dichotomous, e.g. age and gender DV = metric (interval or ratio), e.g., pulse Linear relations exist between IVs & DVs, e.g., check scatterplots IVs are not overly correlated with one another (e.g., over .7) – if so, apply cautiously Assumptions for correlation apply e.g., watch out for outliers, non-linear relationships, etc. Homoscedasticity – similar/even spread of data from line of best fit throughout the distribution For more on assumptions, see http://www.visualstatistics.net/web%20Visual%20Statistics/Visual%20Statistics%20Multimedia/correlation_assumtions.htm
  34. IVs = metric (interval or ratio) or dichotomous, e.g. age and gender DV = metric (interval or ratio), e.g., pulse Linear relations exist between IVs & DVs, e.g., check scatterplots IVs are not overly correlated with one another (e.g., over .7) – if so, apply cautiously Assumptions for correlation apply e.g., watch out for outliers, non-linear relationships, etc. Homoscedasticity – similar/even spread of data from line of best fit throughout the distribution For more on assumptions, see http://www.visualstatistics.net/web%20Visual%20Statistics/Visual%20Statistics%20Multimedia/correlation_assumtions.htm
  35. IVs = metric (interval or ratio) or dichotomous, e.g. age and gender DV = metric (interval or ratio), e.g., pulse Linear relations exist between IVs & DVs, e.g., check scatterplots IVs are not overly correlated with one another (e.g., over .7) – if so, apply cautiously Assumptions for correlation apply e.g., watch out for outliers, non-linear relationships, etc. Homoscedasticity – similar/even spread of data from line of best fit throughout the distribution For more on assumptions, see http://www.visualstatistics.net/web%20Visual%20Statistics/Visual%20Statistics%20Multimedia/correlation_assumtions.htm
  36. Image source: Unknown
  37. Regression can establish correlational link, but cannot determine causation.
  38. The coefficient of determination is a measure of how well the regression line represents the data.  If the regression line passes exactly through every point on the scatter plot, it would be able to explain all of the variation and R2 would be 1. The further the line is away from the points, the less it is able to explain. If the scatterplot is completely random and there is zero relationship between the IVs and the DV, then R2 will be 0.
  39. As number of predictors approaches N, R2 is inflated
  40. β = r in LR, but in MLR this is only true when the IVs are uncorrelated.
  41. If IVs are uncorrelated (usually not the case) then you can simply use the correlations between the IVs and the DV to determine the strength of the predictors. If the IVs are standardised (usually not the case), then the unstandardised regression coefficients (B) can be compared to determine the strength of the predictors. If the IVs are measured using the same scale (sometimes the case), then the unstandardised regression coefficients (B) can meaningfully be compared.
  42. The MLR equation has multiple regression coefficients and a constant (intercept).
  43. Image source: http://www.imaja.com/as/poetry/gj/Worry.html
  44. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  45. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ It is a good idea to get into the habit of drawing Venn diagrams to represent the degree of linear relationship between variables.
  46. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  47. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  48. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  49. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
  50. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ 95% CI
  51. Image source: http://cloudking.com/artists/noa-terliuc/family-violence.php Kliewer, Lepore, Oskin, & Johnson (1998)
  52. Data available at www.duxbury.com/dhowell/StatPages/More_Stuff/Kliewer.dat
  53. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ CBCL = Child Behavior Checklist Predictors are largely independent of each other. Stress and Witnessing Violence are significantly correlated with Internalizing.
  54. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ R2 has same interpretation as r2. 13.5% of variability in Internal accounted for by variability in Witness, Stress, and SocSupp.
  55. Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/ t test on two slopes (Violence and Stress) are positive and significant. SocSupp is negative and not significant. However the size of the effect is not much different from the two significant effects.
  56. Image source: Unknown
  57. Image source: Unknown Re the 2nd point - the same holds true for other predictors.
  58. Image source: Unknown
  59. Image source: http://cloudking.com/artists/noa-terliuc/family-violence.php Kliewer, Lepore, Oskin, & Johnson (1998)
  60. Image source: Vemuri & Costanza (2006).
  61. Image source: Vemuri & Costanza (2006).
  62. Image source: Vemuri & Costanza (2006). Nigeria, India, Bangladesh, Ghana, China and Philippines were treated as outliers and excluded from the analysis.
  63. Image source: Vemuri & Costanza (2006).
  64. Image source: https://commons.wikimedia.org/wiki/File:IStumbler.png Image author: http://www.istumbler.net/
  65. Manual technique & commonly used. e.g., some treatment variables may be less expensive and these could be entered first to find out whether or not there is additional justification for the more expensive treatments
  66. Manual technique & commonly used. e.g., some treatment variables may be less expensive and these could be entered first to find out whether or not there is additional justification for the more expensive treatments
  67. These residual slides are based on Francis (2007) – MLR (Section 5.1.4) – Practical Issues & Assumptions, pp. 126-127 and Allen and Bennett (2008). Note that according to Francis, residual analysis can test additivity (i.e., no interactions between IVs), but this has been left out for the sake of simplicity.
  68. Answer: When X = 0, Y = 3
  69. Answer: Multivariate
  70. Answer: Multivariate
  71. Answer: (d)
  72. Answer: Coefficient of determination
  73. Answer: True