This document discusses regression analysis and linear regression. It explains that regression analysis involves fitting a straight line to bivariate data using the least squares regression line. It also discusses interpreting the slope and y-intercept of the regression line, residual plots, and using the regression line to extrapolate or interpolate values.
2. Regression Analysis
The process of fitting a straight line to bivariate data is known as linear regression.
This can be done either by:
Calculating and plotting the least squares regression line
Drawing the three-median line
A regression analysis involves a range of statistics that summarise the relationship between two numerical variables.
K McMullen 2012
3. Regression Analysis
Included in a regression analysis:
Scatterplot: form, direction, outliers
Correlation coefficient: measures strength
Regression line: models the relationship
Interpreting slope and y-intercept
Coefficient of determination: measures the predictive power of the relationship
Residual plot: tests linearity
Final report
4. Regression Analysis
We have already covered:
Scatterplot
Correlation coefficient (r)
Coefficient of determination (r²)
What we will cover here:
Regression line
Interpreting slope and y-intercept
Residual plots
5. Regression Analysis
Least Squares Regression Line (also referred to as the Regression Line)
The least squares line is like the mean: it is the line that best fits the data.
The line balances out the data values on either side of it.
The vertical distance between a data value and the regression line is known as the residual.
The least squares line is the line for which the sum of the squares of the residuals is the least. We square the residuals so that the negative residuals don’t cancel out the positive residuals (remember that squaring a negative gives a positive).
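The idea of "least sum of squared residuals" can be illustrated with a small sketch. It uses the standard formulas slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope × x̄. The function names and the five data points are invented for illustration and do not come from the slides.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up the squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for this example only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)

# Any other line (here, a slightly steeper one) gives a larger sum of squares
other = sum_squared_residuals(xs, ys, slope + 0.1, intercept)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"sum of squared residuals: fitted line {best:.2f}, other line {other:.2f}")
```

Comparing the fitted line with a nearby alternative shows what "least squares" means in practice: changing the slope or intercept can only increase the sum of squared residuals.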
7. Regression Analysis
Interpreting slope: remember that the slope indicates what happens to the dependent variable (DV) as the independent variable (IV) increases. As x increases, y either increases (positive gradient) or decreases (negative gradient).
Comment: “On average, the DV will increase/decrease by the value of the slope for every increase of 1 in the IV”
Eg. On average, life expectancy (DV) in countries will decrease by 1.44 years for an increase in birth rate (IV) of one birth per 1000 people.
8. Regression Analysis
Interpreting intercept: the y-intercept of the regression line gives us the predicted value of the DV when the IV is 0.
Comment: “On average, when the IV is 0 the DV is the value of the intercept”
Eg. On average, the life expectancy for countries with a zero birth rate is 105.4 years.
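The two interpretations can be checked numerically. The sketch below assumes the regression line implied by the examples in these slides, life expectancy = 105.4 − 1.44 × birth rate. The function name and the birth rates 20 and 21 are chosen only for illustration.

```python
SLOPE = -1.44      # years per extra birth per 1000 people (from the slope example)
INTERCEPT = 105.4  # years, predicted value when the birth rate is 0


def predict_life_expectancy(birth_rate):
    """Predicted life expectancy (DV) for a given birth rate (IV)."""
    return INTERCEPT + SLOPE * birth_rate


# Intercept: the prediction when the IV is 0
at_zero = predict_life_expectancy(0)

# Slope: the change in the prediction when the IV goes up by 1
# (20 and 21 are arbitrary illustrative birth rates)
change = predict_life_expectancy(21) - predict_life_expectancy(20)

print(f"Predicted life expectancy at birth rate 0: {at_zero:.1f} years")
print(f"Change for one extra birth per 1000 people: {change:.2f} years")
```

Because the line is straight, the change for a one-unit increase is the same whichever starting birth rate is used.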
9. Regression Analysis
The residual plot: shows how far the data values are from the regression line.
The regression line basically becomes the horizontal line y = 0 (residual = 0) and the data values remain the same vertical distance away from this line (think of moving the regression line and all the data values so that the line is now horizontal).
Remember:
Residual value = data value - predicted value
If the residual plot is randomly scattered about the zero line, with no clear pattern, then we can confirm our assumption of a linear relationship.
Comment: “The assumption that there is a linear relationship between IV and DV is confirmed by the residual plot”
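As a rough sketch of how the values in a residual plot are obtained, the code below applies residual = data value − predicted value to each point. The data are invented for illustration, and the slope 0.6 and intercept 2.2 are the least squares line for that invented data, not values from the slides.

```python
def residuals(xs, ys, slope, intercept):
    """Residual = data value - predicted value, for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data, made up for this example only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares line for the data above
slope, intercept = 0.6, 2.2

res = residuals(xs, ys, slope, intercept)

# Each (x, residual) pair is one point on the residual plot;
# the regression line itself sits at residual = 0
for x, r in zip(xs, res):
    print(f"x = {x}: residual = {r:+.1f}")
```

Plotting each x value against its residual gives the residual plot. For a least squares line the residuals always sum to zero, so it is the pattern of the points (random scatter or a curve) that matters, not their total.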
10. Regression Analysis
Extrapolation and interpolation: we use the regression line to predict values, where:
Extrapolation is where we predict outside the range of the data (less reliable, because we cannot be sure the relationship holds there)
Interpolation is where we predict inside the range of the data
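One way to decide which kind of prediction is being made is to compare the new IV value with the smallest and largest IV values in the data. This is a minimal sketch; the function name and the list of birth rates are hypothetical and are not taken from the slides.

```python
def prediction_type(x_new, observed_xs):
    """Classify a prediction by whether x_new lies inside the observed range."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"


# Hypothetical birth rates (births per 1000 people), made up for this example
birth_rates = [10, 18, 25, 32, 45]

print(prediction_type(20, birth_rates))  # inside the range 10 to 45
print(prediction_type(60, birth_rates))  # above the largest observed value
print(prediction_type(0, birth_rates))   # below the smallest observed value
```

With data like these, a prediction at a birth rate of 0 would be an extrapolation, which is worth remembering when interpreting a y-intercept.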