2. Objectives:
4.1 Set up and interpret simple linear regression equations.
4.2 Estimate the intercept and slope parameters of a regression line using the method of least squares.
4.3 Determine whether estimated parameters are statistically significant using either t-tests or the p-values associated with parameter estimates.
3. Objectives:
4.4 Evaluate how well a regression equation "fits" the data by examining the R² statistic, and test the statistical significance of the whole regression equation using an F-test.
4.5 Set up and interpret multiple regression models that use more than one explanatory variable.
4.6 Use linear regression techniques to estimate the parameters of two common nonlinear models: quadratic and log-linear regression models.
4. To implement the various techniques discussed in this text, managers must be able to determine the mathematical relation between the economic variables that make up the various functions used in managerial economics.
Ex: Managers often need to determine the total cost of producing various levels of output.
C = a + bQ + cQ² + dQ³
Where: C = cost
Q = quantity
a, b, c, d = parameters
Parameters - the coefficients in an equation that determine the exact mathematical relation among the variables.
Estimated parameter values:
a = 1,262  b = 1.0  c = -0.03  d = 0.005
Parameter estimation - the process of finding estimates of the numerical values of the parameters of an equation.
Regression analysis - a statistical technique for estimating the parameters of an equation and testing for statistical significance.
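Using the estimated parameter values from this slide, the cubic cost equation can be evaluated directly. A minimal sketch (the quantity Q = 10 is purely illustrative):

```python
# Evaluate the cubic cost equation C = a + bQ + cQ^2 + dQ^3
# with the parameter estimates given on the slide.
def total_cost(q, a=1262, b=1.0, c=-0.03, d=0.005):
    return a + b * q + c * q ** 2 + d * q ** 3

print(total_cost(0))   # at Q = 0 only the fixed component a remains: 1262.0
print(total_cost(10))  # 1262 + 10 - 3 + 5 = 1274.0
```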
5. Regression analysis is a technique used to determine the mathematical relation between a dependent variable and one or more explanatory variables.
Y = a + bX
Where:
Y = dependent variable
X = explanatory variable
a = intercept parameter
b = slope parameter
4.1 THE SIMPLE LINEAR REGRESSION MODEL
To illustrate the simple regression model, consider a
statistical problem facing the Tampa Bay Travel
Agents’ Association.
The association wishes to determine the
mathematical relation between the dollar volume of
sales of travel packages and the level of
expenditures on newspaper advertising for travel
agents located in the Tampa-St. Petersburg
metropolitan area. Suppose that the true (or actual)
relation between sales and advertising expenditures
is
S= 10,000 + 5A
Where:
S= monthly sales in dollars
A= level of expenditures on newspaper
advertising
A Hypothetical Regression
True (or actual) relation- the true or actual
underlying relation between Y and X that is unknown
to the researcher but is to be discovered by
analyzing the sample data.
6. The Random Error Term
An unobserved term added to a regression model to capture the effects of all minor, unpredictable factors that affect Y but cannot reasonably be included as explanatory variables.
S = 10,000 + 5A + e
Where:
S = monthly sales in dollars
A = level of expenditures on newspaper advertising
e = random effect
Tampa Firms                  Advertising expenditure   Actual sales   Expected sales   Random effect
Travel Agency                $3,000                    $30,000        $25,000          $5,000
Buccaneer Travel Services    3,000                     21,000         25,000           -4,000
Happy Getaway Tours          3,000                     25,000         25,000           0
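The table's "Expected sales" column comes from the true relation S = 10,000 + 5A, and each firm's random effect is actual sales minus expected sales. A minimal sketch reproducing the table's arithmetic:

```python
# Expected sales come from the true relation S = 10,000 + 5A;
# the random effect e is actual sales minus expected sales.
firms = [
    # (firm, advertising expenditure A, actual sales S)
    ("Travel Agency", 3000, 30000),
    ("Buccaneer Travel Services", 3000, 21000),
    ("Happy Getaway Tours", 3000, 25000),
]

random_effects = {}
for name, a, actual in firms:
    expected = 10000 + 5 * a          # 25,000 when A = 3,000
    random_effects[name] = actual - expected

print(random_effects)  # 5000, -4000, and 0, matching the table
```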
7. Regression analysis provides a way of finding the line that "best fits" the scatter of points.
The purpose of regression analysis is twofold:
1. To estimate the parameters (a and b) of the true regression line.
2. To test whether the estimated values of the parameters are statistically significant.
4.2 FITTING A REGRESSION LINE
Time-series- a data set in which the data for the
dependent and explanatory variables are collected over
time for a specific firm.
Cross-sectional- a data set in which the data on the
dependent and explanatory variables are collected from
many different firms or industries at a given point in time.
Scatter diagram- a graph of the data points in a sample.
Population regression line- The equation or line
representing the true (or actual) underlying relation
between the dependent variable and the explanatory
variable(s).
Sample Regression Line
The line that best fits the scatter of data points in the
sample and provides an estimate of the population
regression line.
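Fitting the sample regression line means choosing the intercept and slope estimates that minimize the sum of squared residuals. A minimal sketch of the standard least-squares formulas, using a small hypothetical data set (the numbers are illustrative, not from the text):

```python
# Least-squares estimates for Y = a + bX:
#   b_hat = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
#   a_hat = y_bar - b_hat * x_bar
def fit_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

# Data generated exactly by y = 2 + 3x, so the fit recovers a = 2, b = 3.
a_hat, b_hat = fit_line([1, 2, 3, 4], [5, 8, 11, 14])
print(a_hat, b_hat)  # 2.0 3.0
```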
8. Statistically significant
There is sufficient evidence from the
sample to indicate that the true value of
the coefficient is not 0.
Hypothesis testing
A statistical technique for making a
probabilistic statement about the true
value of a parameter.
4.3 TESTING FOR STATISTICAL SIGNIFICANCE
Unbiased estimator
An estimator that produces estimates of a parameter
that are on average equal to the true value of the
parameter.
The concept of a t-ratio
Statisticians use a t-test to make a probabilistic
statement about the likelihood that the true
parameter value b is not equal to zero.
t-test
A statistical test used to test the hypothesis that the
true value of a parameter is equal to 0 (b = 0).
Performing a t-Test for Statistical Significance
9. Critical value of t
The value that the t-statistic must exceed in order to reject the hypothesis that b = 0.
Type I error
Error in which a parameter estimate is found to be statistically significant when it is not.
Level of significance
The probability of finding the parameter to be statistically significant when in fact it is not.
Level of confidence
The probability of correctly failing to reject the true hypothesis that b = 0; equals one minus the level of
significance.
Using p-Values to Determine Statistical Significance
p-value- The exact level of significance for a test statistic, which is the probability of finding significance
when none exists.
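The t-test above can be sketched numerically: the t-ratio is the parameter estimate divided by its standard error, and it is compared to the critical t with n - 2 degrees of freedom. The data below are hypothetical; 3.182 is the 5 percent two-tailed critical t for 3 degrees of freedom:

```python
import math

# t-ratio for H0: b = 0 in Y = a + bX, using the usual OLS standard error:
#   SE(b_hat) = sqrt( (SSE / (n - 2)) / sum((x - x_bar)^2) )
def t_ratio(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt((sse / (n - 2)) / sxx)
    return b / se_b

t = t_ratio([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
# Reject H0 at the 5% significance level if |t| exceeds the critical t.
print(abs(t) > 3.182)  # True: this tight scatter gives a large t-ratio
```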
10. 4.4 EVALUATION OF THE REGRESSION EQUATION
Involves determining
how well the estimated
regression equation
“explains” the variation in
Y.
The two statistics to
evaluate the overall
acceptability of a
regression equation
1. Coefficient of
determination (R²)
2. F-statistic
11. The Coefficient of Determination (R²)
The fraction of total variation in the dependent
variable explained by the regression equation.
R² ranges in value from 0 (the regression
explains none of the variation in Y) to 1 (the
regression explains all the variation in Y).
12. A high R² indicates that Y and X are highly correlated and the scatter of data points fits the sample regression line tightly; a low R² indicates low correlation.
[Figure 4.4: Panel A illustrates high correlation; Panel B illustrates low correlation.]
13. The F-Statistic
A statistic used to test whether the overall regression equation is
statistically significant.
In very general terms, this statistic provides a measure of the ratio of
explained variation (in the dependent variable) to unexplained
variation.
The test involves comparing the F-statistic to the critical F-value with
k-1 and n-k degrees of freedom and the chosen level of significance. If
the F-statistic exceeds the critical F-value, the regression equation is
statistically significant.
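The two evaluation statistics can be computed together from the fitted line: R² = 1 - SSE/SST, and the F-statistic is the ratio of explained to unexplained variation adjusted for degrees of freedom. A sketch with hypothetical data (k = 2 parameters in simple regression):

```python
# R^2 = 1 - SSE/SST;  F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k))
def r2_and_f(xs, ys):
    n, k = len(xs), 2          # k = 2 estimated parameters: a and b
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                     # total
    r2 = 1 - sse / sst
    f = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    return r2, f

r2, f = r2_and_f([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(r2, 3))  # close to 1 for this tight fit
```

For simple regression the F-statistic equals the square of the slope's t-ratio, so the two tests agree.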
14. 4.5 MULTIPLE REGRESSION
Regression models that use
more than one explanatory
variable to explain the
variation in the dependent
variable.
A typical multiple
regression equation
might take the form
• Y = a + bX + cW + dZ
• Y is the dependent
variable
• a is the intercept
parameter
• X, W, and Z are the
explanatory variables
• b, c, and d are the slope
parameters for each of
these explanatory
variables.
15. As in simple regression, the slope parameters b, c, and d measure the
change in Y associated with a one-unit change in one of the explanatory variables,
holding the rest of the explanatory variables constant.
Estimation of the parameters of a multiple regression equation is
accomplished by finding a linear equation that best fits the data. As in simple
regression, a computer is used to obtain the parameter estimates, their individual
standard errors, the F-statistic, the R², and the p-values. The statistical
significance of the individual parameters and of the equation as a whole can be
determined by t-tests and an F-test, respectively. The R² is interpreted as the
fraction of the variation in Y explained by the entire set of explanatory variables
taken together.
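Although the text relies on a computer package for the estimates, the underlying calculation is just least squares with several explanatory variables. A sketch for a two-variable model Y = a + bX + cW, solving the normal equations (X'X)β = X'y with a small hand-rolled Gaussian-elimination helper (a real analysis would use a statistics package instead):

```python
# Solve the square linear system m @ beta = v by Gauss-Jordan elimination.
def solve(m, v):
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# Least-squares fit of Y = a + bX + cW via the normal equations.
def fit_multiple(rows, ys):
    X = [[1.0] + list(r) for r in rows]   # prepend 1 for the intercept a
    p = len(X[0])
    xtx = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(p)]
           for r in range(p)]
    xty = [sum(X[i][r] * ys[i] for i in range(len(X))) for r in range(p)]
    return solve(xtx, xty)                # [a_hat, b_hat, c_hat]

# Data generated exactly by Y = 1 + 2X + 3W, so the fit recovers (1, 2, 3).
rows = [[1, 1], [2, 1], [1, 2], [3, 2], [2, 3]]
ys = [1 + 2 * x + 3 * w for x, w in rows]
print([round(p, 6) for p in fit_multiple(rows, ys)])
```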
16. 4.6 NONLINEAR REGRESSION ANALYSIS
Nonlinear regression models are
used when the underlying relation
between Y and X plots as a curve,
rather than a straight line. An
analyst generally chooses a
nonlinear regression model when
the scatter diagram shows a
curvilinear pattern.
Two extremely useful forms of
nonlinear models that you will
encounter:
(1) Quadratic regression
models.
(2) Log-linear regression
models.
17. Quadratic Regression Models
The quadratic regression model takes the form Y = a + bX + cX².
The theoretical relations between economic variables graph
as either a U-shaped or an inverted-U-shaped curve, depending on the signs of b and c. If b is
negative and c is positive, the quadratic function is U-shaped. If b is positive and c is negative, the
quadratic function is inverted-U-shaped. Thus, a U-shaped quadratic equation (b < 0 and c > 0) is
appropriate when, as X increases, Y first falls, eventually reaches a minimum, and then rises
thereafter. Alternatively, an inverted-U-shaped quadratic equation (b > 0 and c < 0) is appropriate if,
as X increases, Y first rises, eventually reaches a peak, and then falls thereafter.
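The turning point of a quadratic follows from setting the derivative to zero: dY/dX = b + 2cX = 0 gives X = -b/(2c). A sketch with illustrative parameter values chosen to satisfy b < 0 and c > 0 (the U-shaped case):

```python
# U-shaped quadratic Y = a + bX + cX^2 with b < 0 and c > 0:
# the minimum occurs where dY/dX = b + 2cX = 0, i.e. X = -b / (2c).
def quadratic(x, a=100, b=-4, c=0.5):   # illustrative parameter values
    return a + b * x + c * x ** 2

x_min = -(-4) / (2 * 0.5)   # = 4.0
print(x_min, quadratic(x_min))   # Y falls to a minimum at X = 4, then rises
```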
18. Log-Linear Regression Models
Y = aX^b Z^c
This nonlinear functional form, which we employ in Chapter 7 to estimate demand functions and in
the appendix to Chapter 10 to estimate production functions, is particularly useful because the
parameters b and c are elasticities:
b = (percentage change in Y) / (percentage change in X)
c = (percentage change in Y) / (percentage change in Z)
19. To estimate the parameters of this nonlinear equation, it must be transformed into a
linear form. This is accomplished by taking natural logarithms of both sides of the
equation. Taking the logarithm of the function Y = aX^b Z^c results in
ln Y = (ln a) + b(ln X) + c(ln Z)
Letting
Y' = ln Y
X' = ln X
Z' = ln Z
a' = ln a
the regression equation is linear:
Y' = a' + bX' + cZ'