3. Interaction Terms
Sometimes X1 Impacts Y and X2 Impacts Y, but When Both X1 and
X2 are Present There is an Additional Impact (+ or -) Beyond
Their Individual Effects:
Y = B0 + (B1 * X1) + (B2 * X2) + (B3 * X1 * X2) + ε
Income = B0 + B1 * Gender + B2 * Education + B3 * Gender * Education + ε
Our Beta Three Term Gives Us the Additional (Interaction) Effect of Gender
and Education Together
Assuming Gender is Binary in the Model, the Interaction Will
Capture the Differential Effect of Education on Income By Gender
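To make the specification concrete, here is a minimal sketch in R of fitting an interaction model like the income equation above. The data are simulated, and every variable name and number is illustrative rather than taken from the slides:

```r
set.seed(1)
n <- 500
gender    <- rbinom(n, 1, 0.5)          # hypothetical 0/1 indicator
education <- round(runif(n, 10, 20))    # hypothetical years of schooling
# build income with a true interaction effect so B3 has something to find
income <- 20000 + 5000 * gender + 2000 * education +
  1500 * gender * education + rnorm(n, sd = 5000)

fit <- lm(income ~ gender * education)  # expands to gender + education + gender:education
summary(fit)                            # the gender:education row estimates B3
```

The `gender * education` shorthand automatically includes both main effects alongside the interaction, which matches the standard advice that interaction models retain all constitutive terms.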
4. A Visual Display of Interaction Terms
[Figure omitted] Image from: Thomas Brambor, William Roberts Clark & Matt Golder, Understanding Interaction Models: Improving Empirical Analyses, 14 Political Analysis 63 (2006)
6. Limited Dependent Variables (LPM and Logit)
Sometimes the Dependent Variable of Interest is Limited
Start with the Simplest Case - Binary (0,1)
Lots of Good Examples -
Voting (Republican / Democrat)
Trial (Guilty / Not Guilty)
Vote By Judges/Justices (e.g. Affirm / Reverse)
Hiring (Hired / Not Hired)
Admissions (Admitted / Not Admitted)
etc.
7. Limited Dependent Variables (LPM and Logit)
Linear Probability Model (a form of Binomial Regression)
The observed variable for each observation takes a value of
either 0 or 1.
The probability of observing a 0 or 1 in any one case is
treated as depending on one or more explanatory variables.
For the linear probability model, this relationship is a
particularly simple one, which allows the model to be fitted by
simple linear regression.
8. The LPM treats the fitted values of the equation as the
probability that Yi = 1 for the given values of Xi.
The error terms, however, are not normally distributed. Because Y
can only be 0 or 1, there are only two possible outcomes, so the
error terms follow a binomial (two-point) distribution.
Problems with LPM:
1) Heteroscedasticity
2) The difficulty of interpreting predicted probabilities > 1 and < 0
If Yi = 0, then 0 = α + β1X1i + β2X2i + εi, so εi = -(α + β1X1i + β2X2i)
If Yi = 1, then 1 = α + β1X1i + β2X2i + εi, so εi = 1 - (α + β1X1i + β2X2i)
For any given values of the Xs, the error term can therefore take only two values.
9. Linear Probability Model
AN EXAMPLE: Spector and Mazzeo examined the effect of a teaching method
known as PSI on the performance of students in a course, intermediate
macroeconomics. The question was whether students exposed to the method scored
higher on exams in the class. They collected data from students in two classes,
one in which PSI was used and another in which a traditional teaching method
was employed. For each of 32 students, they gathered the following data:
GRADE — coded 1 if the final grade was an A, 0 if the final grade was a B or C. 11 sample
members (34.38%) got As and are coded 1.
GPA — Grade point average before taking the class. Observed values range from a low of 2.06
to a high of 4.0 with mean 3.12.
TUCE — the score on an exam given at the beginning of the term to test entering knowledge of
the material. In the sample, TUCE ranges from a low of 12 to a high of 29 with a mean of 21.94.
PSI — a dummy variable indicating the teaching method used (1 = PSI used, 0 = other method).
14 of the 32 sample members (43.75%) are in PSI.
Example From http://www.nd.edu/~rwilliam/stats2/
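A sketch of fitting the LPM to these data in R, assuming the Spector and Mazzeo data have been saved to a CSV with the column names described above (the file path is hypothetical):

```r
# Hypothetical path; columns GRADE, GPA, TUCE, PSI as described above
spector <- read.csv("spector.csv")

# LPM: ordinary least squares on the 0/1 outcome
lpm <- lm(GRADE ~ GPA + TUCE + PSI, data = spector)
summary(lpm)
head(fitted(lpm))   # fitted values are read as estimates of P(GRADE = 1)
```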
11. Linear Probability Model
[Figure: residuals plotted against fitted values from the LPM]
HETEROSKEDASTICITY - The fitted vs. residuals plot should look
like a random scatterplot; here it does not.
ERRORS ARE NOT NORMALLY
DISTRIBUTED - OLS assumes that, for
each set of values for the k independent
variables, the residuals are normally
distributed. This is equivalent to saying
that, for any given value of yhat, the
residuals should be normally distributed.
This assumption is also clearly violated
12. Linear Probability Model
LINEARITY - the predicted values also suggest that there may be problems with the plausibility of
the model and/or its coefficient estimates.
Probabilities can only range between 0 and 1. However, in OLS, there is no constraint that the
yhat estimates fall in the 0-1 range; indeed, yhat is free to vary between negative infinity and
positive infinity.
Check Out This Output to Help Make the Point Clear:
[Figure: fitted values and observed grade plotted against gpa; some fitted values fall below 0]
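Both diagnostics can be reproduced in R, continuing the hypothetical `lpm` fit sketched earlier:

```r
# Residuals vs. fitted: should be a random scatter under OLS assumptions,
# but a binary outcome forces the points onto two parallel bands
plot(fitted(lpm), resid(lpm), xlab = "Fitted values", ylab = "Residuals")

# OLS places no constraint on yhat, so "probabilities" can escape [0, 1]
range(fitted(lpm))
sum(fitted(lpm) < 0 | fitted(lpm) > 1)   # count of out-of-range fitted values
```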
13. Good News is We Have a Potential Solution ... Logistic Regression
15. Logistic Regression
Logistic Regression is used for predicting
the outcome of a binary dependent
variable (a variable which can take only
two possible outcomes, e.g. "yes" vs.
"no" or "success" vs. "failure") based on
one or more predictor variables.
Logistic regression attempts to model the
probability of a "yes/success" outcome
using a linear function of the predictors.
16. Logistic regression is an approach to prediction, like Ordinary
Least Squares (OLS) regression. However, with logistic
regression, the researcher is predicting a dichotomous outcome.
This situation poses problems for the OLS assumption that the
errors (residuals) are normally distributed.
In logistic regression, a complex formula is required to convert
back and forth from the logistic equation to the OLS-type
equation. The logistic formulas are stated in terms of the
probability that Y = 1, which is referred to as p hat.
The probability that Y is 0 is 1 - p hat.
http://www.upa.pdx.edu/IOA/newsom/da2/
17. Logistic Regression
The logistic regression equation is:
ln(p hat / (1 - p hat)) = B0 + B1X
The ln symbol refers to a natural logarithm, and B0 + B1X is our
familiar equation for the regression line.
p hat can be computed from the regression equation as well:
p hat = exp(B0 + B1X) / (1 + exp(B0 + B1X))
So, if we know the regression equation, we could, theoretically, calculate
the expected probability that Y = 1 for a given value of X.
exp is the exponent function, sometimes written as e, so the same
equation can be written as:
p hat = e^(B0 + B1X) / (1 + e^(B0 + B1X))
NOTE: e here is not the residual.
http://www.upa.pdx.edu/IOA/newsom/da2/
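As a quick sketch, the conversion from the regression equation to a probability is easy to carry out in R; the coefficient values below are made up for illustration:

```r
# Hypothetical coefficients for ln(p hat / (1 - p hat)) = B0 + B1 * X
b0 <- -1.5
b1 <-  0.8
x  <-  2.0

log_odds <- b0 + b1 * x
p_hat <- exp(log_odds) / (1 + exp(log_odds))   # the formula above
p_hat
plogis(log_odds)   # base R's inverse logit returns the same value
```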
18. Logistic Regression
Because of these complicated algebraic translations, our regression
coefficients are not as easy to interpret.
Our old maxim that b represents “the change in Y with one unit
change in X” is no longer applicable.
Instead, we have to translate using the exponent function.
And, as it turns out, when we do that we have a type of
“coefficient” that is pretty useful.
This coefficient is called the ODDS RATIO.
http://www.upa.pdx.edu/IOA/newsom/da2/
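As a minimal sketch of that translation in R, exponentiating a (made-up) logit coefficient yields the odds ratio:

```r
b <- 0.693   # hypothetical log odds coefficient
exp(b)       # about 2: each one unit change in X roughly doubles the odds
```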
20. http://www.ats.ucla.edu/stat/stata/dae/logit.htm
Logistic Regression in R
For every one unit change in gre,
the log odds of admission (versus
non-admission) increases by 0.002.
For a one unit increase in gpa, the
log odds of being admitted to
graduate school increases by
0.804.
The indicator variables for rank have a slightly different interpretation. For example,
having attended an undergraduate institution with rank of 2, versus an institution with
a rank of 1, decreases the log odds of admission by 0.675.
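The output interpreted above comes from the UCLA admissions example at the linked page. A sketch of the model it describes, assuming the tutorial's data file is still available at its historical URL:

```r
# Data URL from the UCLA tutorial; it may have moved since
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)   # rank is categorical, not numeric

mylogit <- glm(admit ~ gre + gpa + rank,
               data = mydata, family = binomial)
summary(mylogit)     # log odds coefficients (roughly 0.002 for gre, 0.804 for gpa)
exp(coef(mylogit))   # exponentiating converts them to odds ratios
```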
21. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm
What is an Odds Ratio?
EXAMPLE:
If the probability of success of some event is .8
Then the probability of failure is 1 - .8 = .2
The odds of success are defined as the ratio of the probability
of success over the probability of failure
Thus, the odds of success are .8/.2 = 4
In other words, the odds of success are 4 to 1
22. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm
What is an Odds Ratio?
EXAMPLE:
If the probability of success of some event is .5
Then the probability of failure is 1 - .5 = .5
The odds of success are defined as the ratio of the probability
of success over the probability of failure
Thus, the odds of success are .5/.5 = 1
In other words, the odds of success are 1 to 1
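Both examples can be verified in a line of R:

```r
p <- c(0.8, 0.5)
p / (1 - p)   # returns 4 and 1: odds of "4 to 1" and "1 to 1"
```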
23. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm
What is an Odds Ratio?
The transformation from probability to
odds is a monotonic transformation,
meaning the odds increase as the
probability increases or vice versa.
We know that probability ranges from 0 to 1. Odds range from 0 to
positive infinity.
Here is a table of the transformation from probability to odds
[table not reproduced; the sketch after slide 25 regenerates it]
25. http://www.ats.ucla.edu/stat/stata/dae/logit.htm
What is an Odds Ratio?
The transformation from odds to log of
odds is the log transformation.
Again this is a monotonic
transformation.
That is to say, the greater the odds, the
greater the log of odds and vice versa.
This table shows the relationship among probability, odds, and
log of odds; a sketch regenerating it follows.
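The table itself is not reproduced here, but a short R sketch regenerates one like it, mapping probability to odds and log odds:

```r
p <- seq(0.1, 0.9, by = 0.1)
round(data.frame(probability = p,
                 odds        = p / (1 - p),
                 log_odds    = log(p / (1 - p))), 4)
```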
27. http://www.ats.ucla.edu/stat/stata/dae/logit.htm
What is an Odds Ratio?
Why do we go to all this trouble transforming probability to log odds?
One reason is that it is usually difficult to model a variable that has a
restricted range, such as a probability.
This transformation is an attempt to get around the restricted range problem.
It maps probability ranging between 0 and 1 to log odds ranging from
negative infinity to positive infinity.
Another reason is that among all of the infinitely many choices of
transformation, the log of odds is one of the easiest to understand and
interpret.
The definition of an odds ratio tells us that for every one unit increase in a given Xi,
the odds that Y = 1 are multiplied by e raised to the relevant coefficient (the odds ratio)
28. What is an Odds Ratio?
Here is a Simple Example:
The data set has 200 observations and the outcome variable used will
be hon, indicating if a student is in an honors class or not.
So our p = prob(hon=1).
Imagine we ran a logit model with no predictors and obtained a coefficient (the intercept) of -1.12546
29. Imagine 49 of 200 folks were in honors.
So p = 49/200 = .245.
The odds are .245 / (1 - .245) = .3245, and the log of the odds (logit) is
log(.3245) = -1.12546.
In other words, the intercept from the model with no predictor
variables is the estimated log odds of being in honors class for the
whole population of interest.
We can also transform the log of the odds back to a probability:
p = exp(-1.12546) / (1 + exp(-1.12546)) = .245, if we like.
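A quick sketch verifying the arithmetic in R:

```r
p <- 49 / 200        # .245
odds <- p / (1 - p)  # .3245
log(odds)            # -1.12546, the intercept-only logit coefficient
plogis(-1.12546)     # back-transforms the log odds to p = .245
```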