3. Interaction Terms
Sometimes X1 Impacts Y and X2 Impacts Y, but When Both X1 and
X2 are Present There is an Additional Impact (+ or -) Beyond
Their Individual Effects:
Y = B0 + (B1 * X1) + (B2 * X2) + (B3 * X1 * X2) + ε
Income = B0 + B1 * Gender + B2 * Education + B3 * Gender * Education + ε
Our Beta Three Term Gives Us the Additional (Interaction) Effect of Gender
and Education Together
Assuming Gender is Binary in the Model, the Interaction Will
Capture the Differential Effect of Education on Income By Gender
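To make the specification concrete, here is a minimal sketch in R of fitting an interaction model like the income equation above. The data are simulated, and every variable name and number is illustrative rather than taken from the slides:

```r
set.seed(1)
n <- 500
gender    <- rbinom(n, 1, 0.5)          # hypothetical 0/1 indicator
education <- round(runif(n, 10, 20))    # hypothetical years of schooling
# build income with a true interaction effect so B3 has something to find
income <- 20000 + 5000 * gender + 2000 * education +
  1500 * gender * education + rnorm(n, sd = 5000)

fit <- lm(income ~ gender * education)  # expands to gender + education + gender:education
summary(fit)                            # the gender:education row estimates B3
```

The `gender * education` shorthand automatically includes both main effects alongside the interaction, which matches the standard advice that interaction models retain all constitutive terms.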
4. A Visual Display of Interaction Terms
[Figure omitted] Image from: Thomas Brambor, William Roberts Clark & Matt Golder, Understanding Interaction Models: Improving Empirical Analyses, 14 Political Analysis 63 (2006)
6. Limited Dependent Variables (LPM and Logit)
Sometimes the Dependent Variable of Interest is Limited
Start with the Simplest Case - Binary (0,1)
Lots of Good Examples -
Voting (Republican / Democrat)
Trial (Guilty / Not Guilty)
Vote By Judges/Justices (e.g. Affirm / Reverse)
Hiring (Hired / Not Hired)
Admissions (Admitted / Not Admitted)
etc.
7. Limited Dependent Variables (LPM and Logit)
Linear Probability Model (a form of Binomial Regression)
The observed variable for each observation takes a value of
either 0 or 1.
The probability of observing a 0 or 1 in any one case is
treated as depending on one or more explanatory variables.
For the linear probability model, this relationship is a
particularly simple one, which allows the model to be fitted by
simple linear regression.
8. The LPM treats the fitted values of the equation as the
probability that Yi = 1 for the given values of Xi.
The error terms, however, are not normally distributed. Because Y
can only be 0 or 1, there are only two possible outcomes, so the
error terms follow a binomial (two-point) distribution.
Problems with LPM:
1) Heteroscedasticity
2) The difficulty of interpreting predicted probabilities > 1 and < 0
If Yi = 0, then 0 = α + β1X1i + β2X2i + εi, so εi = -(α + β1X1i + β2X2i)
If Yi = 1, then 1 = α + β1X1i + β2X2i + εi, so εi = 1 - (α + β1X1i + β2X2i)
For any given values of the Xs, the error term can therefore take only two values.
9. Linear Probability Model
AN EXAMPLE: Spector and Mazzeo examined the effect of a teaching method
known as PSI on the performance of students in a course, intermediate
macroeconomics. The question was whether students exposed to the method scored
higher on exams in the class. They collected data from students in two classes,
one in which PSI was used and another in which a traditional teaching method
was employed. For each of 32 students, they gathered the following data:
GRADE — coded 1 if the final grade was an A, 0 if the final grade was a B or C. 11 sample
members (34.38%) got As and are coded 1.
GPA — Grade point average before taking the class. Observed values range from a low of 2.06
to a high of 4.0 with mean 3.12.
TUCE — the score on an exam given at the beginning of the term to test entering knowledge of
the material. In the sample, TUCE ranges from a low of 12 to a high of 29 with a mean of 21.94.
PSI — a dummy variable indicating the teaching method used (1 = PSI used, 0 = other method).
14 of the 32 sample members (43.75%) are in PSI.
Example From http://www.nd.edu/~rwilliam/stats2/
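A sketch of fitting the LPM to these data in R, assuming the Spector and Mazzeo data have been saved to a CSV with the column names described above (the file path is hypothetical):

```r
# Hypothetical path; columns GRADE, GPA, TUCE, PSI as described above
spector <- read.csv("spector.csv")

# LPM: ordinary least squares on the 0/1 outcome
lpm <- lm(GRADE ~ GPA + TUCE + PSI, data = spector)
summary(lpm)
head(fitted(lpm))   # fitted values are read as estimates of P(GRADE = 1)
```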
11. Linear Probability Model
[Figure: residuals plotted against fitted values from the LPM]
HETEROSKEDASTICITY - The fitted vs. residuals plot should look
like a random scatterplot; here it does not.
ERRORS ARE NOT NORMALLY
DISTRIBUTED - OLS assumes that, for
each set of values for the k independent
variables, the residuals are normally
distributed. This is equivalent to saying
that, for any given value of yhat, the
residuals should be normally distributed.
This assumption is also clearly violated
12. Linear Probability Model
LINEARITY - the predicted values also suggest that there may be problems with the plausibility of
the model and/or its coefficient estimates.
Probabilities can only range between 0 and 1. However, in OLS, there is no constraint that the
yhat estimates fall in the 0-1 range; indeed, yhat is free to vary between negative infinity and
positive infinity.
Check Out This Output to Help Make the Point Clear:
[Figure: fitted values and observed grade plotted against gpa; some fitted values fall below 0]
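Both diagnostics can be reproduced in R, continuing the hypothetical `lpm` fit sketched earlier:

```r
# Residuals vs. fitted: should be a random scatter under OLS assumptions,
# but a binary outcome forces the points onto two parallel bands
plot(fitted(lpm), resid(lpm), xlab = "Fitted values", ylab = "Residuals")

# OLS places no constraint on yhat, so "probabilities" can escape [0, 1]
range(fitted(lpm))
sum(fitted(lpm) < 0 | fitted(lpm) > 1)   # count of out-of-range fitted values
```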
13. Good News is We Have a Potential Solution ... Logistic Regression
15. Logistic Regression
Logistic Regression is used for predicting
the outcome of a binary dependent
variable (a variable which can take only
two possible outcomes, e.g. "yes" vs.
"no" or "success" vs. "failure") based on
one or more predictor variables.
Logistic regression attempts to model the
probability of a "yes/success" outcome
using a linear function of the predictors.
16. Logistic regression is an approach to prediction, like Ordinary
Least Squares (OLS) regression. However, with logistic
regression, the researcher is predicting a dichotomous outcome.
This situation poses problems for the OLS assumption that the
errors (residuals) are normally distributed.
In logistic regression, a complex formula is required to convert
back and forth from the logistic equation to the OLS-type
equation. The logistic formulas are stated in terms of the
probability that Y = 1, which is referred to as p hat.
The probability that Y is 0 is 1 - p hat.
http://www.upa.pdx.edu/IOA/newsom/da2/
17. Logistic Regression
The logistic regression equation is:
ln(p hat / (1 - p hat)) = B0 + B1X
The ln symbol refers to a natural logarithm, and B0 + B1X is our
familiar equation for the regression line.
p hat can be computed from the regression equation as well:
p hat = exp(B0 + B1X) / (1 + exp(B0 + B1X))
So, if we know the regression equation, we could, theoretically, calculate
the expected probability that Y = 1 for a given value of X.
exp is the exponent function, sometimes written as e, so the same
equation can be written as:
p hat = e^(B0 + B1X) / (1 + e^(B0 + B1X))
NOTE: e here is not the residual.
http://www.upa.pdx.edu/IOA/newsom/da2/
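As a quick sketch, the conversion from the regression equation to a probability is easy to carry out in R; the coefficient values below are made up for illustration:

```r
# Hypothetical coefficients for ln(p hat / (1 - p hat)) = B0 + B1 * X
b0 <- -1.5
b1 <-  0.8
x  <-  2.0

log_odds <- b0 + b1 * x
p_hat <- exp(log_odds) / (1 + exp(log_odds))   # the formula above
p_hat
plogis(log_odds)   # base R's inverse logit returns the same value
```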
18. Logistic Regression
Because of these complicated algebraic translations, our regression
coefficients are not as easy to interpret.
Our old maxim that b represents “the change in Y with one unit
change in X” is no longer applicable.
Instead, we have to translate using the exponent function.
And, as it turns out, when we do that we have a type of
“coefficient” that is pretty useful.
This coefficient is called the ODDS RATIO.
http://www.upa.pdx.edu/IOA/newsom/da2/
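As a minimal sketch of that translation in R, exponentiating a (made-up) logit coefficient yields the odds ratio:

```r
b <- 0.693   # hypothetical log odds coefficient
exp(b)       # about 2: each one unit change in X roughly doubles the odds
```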
20. http://www.ats.ucla.edu/stat/stata/dae/logit.htm
Logistic Regression in R
For every one unit change in gre,
the log odds of admission (versus
non-admission) increases by 0.002.
For a one unit increase in gpa, the
log odds of being admitted to
graduate school increases by
0.804.
The indicator variables for rank have a slightly different interpretation. For example,
having attended an undergraduate institution with rank of 2, versus an institution with
a rank of 1, decreases the log odds of admission by 0.675.
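The output interpreted above comes from the UCLA admissions example at the linked page. A sketch of the model it describes, assuming the tutorial's data file is still available at its historical URL:

```r
# Data URL from the UCLA tutorial; it may have moved since
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)   # rank is categorical, not numeric

mylogit <- glm(admit ~ gre + gpa + rank,
               data = mydata, family = binomial)
summary(mylogit)     # log odds coefficients (roughly 0.002 for gre, 0.804 for gpa)
exp(coef(mylogit))   # exponentiating converts them to odds ratios
```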
21. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm
What is an Odds Ratio?
EXAMPLE:
If the probability of success of some event is .8
Then the probability of failure is 1 - .8 = .2
The odds of success are defined as the ratio of the probability
of success over the probability of failure
Thus, the odds of success are .8/.2 = 4
In other words, the odds of success are 4 to 1
22. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm
What is an Odds Ratio?
EXAMPLE:
If the probability of success of some event is .5
Then the probability of failure is 1 - .5 = .5
The odds of success are defined as the ratio of the probability
of success over the probability of failure
Thus, the odds of success are .5/.5 = 1
In other words, the odds of success are 1 to 1
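Both examples can be verified in a line of R:

```r
p <- c(0.8, 0.5)
p / (1 - p)   # returns 4 and 1: odds of "4 to 1" and "1 to 1"
```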
23. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm
What is an Odds Ratio?
The transformation from probability to
odds is a monotonic transformation,
meaning the odds increase as the
probability increases or vice versa.
We know that probability ranges from 0 to 1. Odds range from 0 to
positive infinity.
Here is a table of the transformation from probability to odds
[table not reproduced; the sketch after slide 25 regenerates it]
25. http://www.ats.ucla.edu/stat/stata/dae/logit.htm
What is an Odds Ratio?
The transformation from odds to log of
odds is the log transformation.
Again this is a monotonic
transformation.
That is to say, the greater the odds, the
greater the log of odds and vice versa.
This table shows the relationship among probability, odds, and
log of odds; a sketch regenerating it follows.
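The table itself is not reproduced here, but a short R sketch regenerates one like it, mapping probability to odds and log odds:

```r
p <- seq(0.1, 0.9, by = 0.1)
round(data.frame(probability = p,
                 odds        = p / (1 - p),
                 log_odds    = log(p / (1 - p))), 4)
```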
27. http://www.ats.ucla.edu/stat/stata/dae/logit.htm
What is an Odds Ratio?
Why do we go to all this trouble transforming probability to log odds?
One reason is that it is usually difficult to model a variable that has a
restricted range, such as a probability.
This transformation is an attempt to get around the restricted range problem.
It maps probability ranging between 0 and 1 to log odds ranging from
negative infinity to positive infinity.
Another reason is that among all of the infinitely many choices of
transformation, the log of odds is one of the easiest to understand and
interpret.
The definition of an odds ratio tells us that for every one unit increase in a given Xi,
the odds that Y = 1 are multiplied by e raised to the relevant coefficient (the odds ratio)
28. What is an Odds Ratio?
Here is a Simple Example:
The data set has 200 observations and the outcome variable used will
be hon, indicating if a student is in an honors class or not.
So our p = prob(hon=1).
Imagine we ran a logit model with no predictors and obtained a coefficient (the intercept) of -1.12546
29. Imagine 49 of 200 folks were in honors.
So p = 49/200 = .245.
The odds are .245 / (1 - .245) = .3245, and the log of the odds (logit) is
log(.3245) = -1.12546.
In other words, the intercept from the model with no predictor
variables is the estimated log odds of being in honors class for the
whole population of interest.
We can also transform the log of the odds back to a probability:
p = exp(-1.12546) / (1 + exp(-1.12546)) = .245, if we like.
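A quick sketch verifying the arithmetic in R:

```r
p <- 49 / 200        # .245
odds <- p / (1 - p)  # .3245
log(odds)            # -1.12546, the intercept-only logit coefficient
plogis(-1.12546)     # back-transforms the log odds to p = .245
```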