1. Prof. Chitwan Lalji
Economics Area
Indian Institute of Management Kozhikode
Econometric Applications for Research
Text Books:
Wooldridge, J. M. (2016). Introductory Econometrics: A Modern Approach. Nelson Education.
Enders, W. (2005). Applied Econometric Time Series, 4th ed. Wiley.
3. What is econometrics?
• It is the use of statistical methods to analyze economic data
• It is the science/art of testing economic theories
• It is the process of fitting mathematical economic models to real-world data
• It is a set of tools used for forecasting future values of economic variables, such as a
firm’s sales, the overall growth of the economy, or stock prices.
• Evaluating and implementing government and business policy
• Science and art of using historical data to make numerical, or quantitative, policy
recommendations in government and business
4. Steps in econometric analysis
1. Economic Theory
2. Econometric model
3. Hypothesis Testing
5. Economic Theory
• Demand Theory – shows the relationship between price and quantity
of a good that consumers are willing to buy at a given price, holding
constant other factors that might affect the quantity demanded.
QD = QD(P)
• Other factors – The quantity that consumers are willing to buy also
depends on their income, the price of related goods, etc.
QD = f(Price of good, Income, Price of related goods, etc.)
6. Econometric Model
Economic Theory
QD = f(Price of good, Income, Price of related goods, etc.)
Econometric Model
QD = α + β(Price of good) + γ(Income) + δ(Price of related goods) + ε
• QD: dependent variable, explained variable, response variable, …
• Price of good, Income, Price of related goods: independent variable(s), explanatory variable(s), regressor(s), …
• ε: unobserved factors, such as tastes, habits, expectations, etc.
7. Another example
Model of job training and worker productivity
– What is effect of additional training on worker productivity?
– Formal economic theory not really needed to derive equation:
wage = f(educ, exper, training)
where wage = hourly wage, educ = years of formal education, exper = years of workforce experience, training = weeks spent in job training
– Other factors may also be relevant
8. Another example
Econometric model of job training and worker productivity
wage = β0 + β1 educ + β2 exper + β3 training + u
• wage: hourly wage
• educ: years of formal education
• exper: years of workforce experience
• training: weeks spent in job training
• u: unobserved determinants of the wage, e.g. innate ability, quality of education, family background, …
• Most of econometrics deals with the specification of the error u
• Econometric models may be used for hypothesis testing
– For example, the parameter β3 represents the effect of training on wage
– How large is this effect? Is it different from zero?
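The question "Is the training effect different from zero?" can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, the coefficient values (2.0, 0.5, 0.1, 0.2) are made up for illustration, and the helper names `solve` and `ols_with_se` are hypothetical. OLS is computed from the normal equations, and the t statistic is the estimate divided by its standard error.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copies a)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - known) / m[r][r]
    return z

def ols_with_se(rows, y):
    """Return OLS coefficients and standard errors; a constant is prepended to each row."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * xi for b, xi in zip(beta, x)) for x, yi in zip(X, y)]
    sigma2 = sum(u * u for u in resid) / (n - k)  # estimated error variance
    se = []
    for j in range(k):
        # j-th diagonal element of (X'X)^-1, from solving (X'X) z = e_j
        e_j = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(sigma2 * solve(xtx, e_j)[j]))
    return beta, se

# Simulated workers; all parameter values below are assumptions for illustration
random.seed(1)
rows, wage = [], []
for _ in range(300):
    educ = random.randint(8, 18)      # years of formal education
    exper = random.randint(0, 30)     # years of workforce experience
    training = random.randint(0, 10)  # weeks spent in job training
    u = random.gauss(0, 1)            # unobserved determinants of the wage
    rows.append((educ, exper, training))
    wage.append(2.0 + 0.5 * educ + 0.1 * exper + 0.2 * training + u)

beta, se = ols_with_se(rows, wage)
t_training = beta[3] / se[3]  # t statistic for H0: beta3 = 0
print(f"b3 = {beta[3]:.3f}, se = {se[3]:.3f}, t = {t_training:.2f}")
```

A t statistic well above 2 in absolute value would lead us to reject the hypothesis that training has no effect on the wage in this simulated sample.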
11. Classification of Data
Qualitative data and Quantitative data
Primary data and secondary data
Cross-sectional, pooled-cross-sectional, time series data and panel data
19. Causal effect
Definition of causal effect of x on y:
"How does variable y change if variable x is changed
but all other relevant factors are held constant?"
21. Simple Linear Regression
Definition of the simple linear regression model
y = β0 + β1 x + u
"Explains variable y in terms of variable x"
• y: dependent variable, explained variable, response variable, …
• x: independent variable, explanatory variable, regressor, …
• u: error term, disturbance, unobservables, …
• β0: intercept
• β1: slope parameter
22. Simple Linear Regression
• Conditional mean independence assumption: E(u|x) = 0
The explanatory variable must not contain information about the mean
of the unobserved factors
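A small simulation can show why this assumption matters. The sketch below is illustrative only: the data are simulated, the true slope of 2.0 and the variable `ability` are assumptions, and `slope` is a hypothetical helper. In the first case x carries no information about u. In the second case an unobserved factor enters both x and u, so the estimated slope drifts away from the true value.

```python
import random

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

random.seed(2)
true_slope = 2.0  # assumed value for the simulation
x_ok, y_ok, x_bad, y_bad = [], [], [], []
for _ in range(5000):
    ability = random.gauss(0, 1)  # unobserved factor
    noise_x = random.gauss(0, 1)
    u = ability + random.gauss(0, 1)  # error term contains ability

    # Case 1: x is unrelated to ability, so E(u|x) = 0 holds
    x1 = noise_x
    x_ok.append(x1)
    y_ok.append(1.0 + true_slope * x1 + u)

    # Case 2: x depends on ability, so x contains information about u
    x2 = ability + noise_x
    x_bad.append(x2)
    y_bad.append(1.0 + true_slope * x2 + u)

slope_ok = slope(x_ok, y_ok)     # close to 2.0
slope_bad = slope(x_bad, y_bad)  # close to 2.0 + Cov(x,u)/Var(x) = 2.5
print(f"assumption holds: {slope_ok:.2f}, assumption violated: {slope_bad:.2f}")
```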
24. Simple Linear Regression
Properties of OLS on any sample of data
• Fitted values and residuals
Fitted or predicted values: ŷi = β̂0 + β̂1 xi
Deviations from regression line (= residuals): ûi = yi − ŷi
• Algebraic properties of OLS regression
Σ ûi = 0: deviations from regression line sum up to zero
Σ xi ûi = 0: correlation between deviations and regressors is zero
ȳ = β̂0 + β̂1 x̄: sample averages of y and x lie on regression line
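These algebraic properties can be checked numerically on any sample. A minimal sketch, assuming a made-up five-point data set (not from the lecture) and the standard closed-form OLS estimates for a single regressor:

```python
# Toy data, made up for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for one regressor
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]             # fitted values
resid = [yi - fi for yi, fi in zip(y, fitted)]  # residuals

sum_resid = sum(resid)                                # property 1: should be 0
sum_x_resid = sum(xi * ui for xi, ui in zip(x, resid))  # property 2: should be 0
y_at_x_bar = b0 + b1 * x_bar                          # property 3: should equal y_bar

print(sum_resid, sum_x_resid, y_at_x_bar, y_bar)
```

Up to floating-point rounding, the first two sums are zero and the regression line passes through the point of sample averages.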
27. Simple Linear Regression
What does "as good as possible" mean?
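• Regression residuals: ûi = yi − ŷi = yi − β̂0 − β̂1 xi
• Minimize the sum of squared regression residuals: choose β̂0 and β̂1 to minimize Σ ûi²
• Ordinary Least Squares (OLS) estimates: β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)², β̂0 = ȳ − β̂1 x̄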
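The OLS estimates for one regressor have a closed form: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept makes the line pass through the sample averages. A minimal sketch, where `ols_simple` is a hypothetical helper name and the data are made up for illustration:

```python
def ols_simple(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no sample variation")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def ssr(x, y, intercept, slope):
    """Sum of squared residuals for a candidate line."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

# Toy data, made up for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = ols_simple(x, y)
best = ssr(x, y, b0, b1)
# Any other line, e.g. with a slightly different slope, fits worse
worse = ssr(x, y, b0, b1 + 0.1)
print(b0, b1, best, worse)
```

Comparing `best` with `worse` illustrates what "as good as possible" means: no other intercept and slope give a smaller sum of squared residuals on this sample.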
28. Simple Linear Regression
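• Goodness-of-fit
"How well does the explanatory variable explain the dependent variable?"
• Measures of variation
SST = Σ (yi − ȳ)² (total sum of squares)
SSE = Σ (ŷi − ȳ)² (explained sum of squares)
SSR = Σ ûi² (residual sum of squares)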
29. Simple Linear Regression
• Decomposition of total variation
SST = SSE + SSR, where SST is the total variation, SSE the explained part, and SSR the unexplained part
• Goodness-of-fit measure (R-squared)
R² = SSE / SST = 1 − SSR / SST
R-squared measures the fraction of the total variation that is
explained by the regression
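The decomposition and R-squared can be computed directly from fitted values and residuals. A minimal sketch on a made-up data set (not from the lecture), using the textbook notation SST for total, SSE for explained, and SSR for residual variation:

```python
# Toy data, made up for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
sse = sum((fi - y_bar) ** 2 for fi in fitted)            # explained part
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained part

r_squared = sse / sst  # equivalently 1 - ssr / sst
print(sst, sse, ssr, r_squared)
```

On this toy sample the regression explains 60% of the variation in y.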
36. Assumptions of CLRM
• Linear in parameters
• Random sampling
• Sample variation in the explanatory variable (the x values are not all the same)
• Zero conditional mean: E(u|x) = 0, which implies Cov(x, u) = 0
• Homoskedasticity (same-variance assumption): the variance of the unobservable u, conditional on x, is constant, Var(u|x) = σ²
The value of the explanatory variable must contain no information about the
variability of the unobserved factors
38. Log and semi log form
• Incorporating nonlinearities: Semi-logarithmic form
• Regression of log wages on years of education
log(wage) = β0 + β1 educ + u, where log(wage) is the natural logarithm of wage
• This changes the interpretation of the regression coefficient:
100·β1 ≈ percentage change of wage if years of education are increased by one year
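The percentage interpretation of a semi-log coefficient is an approximation that works well for small coefficients. The sketch below compares it with the exact percentage change implied by the log model. The coefficient value 0.08 is an assumption chosen for illustration, and `pct_change_semilog` is a hypothetical helper name.

```python
import math

def pct_change_semilog(beta1, delta_x=1.0):
    """Approximate and exact % change in y when x rises by delta_x in log(y) = b0 + b1*x + u."""
    approx = 100 * beta1 * delta_x
    exact = 100 * (math.exp(beta1 * delta_x) - 1)
    return approx, exact

# Assumed coefficient on years of education, for illustration only
approx, exact = pct_change_semilog(0.08)
print(f"approximate: {approx:.2f}%, exact: {exact:.2f}%")
```

For larger coefficients the gap between the two numbers grows, so the exact formula is the safer one to report.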
39. Log and semi log form
• Incorporating nonlinearities: Log-logarithmic form
• CEO salary and firm sales
log(salary) = β0 + β1 log(sales) + u, where log(salary) is the natural logarithm of CEO salary and log(sales) is the natural logarithm of his/her firm's sales
• This changes the interpretation of the regression coefficient:
β1 ≈ percentage change of salary if sales increase by 1%
Changes in logarithms are approximately percentage changes (the approximation is good for small changes)
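In a log-log model the slope is an elasticity. The sketch below builds hypothetical firms whose salary follows an exact power law in sales (an assumption made so that the true elasticity, 0.25, is known), regresses log salary on log sales, and checks the effect of a 1% increase in sales. None of these numbers come from the CEOSAL1 data.

```python
import math

def ols_simple(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope

# Hypothetical firms: salary = 50 * sales**0.25 exactly (no error term)
sales = [100.0, 500.0, 1000.0, 5000.0, 20000.0]
salary = [50.0 * s ** 0.25 for s in sales]

log_sales = [math.log(s) for s in sales]
log_salary = [math.log(w) for w in salary]
b0, b1 = ols_simple(log_sales, log_salary)

# Exact % change in salary when sales rise by 1%
pct_salary = 100 * (1.01 ** b1 - 1)
print(f"elasticity: {b1:.4f}, salary change for +1% sales: {pct_salary:.4f}%")
```

The estimated slope recovers the elasticity, and a 1% rise in sales raises salary by roughly 0.25%.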
40. Example in Stata
Use the following data
• CEOSAL1 data
Please note:
The datasets and do-files will be made available in the virtual classroom/Moodle.
Cross-sectional data
1. Sample of individuals, households, firms, cities, states, countries, or other units of interest at a given point in time or in a given period
2. Cross-sectional observations are more or less independent
3. Pure random sampling from a population
4. Represent the population!
5. Ordering of observations does not matter
6. Typical applications: applied microeconomics
Time series data
1. Observations of a variable or several variables over time
2. For example, stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, automobile sales, …
3. Time series observations are typically serially correlated
4. Ordering of observations conveys important information
5. Data frequency: daily, weekly, monthly, quarterly, annually, …
6. Typical features of time series: trends and seasonality
7. Typical applications: applied macroeconomics and finance
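Serial correlation means that today's value carries information about tomorrow's, which is why the ordering of observations matters. A minimal sketch, assuming a simulated AR(1) series with persistence 0.8 (an illustrative choice) compared with independent draws, where `autocorr` is a hypothetical helper:

```python
import random

def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

random.seed(3)
rho = 0.8  # assumed persistence of the simulated series

# AR(1): each value depends on the previous one plus a random shock
ar1 = [0.0]
for _ in range(1999):
    ar1.append(rho * ar1[-1] + random.gauss(0, 1))

# Independent draws: ordering carries no information
iid = [random.gauss(0, 1) for _ in range(2000)]

r_ar1 = autocorr(ar1, 1)
r_iid = autocorr(iid, 1)
print(f"lag-1 autocorrelation, AR(1): {r_ar1:.2f}, independent draws: {r_iid:.2f}")
```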
Pooled cross sections
1. Two or more cross sections are combined in one data set
2. Cross sections are drawn independently of each other
3. Pooled cross sections often used to evaluate policy changes
Example:
Evaluate effect of change in property taxes on house prices
- Random sample of house prices for the year 1993
- A new random sample of house prices for the year 1995
- Compare before/after (1993: before reform, 1995: after reform)
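The before/after comparison can be written as a regression of price on a year dummy: the slope equals the difference in average prices between the two samples. The sketch below uses made-up house prices (not real data), and `ols_simple` is a hypothetical helper.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope

# Made-up prices (in thousands) from two independent random samples
prices_1993 = [100.0, 120.0, 110.0, 130.0]  # before the reform
prices_1995 = [105.0, 115.0, 125.0, 135.0]  # after the reform

# Pool the two cross sections and add a dummy for the 1995 sample
price = prices_1993 + prices_1995
d95 = [0.0] * len(prices_1993) + [1.0] * len(prices_1995)

b0, b1 = ols_simple(d95, price)
diff_means = sum(prices_1995) / len(prices_1995) - sum(prices_1993) / len(prices_1993)
print(f"intercept (1993 mean): {b0}, dummy coefficient: {b1}, difference in means: {diff_means}")
```

A plain before/after comparison attributes the whole change to the reform, so in practice other factors that changed between the two years would need to be controlled for.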
Panel data
1. The same cross-sectional units are followed over time
2. Panel data have a cross-sectional and a time series dimension
3. Panel data can be used to account for time-invariant unobservables
4. Panel data can be used to model lagged responses
Example:
City crime statistics; each city is observed in two years
Time-invariant unobserved city characteristics may be modeled
Effect of police on crime rates may exhibit time lag
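One common way to account for time-invariant unobservables with two periods is first differencing: subtracting each city's first-year values from its second-year values removes anything that does not change over time. The sketch below uses made-up numbers with an assumed true police effect of -2 and no random error, so the contrast with pooled OLS is exact. `ols_simple` is a hypothetical helper.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope

beta = -2.0  # assumed true effect of police on crime
city_effect = [10.0, 20.0, 30.0, 40.0]  # time-invariant, unobserved
police_y1 = [1.0, 2.0, 3.0, 4.0]        # high-crime cities have more police
police_y2 = [2.0, 2.0, 5.0, 5.0]

# crime = city effect + beta * police (no random error, for clarity)
crime_y1 = [a + beta * p for a, p in zip(city_effect, police_y1)]
crime_y2 = [a + beta * p for a, p in zip(city_effect, police_y2)]

# Pooled OLS ignores the city effect and gets the sign wrong here
_, pooled_slope = ols_simple(police_y1 + police_y2, crime_y1 + crime_y2)

# First differences remove the city effect
d_police = [p2 - p1 for p1, p2 in zip(police_y1, police_y2)]
d_crime = [c2 - c1 for c1, c2 in zip(crime_y1, crime_y2)]
_, fd_slope = ols_simple(d_police, d_crime)

print(f"pooled OLS slope: {pooled_slope}, first-difference slope: {fd_slope}")
```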
Correlation vs. causation
Correlation is the degree of association between two variables.
Causation means that a change in one variable brings about a change in the other.
Correlation only shows that two variables move together, while regression lets us estimate how one variable affects the other, keeping other things constant (ceteris paribus).
Example: higher ice cream consumption goes together with more deaths from heart disease, but both are driven by temperature/heat, not by ice cream consumption. Regression helps to find the relationship between two variables while keeping other factors, such as temperature, constant.
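The ice cream example can be mimicked with simulated data. In the sketch below heat drives both ice cream consumption and deaths, and ice cream has no effect of its own. All coefficients are made up for illustration, and the helper names are hypothetical. The raw correlation is strong, but once heat is held constant (here by partialling it out of both variables, which gives the same coefficient as including it in the regression) the ice cream coefficient is close to zero.

```python
import math
import random

def ols_simple(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope

def residuals(x, y):
    """Residuals from regressing y on x: the part of y not explained by x."""
    b0, b1 = ols_simple(x, y)
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

def corr(x, y):
    """Sample correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(4)
heat = [random.gauss(0, 1) for _ in range(2000)]
ice_cream = [2.0 * h + random.gauss(0, 1) for h in heat]  # heat raises consumption
deaths = [3.0 * h + random.gauss(0, 1) for h in heat]     # heat raises deaths

raw_corr = corr(ice_cream, deaths)

# Hold heat constant: remove its influence from both variables, then regress
ice_res = residuals(heat, ice_cream)
deaths_res = residuals(heat, deaths)
_, partial_slope = ols_simple(ice_res, deaths_res)

print(f"raw correlation: {raw_corr:.2f}, effect holding heat constant: {partial_slope:.2f}")
```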