Raimundo Soto - Catholic University of Chile
ERF Training on Advanced Panel Data Techniques Applied to Economic Modelling
29-31 October 2018
Cairo, Egypt
3. MODEL STRUCTURE
• Canonical Model
$y_{it} = \alpha_{it} + x_{it}\beta + \varepsilon_{it}$
where
• $y_{it}$ is the phenomenon of interest to be modelled,
• $x_{it}$ represents all observed controls (regressors),
• $\alpha_{it}$ are the individual effects,
• $\varepsilon_{it}$ is the non-systematic part (what we choose not to model)
4. DATA STRUCTURE
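• Let us stack the data in the it structure: individual by individual, with the T periods of each individual in order
$y = (y_{11}, y_{12}, \dots, y_{1T}, y_{21}, y_{22}, \dots, y_{2T}, \dots, y_{N1}, \dots, y_{NT})'$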
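The stacking can be sketched in a few lines of Python. This is only an illustration: the name `wide` and all values are made up, and the panel is assumed to be balanced.

```python
# Hypothetical balanced panel: N = 3 individuals, T = 2 periods.
# wide[i][t] holds y for individual i+1 in period t+1.
wide = [
    [1.0, 1.5],  # individual 1: y_11, y_12
    [2.0, 2.5],  # individual 2: y_21, y_22
    [3.0, 3.5],  # individual 3: y_31, y_32
]

# Stack in the "it" order: all periods of individual 1, then individual 2, ...
stacked = [
    (i + 1, t + 1, y)
    for i, row in enumerate(wide)
    for t, y in enumerate(row)
]

for i, t, y in stacked:
    print(f"y_{i}{t} = {y}")
```

The resulting list has N × T rows, with the time index running fastest.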
8. ESTIMATION IS IMPOSSIBLE
• If we allow for $\alpha_{it}$ there will be $NT$ constants (and, at most, $NT$ observations), leaving no degrees of freedom to estimate the remaining parameters.
• We will restrict ourselves to:
– $\alpha_i$, i.e., $N$ constants (that do not change over time)
– $\lambda_t$, i.e., $T$ constants (that do not change across individuals)
9. POOLED ESTIMATOR
• Let us ignore all heterogeneity
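$y_{it} = \alpha + x_{it}\beta + \varepsilon_{it}$
• The OLS estimator is: $\hat{\beta} = (x_{it}' x_{it})^{-1} x_{it}' y_{it}$
• The variance estimator is
$Var(\hat{\beta}) = \sigma_\varepsilon^2 (x_{it}' x_{it})^{-1} = \dfrac{\sigma_\varepsilon^2}{V(x_{it})}$
• Note the increase in precision: all $N \times T$ observations are used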
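A minimal sketch of the pooled estimator for the special case of a single regressor, where the matrix formula reduces to a ratio of sums. The function name `pooled_ols` and the data are hypothetical; the data are generated without noise so that the result can be checked by hand.

```python
def pooled_ols(x, y):
    """OLS of y on a constant and one regressor, ignoring the panel structure."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums run over all N*T stacked observations
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta, s_xx

# Hypothetical stacked data (N = 2, T = 3), generated exactly from y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0 + 2.0 * xi for xi in x]

alpha, beta, s_xx = pooled_ols(x, y)
print(alpha, beta, s_xx)
```

With one regressor the variance of the slope is the residual variance divided by `s_xx`, and `s_xx` accumulates over all N × T observations, which is where the gain in precision comes from.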
10. FIXED EFFECTS ESTIMATOR
• Consider that the heterogeneity is only among individuals
$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}$
• $\alpha_i$ represents individual characteristics that are fixed over time
• We could use binary (dummy) variables to represent these fixed characteristics
12. FIXED EFFECTS ESTIMATOR
• $y_{it} = \alpha D + x_{it}\beta + \varepsilon_{it}$, where $D$ collects the individual dummy variables
• Note: same slope, different intercept (constant) for each individual
• All classic results on econometric estimation techniques hold: nature of the OLS estimator, optimality, goodness of fit, and asymptotic distributions of estimators and tests.
• This estimator is called the least squares dummy variables (LSDV) estimator.
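The LSDV idea can be sketched as an ordinary regression on one dummy per individual plus the regressor. The code below is an illustration under simplifying assumptions: a single regressor, a tiny noise-free panel with made-up values, and a hand-written linear solver (`solve`) so that no external library is needed.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def lsdv(ids, x, y):
    """Regress y on one dummy per individual plus a single regressor x."""
    groups = sorted(set(ids))
    # Design row: N dummies (no common constant, to avoid collinearity), then x_it
    rows = [[1.0 if g == i else 0.0 for g in groups] + [xi] for i, xi in zip(ids, x)]
    k = len(groups) + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    coef = solve(xtx, xty)
    return dict(zip(groups, coef[:-1])), coef[-1]

# Hypothetical noise-free panel: intercepts 5 and -1, common slope 0.5
ids = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
true_alpha = {1: 5.0, 2: -1.0}
y = [true_alpha[i] + 0.5 * xi for i, xi in zip(ids, x)]

alphas, beta = lsdv(ids, x, y)
print(alphas, beta)
```

The normal equations have N + K unknowns, so the system grows with the number of individuals.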
16. FIXED EFFECTS ESTIMATOR
• Example:
– Vial and Soto (2002) re-examine the opinion that “university selection tests (PSU) do not predict student performance (R) in their faculties; only secondary-school marks are important”.
– When running the pooled regression:
$R_{it} = \alpha + \beta PSU_{it} + \mu_{it}$
the estimated $\beta$ is small, not significant, or displays the “wrong” sign (negative).
17. PREDICTED EFFECT OF SELECTION TESTS ON STUDENTS’ PERFORMANCE
[Table]
Note: * significant at the 10% level
18. PREDICTED EFFECT OF SELECTION TESTS ON STUDENTS’ PERFORMANCE
[Table]
Note: * significant at the 10% level
21. POOLED VS FIXED EFFECTS ESTIMATORS
[Figure: axes PSU and R; groups labelled “Low-quality” faculties and “High-quality” faculties]
22. POOLED VS FIXED EFFECTS ESTIMATORS
[Figure: axes PSU and R; groups labelled “Low-quality” faculties and “High-quality” faculties]
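How a pooled slope can take the “wrong” sign while the slope inside each faculty is positive can be illustrated with purely synthetic numbers. These are not Vial and Soto's data: the two faculties, the scores and the marks below are invented so that the group with higher PSU scores has a lower intercept.

```python
from collections import defaultdict

def slope(x, y):
    """OLS slope of y on x (with a constant)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

def demean_by_group(ids, values):
    """Subtract from each value the mean of its own group."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    means = {i: sum(vs) / len(vs) for i, vs in groups.items()}
    return [v - means[i] for i, v in zip(ids, values)]

# Synthetic illustration: within each faculty R rises with PSU,
# but the faculty with higher PSU scores has a lower intercept.
faculty = ["low", "low", "low", "high", "high", "high"]
psu = [500.0, 550.0, 600.0, 700.0, 750.0, 800.0]
r = [6.0, 6.5, 7.0, 4.0, 4.5, 5.0]

beta_pooled = slope(psu, r)
beta_within = slope(demean_by_group(faculty, psu), demean_by_group(faculty, r))
print(beta_pooled, beta_within)
```

In this constructed example the pooled slope is negative, while removing the faculty means recovers the positive slope that holds inside each faculty.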
23. FIXED EFFECTS ESTIMATOR
• The LSDV estimator is infeasible if N is too large
– HIECS has 24,000 households
• Recall that constants in regressions only take away the means of the variables
• It would be much simpler to eliminate the means of the variables directly and avoid specifying 24,000 dummy variables
24. FIXED EFFECTS ESTIMATOR
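$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}$
• Let us take the expected value over time for each individual $i$:
$E_i(y_{it}) = E_i(\alpha_i + x_{it}\beta + \varepsilon_{it})$
$\bar{y}_i = \alpha_i + \bar{x}_i\beta$
• and subtract from the original model to eliminate $\alpha_i$:
$y_{it} - \bar{y}_i = \alpha_i + x_{it}\beta + \varepsilon_{it} - \alpha_i - \bar{x}_i\beta$
$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)\beta + \varepsilon_{it}$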
25. FIXED EFFECTS ESTIMATOR
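$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)\beta + \varepsilon_{it}$
• This is a very simple estimation, without the problems derived from dimensionality.
• The $\alpha_i$ no longer appear in this equation, so they are not estimated directly, but they are easily recovered as:
$\hat{\alpha}_i = \bar{y}_i - \bar{x}_i\hat{\beta}$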
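The two steps, demeaning and then recovering the individual effects, can be sketched as follows. The sketch assumes a single regressor and a tiny noise-free panel with made-up values; `group_means` and `within_estimator` are hypothetical helper names.

```python
from collections import defaultdict

def group_means(ids, values):
    """Mean of `values` for each individual."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return {i: sums[i] / counts[i] for i in sums}

def within_estimator(ids, x, y):
    """Slope from the demeaned regression, then alpha_i = ybar_i - xbar_i * beta."""
    x_bar, y_bar = group_means(ids, x), group_means(ids, y)
    xd = [xi - x_bar[i] for i, xi in zip(ids, x)]
    yd = [yi - y_bar[i] for i, yi in zip(ids, y)]
    beta = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    alphas = {i: y_bar[i] - beta * x_bar[i] for i in x_bar}
    return beta, alphas

# Hypothetical noise-free panel: intercepts 5 and -1, common slope 0.5
ids = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
true_alpha = {1: 5.0, 2: -1.0}
y = [true_alpha[i] + 0.5 * xi for i, xi in zip(ids, x)]

beta, alphas = within_estimator(ids, x, y)
print(beta, alphas)
```

No dummy variable is created at any point: the only quantities needed per individual are two means.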
30. FIXED EFFECTS ESTIMATOR
$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)\beta + \varepsilon_{it}$
• This estimator uses only information within each group and is therefore called the within-groups estimator
• Let us obtain certain useful “sums” in order to better understand the nature of the estimators.
35. WITHIN-GROUPS ESTIMATOR
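• The variance of the within-groups estimator is
$Var(\hat{\beta}_w) = \dfrac{\sigma^2}{\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\bar{x}_i)'(x_{it}-\bar{x}_i)}$
$Var(\hat{\beta}_w) = \dfrac{\sigma^2}{S^p_{xx} - \sum_{i=1}^{N}\sum_{t=1}^{T}(\bar{x}_i-\bar{x})'(\bar{x}_i-\bar{x})}$
where $S^p_{xx} = \sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\bar{x})'(x_{it}-\bar{x})$ is the pooled (total) sum of squares
• The denominator is smaller than $S^p_{xx}$; therefore, this variance is larger than that of the pooled estimator
• The within-groups estimator is less precise than the pooled estimator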
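The comparison rests on splitting the total variation of the regressor into a within-groups part and a between-groups part. A small numerical check, assuming a single regressor and made-up values (the names `s_total`, `s_within` and `s_between` are introduced here only for illustration):

```python
from collections import defaultdict

def group_means(ids, values):
    """Mean of `values` for each individual."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    return {i: sum(vs) / len(vs) for i, vs in groups.items()}

# Hypothetical regressor for N = 2 individuals, T = 3 periods
ids = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]

x_bar = sum(x) / len(x)
xg = group_means(ids, x)

# Total, within-groups and between-groups sums of squares (all summed over i and t)
s_total = sum((v - x_bar) ** 2 for v in x)
s_within = sum((v - xg[i]) ** 2 for i, v in zip(ids, x))
s_between = sum((xg[i] - x_bar) ** 2 for i in ids)

# For a common residual variance, Var(within) / Var(pooled) = s_total / s_within
variance_ratio = s_total / s_within
print(s_total, s_within, s_between, variance_ratio)
```

Because the between-groups sum of squares cannot be negative, the ratio is never below one.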
36. LET US SEE THIS IN PRACTICE
• Open Stata
• Open file ERF_Continuous Static.do
– Declare Panel Data and Variables
• xtset
– Panel Data Analysis: commands xt
• xtdes
• xtsum
– Panel Data Regression
• xtreg
• Let us check the estimation results
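37.
Pooled OLS regression

      Source |       SS       df       MS              Number of obs =    5436
-------------+------------------------------           F(  3,  5432) =  156.99
       Model |  195.831563     3  65.2771875           Prob > F      =  0.0000
    Residual |  2258.61346  5432  .415797764           R-squared     =  0.0798
-------------+------------------------------           Adj R-squared =  0.0793
       Total |  2454.44502  5435  .451599819           Root MSE      =  .64482

------------------------------------------------------------------------------
     l_money |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   l_realgdp |  -.0086728   .0036681    -2.36   0.018    -.0158636   -.0014819
     l_infl2 |  -.1650409   .0078825   -20.94   0.000    -.1804937    -.149588
      l_popt |   .0014429   .0061381     0.24   0.814    -.0105903    .0134762
       _cons |   3.315492   .0848041    39.10   0.000     3.149242    3.481742
------------------------------------------------------------------------------

Fixed-effects (within) regression               Number of obs      =      5436
Group variable: idwbcode                        Number of groups   =       163

R-sq:  within  = 0.3285                         Obs per group: min =         4
       between = 0.0140                                        avg =      33.3
       overall = 0.0036                                        max =        55

                                                F(3,5270)          =    859.53
corr(u_i, Xb)  = -0.9200                        Prob > F           =    0.0000

------------------------------------------------------------------------------
     l_money |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   l_realgdp |   .3846652   .0153326    25.09   0.000      .354607    .4147234
     l_infl2 |  -.0281481    .004758    -5.92   0.000    -.0374757   -.0188204
      l_popt |   .0731808   .0285325     2.56   0.010     .0172453    .1291163
       _cons |  -7.844568   .2477214   -31.67   0.000    -8.330205   -7.358932
-------------+----------------------------------------------------------------
     sigma_u |  1.5123647
     sigma_e |  .32607847
         rho |  .95557807   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(162, 5270) =    98.59           Prob > F = 0.0000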
38. Output of slide 37, highlighted: total observations (Number of obs = 5436 in both regressions)
39. Output of slide 37, highlighted: total groups (Number of groups = 163)
40. Output of slide 37, highlighted: group characteristics (Obs per group: min = 4, avg = 33.3, max = 55)
41. Output of slide 37, highlighted: different estimates (e.g., l_realgdp: -.0086728 pooled vs .3846652 fixed effects)
42. Output of slide 37, highlighted: different fit (R-squared = 0.0798 pooled vs R-sq within = 0.3285 fixed effects)
43. BETWEEN-GROUPS ESTIMATOR
• Recall that the regression line passes through the means of the variables
$E_i(y_{it}) = E_i(\alpha_i + x_{it}\beta + \varepsilon_{it})$
• We can run a regression on the means of each group
$\bar{y}_i = \alpha + \bar{x}_i\beta$
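A regression on group means can be sketched as below, assuming a single regressor and a small balanced panel with invented values. Only one observation per individual (its mean) enters the regression.

```python
from collections import defaultdict

def group_means(ids, values):
    """Mean of `values` for each individual."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    return {i: sum(vs) / len(vs) for i, vs in groups.items()}

def between_estimator(ids, x, y):
    """OLS of the group means of y on a constant and the group means of x."""
    xg, yg = group_means(ids, x), group_means(ids, y)
    keys = sorted(xg)
    xm = [xg[k] for k in keys]
    ym = [yg[k] for k in keys]
    x_bar, y_bar = sum(xm) / len(xm), sum(ym) / len(ym)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xm, ym))
    s_xx = sum((a - x_bar) ** 2 for a in xm)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Hypothetical balanced panel: N = 3 individuals, T = 2 periods
ids = [1, 1, 2, 2, 3, 3]
x = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
y = [2.0, 4.0, 5.0, 7.0, 9.0, 13.0]

alpha_b, beta_b = between_estimator(ids, x, y)
print(alpha_b, beta_b)
```

Here the group means are (2, 3), (3, 6) and (7, 11), so the regression uses N = 3 points rather than N × T = 6.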
49. BETWEEN-GROUPS ESTIMATOR
$\hat{\beta}_p = F_w\hat{\beta}_w + (I - F_w)\hat{\beta}_b$
• The pooled estimator is a weighted average of the
between and within-group estimators
• Weights depend on the information content of the
data:
– If groups are very similar, information comes from
individuals within groups
– If groups are very different, information comes from
differences between groups
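The weighted-average identity can be verified numerically. In the sketch below there is a single regressor, so the weight reduces to a scalar, taken here as the share of within-groups variation in the total variation of x. The data are invented, and the between sums are accumulated over all i and t.

```python
from collections import defaultdict

def group_means(ids, values):
    """Mean of `values` for each individual."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    return {i: sum(vs) / len(vs) for i, vs in groups.items()}

def cross(a, b):
    """Sum of cross products of two already-centred series."""
    return sum(p * q for p, q in zip(a, b))

# Hypothetical balanced panel: N = 3 individuals, T = 2 periods
ids = [1, 1, 2, 2, 3, 3]
x = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
y = [2.0, 4.0, 5.0, 7.0, 9.0, 13.0]

x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
xg, yg = group_means(ids, x), group_means(ids, y)

# Total, within and between deviations, each of length N*T
xt = [v - x_bar for v in x]
yt = [v - y_bar for v in y]
xw = [v - xg[i] for i, v in zip(ids, x)]
yw = [v - yg[i] for i, v in zip(ids, y)]
xb = [xg[i] - x_bar for i in ids]
yb = [yg[i] - y_bar for i in ids]

beta_p = cross(xt, yt) / cross(xt, xt)  # pooled
beta_w = cross(xw, yw) / cross(xw, xw)  # within groups
beta_b = cross(xb, yb) / cross(xb, xb)  # between groups
f_w = cross(xw, xw) / cross(xt, xt)     # weight on the within estimator

combo = f_w * beta_w + (1 - f_w) * beta_b
print(beta_p, beta_w, beta_b, f_w, combo)
```

In this example most of the variation in x lies between groups, so the pooled slope sits close to the between-groups slope.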
50. RESULTS BETWEEN-GROUPS ESTIMATOR
. xtreg l_money l_realgdp l_infl2 l_popt, be

Between regression (regression on group means)  Number of obs      =      5436
Group variable: idwbcode                        Number of groups   =       163

R-sq:  within  = 0.0155                         Obs per group: min =         4
       between = 0.2185                                        avg =      33.3
       overall = 0.0787                                        max =        55

                                                F(3,159)           =     14.81
sd(u_i + avg(e_i.))=  .5167678                  Prob > F           =    0.0000

------------------------------------------------------------------------------
     l_money |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   l_realgdp |  -.0062972   .0168565    -0.37   0.709    -.0395887    .0269942
     l_infl2 |  -.4064124   .0650977    -6.24   0.000      -.53498   -.2778448
      l_popt |    .008753    .029578     0.30   0.768    -.0496633    .0671694
       _cons |   2.450055   .4769026     5.14   0.000     1.508174    3.391936
------------------------------------------------------------------------------
51. ESTIMATING THE VARIANCE OF RESIDUALS
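• Compute the sample residuals as:
$\hat{\varepsilon}_{it} = y_{it} - \hat{\alpha}_i - x_{it}\hat{\beta}$
• The residual variance estimator is simply:
$\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{it} - \hat{\alpha}_i - x_{it}\hat{\beta})^2}{NT - N - K}$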
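A minimal sketch of the residual variance calculation, assuming one regressor (K = 1) and a small made-up balanced panel. The point to notice is the degrees-of-freedom correction, which subtracts the N individual effects as well as the K slopes.

```python
from collections import defaultdict

def group_means(ids, values):
    """Mean of `values` for each individual."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    return {i: sum(vs) / len(vs) for i, vs in groups.items()}

# Hypothetical panel: N = 2 individuals, T = 3 periods, K = 1 regressor
ids = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
k = 1

xg, yg = group_means(ids, x), group_means(ids, y)
xd = [xi - xg[i] for i, xi in zip(ids, x)]
yd = [yi - yg[i] for i, yi in zip(ids, y)]

# Within-groups slope and recovered individual effects
beta = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
alpha = {i: yg[i] - beta * xg[i] for i in xg}

# Residuals and variance with NT - N - K degrees of freedom
resid = [yi - alpha[i] - beta * xi for i, xi, yi in zip(ids, x, y)]
ssr = sum(e * e for e in resid)
df = len(y) - len(xg) - k
sigma2 = ssr / df
print(beta, ssr, df, sigma2)
```

The same count appears in the fixed-effects output shown earlier: 5436 observations, 163 groups and 3 regressors give 5436 - 163 - 3 = 5270, the denominator degrees of freedom in F(3,5270).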
52. HYPOTHESIS TESTING
• Having the estimated parameters and the residual variance estimator, hypothesis testing is straightforward
– Individual parameter tests follow a t distribution in small samples and a Normal distribution in large samples
– Multiple parameter tests follow a $\chi^2$ or $F(m, n)$ distribution
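As a worked example of an individual parameter test, the t statistic for l_realgdp in the fixed-effects output can be reproduced from its reported coefficient and standard error. The Normal critical value 1.96 and the Normal p-value are the large-sample approximation; Stata itself uses the t distribution with 5270 degrees of freedom, so the interval agrees only to about four decimals.

```python
import math

# l_realgdp row of the fixed-effects output: coefficient and standard error
coef, se = 0.3846652, 0.0153326

# t statistic for H0: beta = 0
t_stat = coef / se

# Large-sample 95% confidence interval using the Normal critical value
z = 1.96
ci = (coef - z * se, coef + z * se)

# Two-sided p-value under the Normal approximation
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(round(t_stat, 2), ci, p_value)
```

The rounded statistic matches the reported t of 25.09, and the p-value is far below any conventional size.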
53. TWO-WAY FIXED EFFECTS ESTIMATOR
• Model with fixed individual effects and fixed time effects
$y_{it} = \alpha_i + \lambda_t + x_{it}\beta + \varepsilon_{it}$
where
• $\lambda_t$ is a time effect that affects all individuals equally
• $\alpha_i$ is, again, an individual effect that is constant over time
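For a balanced panel, both sets of effects can be removed by subtracting the individual mean and the period mean and adding back the overall mean. The sketch below assumes a balanced panel, a single regressor and invented noise-free data; with unbalanced panels this simple double demeaning is no longer exact.

```python
def two_way_demean(m):
    """Double-demean a balanced N x T table: m_it - mean_i - mean_t + overall mean."""
    n, t = len(m), len(m[0])
    row = [sum(r) / t for r in m]
    col = [sum(m[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row) / n
    return [[m[i][j] - row[i] - col[j] + grand for j in range(t)] for i in range(n)]

# Hypothetical noise-free data: N = 3, T = 3, true slope 2
alpha = [1.0, -2.0, 3.0]   # individual effects
lam = [0.0, 0.5, -1.0]     # time effects
x = [[1.0, 2.0, 4.0],
     [2.0, 2.0, 3.0],
     [0.0, 5.0, 1.0]]
y = [[alpha[i] + lam[j] + 2.0 * x[i][j] for j in range(3)] for i in range(3)]

xd, yd = two_way_demean(x), two_way_demean(y)

# OLS slope on the transformed data (no constant needed after demeaning)
s_xy = sum(xd[i][j] * yd[i][j] for i in range(3) for j in range(3))
s_xx = sum(xd[i][j] ** 2 for i in range(3) for j in range(3))
beta = s_xy / s_xx
print(beta)
```

Both the individual effects and the time effects drop out of the transformed data, so the slope is recovered without estimating N + T constants.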