Econometrics of High-Dimensional Sparse
Models
(p much larger than n)
Victor Chernozhukov Christian Hansen
NBER, July 2013
Outline
Plan
1. High-Dimensional Sparse Framework
The Framework
Two Examples
2. Estimation of Regression Functions via Penalization and Selection
3. Estimation and Inference with Many Instruments
4. Estimation & Inference on Treatment Effects in a Partially Linear Model
5. Estimation and Inference on TE in a General Model
Conclusion
Outline for Econometric Theory of “Big Data”
Part I.
1. High-Dimensional Sparse Models (HDSM)
◮ Models
◮ Motivating Examples
2. Estimation of Regression Functions via Penalization and
Selection Methods
◮ ℓ1-penalization or LASSO methods
◮ post-selection estimators or Post-Lasso methods
Part II.
3. Estimation and Inference in IV regression with Many Instruments
4. Estimation and Inference on Treatment Effects with Many
Controls in a Partially Linear Model.
5. Generalizations.
Materials
1. A semi-review article:
◮ Belloni, Chernozhukov, and Hansen, "Inference in High-Dimensional Sparse Econometric Models", 2010, Advances in Economics and Econometrics, 10th World Congress.
http://arxiv.org/pdf/1201.0220v1.pdf
2. Research Articles Listed in References.
3. Stata and/or Matlab codes are available for most empirical examples via links to be posted at www.mit.edu/~vchern/.
Part I.
1. High-Dimensional Sparse Econometric Model
HDSM. A response variable $y_i$ obeys
$$y_i = x_i'\beta_0 + \epsilon_i, \quad \epsilon_i \sim (0, \sigma^2), \quad i = 1, \dots, n,$$
where $x_i$ are $p$-dimensional; w.l.o.g. we normalize each regressor:
$$x_i = (x_{ij},\; j = 1, \dots, p)', \qquad \frac{1}{n}\sum_{i=1}^n x_{ij}^2 = 1.$$
$p$ is possibly much larger than $n$.
The key assumption is sparsity: the number of relevant regressors is much smaller than the sample size,
$$s := \|\beta_0\|_0 = \sum_{j=1}^p 1\{\beta_{0j} \neq 0\} \ll n.$$
This generalizes the traditional parametric framework used in empirical economics by allowing the identity
$$T = \{j \in \{1, \dots, p\} : \beta_{0j} \neq 0\}$$
of the relevant $s$ regressors to be unknown.
Motivation for high p
◮ transformations of basic regressors $z_i$: $x_i = (P_1(z_i), \dots, P_p(z_i))'$,
◮ for example, in wage regressions, the $P_j$'s are polynomials or B-splines in education and experience;
◮ and/or simply a very large list of regressors:
◮ a list of country characteristics in cross-country growth regressions (Barro & Lee),
◮ housing characteristics in hedonic regressions (American Housing Survey),
◮ price and product characteristics at the point of purchase (scanner data, TNS),
◮ judge characteristics in the analysis of economic impacts of eminent domain.
From Sparsity to Approximate Sparsity
◮ The key assumption is that the number of non-zero regression coefficients is smaller than the sample size:
$$s := \|\beta_0\|_0 = \sum_{j=1}^p 1\{\beta_{0j} \neq 0\} \ll n.$$
◮ The idea is that a low-dimensional ($s$-dimensional) submodel accurately approximates the full $p$-dimensional model. Under exact sparsity the approximation error is in fact zero.
◮ The approximately sparse model allows for a non-zero approximation error,
$$y_i = \underbrace{x_i'\beta_0 + r_i}_{\text{regression function}} + \epsilon_i,$$
that is not bigger than the size of the estimation error, namely, as $n \to \infty$,
$$\frac{s \log p}{n} \to 0, \qquad \sqrt{\frac{1}{n}\sum_{i=1}^n r_i^2} \lesssim \sigma\sqrt{\frac{s}{n}} \to 0.$$
◮ Example:
$$y_i = \sum_{j=1}^{\infty} \theta_j x_{ij} + \epsilon_i, \quad |\theta_j| \lesssim j^{-a}, \quad a > 1/2,$$
has $s \propto n^{1/(2a)}$, because we need only the $s$ regressors with the largest coefficients to have
$$\sqrt{\frac{1}{n}\sum_{i=1}^n r_i^2} \lesssim \sigma\sqrt{\frac{s}{n}}$$
(see the short numeric check after this list).
◮ The approximately sparse model generalizes the exact sparse model by letting in approximation error.
◮ This model also generalizes the traditional series/sieve regression model by letting the identity
$$T = \{j \in \{1, \dots, p\} : \beta_{0j} \neq 0\}$$
of the most important $s$ series terms be unknown.
◮ All results we present are for the approximately sparse model.
Example 1: Series Models of Wage Function
◮ In this example, we abstract away from estimation questions by using population/census data. To visualize the idea of approximate sparsity, consider a contrived example.
◮ Consider a series expansion of the conditional expectation $E[y_i|z_i]$ of wage $y_i$ given education $z_i$.
◮ A conventional series approximation to the regression function is, for example,
$$E[y_i|z_i] = \beta_1 + \beta_2 P_1(z_i) + \beta_3 P_2(z_i) + \beta_4 P_3(z_i) + r_i,$$
where $P_1, P_2, P_3$ are low-order polynomials (or other terms).
[Figure: Traditional Approximation of the Expected Wage Function using Polynomials; wage (vertical axis) against education (horizontal axis).]
◮ In the figure, the true regression function $E[y_i|z_i]$ is computed using U.S. Census data, year 2000, prime-age white men.
◮ Can we find a much better series approximation, with the same number of parameters?
◮ Yes, if we can capture the oscillatory behavior of $E[y_i|z_i]$ in some regions.
◮ We consider a "very long" expansion
$$E[y_i|z_i] = \sum_{j=1}^p \beta_{0j} P_j(z_i) + r_i',$$
with polynomials and dummy variables, and shop around just for a few terms that capture the "oscillations".
◮ We do this using the LASSO, which finds a parsimonious model by minimizing squared errors while penalizing the size of the model through the sum of absolute values of the coefficients. In this example we can also find the "right" terms by "eye-balling".
[Figure: Lasso Approximation of the Expected Wage Function using Polynomials and Dummies; wage against education.]
[Figure: Traditional vs. Lasso Approximation of the Expected Wage Functions with an Equal Number of Parameters; wage against education.]
Errors of Traditional and Lasso-Based Sparse Approximations

                                    RMSE    Max Error
Conventional Series Approximation   0.135   0.290
Lasso-Based Series Approximation    0.031   0.063

Notes.
1. The conventional approximation relies on a low-order polynomial with 4 parameters.
2. The sparse approximation relies on a combination of polynomials and dummy variables and also has 4 parameters.

Conclusion. The examples show how the new framework nests and expands the traditional parsimonious modelling framework used in empirical economics.
2. Estimation of Regression Functions via L1-Penalization and Selection
◮ When $p$ is large, it is a good idea to do selection or penalization to prevent overfitting. Ideally, we would like to minimize a BIC-type criterion function
$$\frac{1}{n}\sum_{i=1}^n [y_i - x_i'\beta]^2 + \lambda\|\beta\|_0, \qquad \|\beta\|_0 = \sum_{j=1}^p 1\{\beta_j \neq 0\},$$
but this is not computationally feasible (NP-hard).
◮ A solution (Frank and Friedman, 1993; Tibshirani, 1996) is to replace the $\ell_0$ "norm" by the closest convex function, the $\ell_1$-norm. The LASSO estimator $\hat\beta$ then minimizes
$$\frac{1}{n}\sum_{i=1}^n [y_i - x_i'\beta]^2 + \lambda\|\beta\|_1, \qquad \|\beta\|_1 = \sum_{j=1}^p |\beta_j|.$$
This problem is globally convex and computable in polynomial time. The kink in the penalty induces the solution $\hat\beta$ to have lots of zeroes, so it is often used as a model selection device.
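As a concrete illustration of the LASSO program above, here is a minimal Python sketch using scikit-learn on simulated data. It is not the authors' posted Stata/Matlab code, all names are illustrative, and the penalty level anticipates the plug-in choice discussed on the next slide (scikit-learn's objective carries an extra factor of 1/2, which the code accounts for):

```python
# Minimal sketch (not the authors' code): Lasso with the plug-in penalty
# lambda = 2*sigma*sqrt(2*log(p*n)/n), assuming sigma is known.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s, sigma = 100, 500, 6, 1.0
X = rng.standard_normal((n, p))
X /= np.sqrt((X ** 2).mean(axis=0))          # normalize: (1/n) * sum_i x_ij^2 = 1
beta0 = np.r_[np.ones(s), np.zeros(p - s)]   # exactly sparse truth for illustration
y = X @ beta0 + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(2 * np.log(p * n) / n)
# sklearn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1, while the slide's
# objective is (1/n)||y - Xb||^2 + lam*||b||_1, so alpha = lam / 2.
lasso = Lasso(alpha=lam / 2, fit_intercept=False).fit(X, y)
print("selected regressors:", np.flatnonzero(lasso.coef_))
```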
The LASSO
◮ The rate-optimal choice of the penalty level is
$$\lambda = 2\sigma\sqrt{2\log(pn)/n}$$
(Bickel, Ritov, Tsybakov, Annals of Statistics, 2009).
◮ The choice relies on knowing $\sigma$, which may be a priori hard to estimate when $p \gg n$.
◮ One can estimate $\sigma$ by iterating from a conservative starting value (the standard deviation around the sample mean); see Belloni and Chernozhukov (2009, Bernoulli). Very simple.
◮ Cross-validation is often used as well and performs well in Monte Carlo experiments, but its theoretical validity is an open question in settings with $p \gg n$.
The √LASSO
◮ A way around this is the √LASSO estimator, which minimizes (Belloni, Chernozhukov, Wang, 2010, Biometrika)
$$\sqrt{\frac{1}{n}\sum_{i=1}^n [y_i - x_i'\beta]^2} + \lambda\|\beta\|_1.$$
◮ The rate-optimal penalty level is pivotal, i.e., independent of $\sigma$:
$$\lambda = \sqrt{2\log(pn)/n}.$$
◮ Tuning-free. Globally convex, polynomial-time computable via conic programming.
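Since the √LASSO program is a second-order cone problem, it can be handed to a generic convex solver. A minimal sketch using cvxpy (an assumed dependency; illustrative only, not the authors' implementation):

```python
# Minimal sketch of the square-root Lasso objective via convex programming.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta0 = np.r_[np.ones(5), np.zeros(p - 5)]
y = X @ beta0 + rng.standard_normal(n)

lam = np.sqrt(2 * np.log(p * n) / n)        # pivotal penalty: does not involve sigma
b = cp.Variable(p)
# sqrt((1/n) * sum (y_i - x_i'b)^2) equals ||y - Xb||_2 / sqrt(n)
objective = cp.norm(y - X @ b, 2) / np.sqrt(n) + lam * cp.norm1(b)
cp.Problem(cp.Minimize(objective)).solve()
print("nonzero coefficients:", np.flatnonzero(np.abs(b.value) > 1e-6))
```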
Heuristics via Convex Geometry
A simple case: $y_i = x_i'\beta_0 + \epsilon_i$ with $\beta_0 = 0$.
[Figure: one-dimensional illustration, built up in steps: the criterion $Q(\beta)$ around $\beta_0 = 0$; its linearization $Q(\beta_0) + \nabla Q(\beta_0)'\beta$; the penalty $\lambda\|\beta\|_1$, first with $\lambda = \|\nabla Q(\beta_0)\|_\infty$ and then with $\lambda > \|\nabla Q(\beta_0)\|_\infty$; and the penalized criterion $Q(\beta) + \lambda\|\beta\|_1$, which is minimized at $\beta = 0$ once $\lambda$ dominates $\|\nabla Q(\beta_0)\|_\infty$.]
◮ $Q(\beta) = \frac{1}{n}\sum_{i=1}^n [y_i - x_i'\beta]^2$ for LASSO
◮ $Q(\beta) = \sqrt{\frac{1}{n}\sum_{i=1}^n [y_i - x_i'\beta]^2}$ for √LASSO
Heuristics
◮ LASSO (and variants) will successfully "zero out" lots of irrelevant regressors, but it won't be perfect (no procedure can distinguish $\beta_{0j} = C/\sqrt{n}$ from 0, so model selection mistakes are bound to happen).
◮ $\lambda$ is chosen to dominate the norm of the subgradient:
$$P(\lambda > \|\nabla Q(\beta_0)\|_\infty) \to 1,$$
and the choices of $\lambda$ mentioned above precisely implement that.
◮ In the case of the √LASSO,
$$\|\nabla Q(\beta_0)\|_\infty = \max_{1 \le j \le p} \frac{\left|\frac{1}{n}\sum_{i=1}^n \epsilon_i x_{ij}\right|}{\sqrt{\frac{1}{n}\sum_{i=1}^n \epsilon_i^2}}$$
does not depend on $\sigma$.
◮ Hence for the √LASSO, $\lambda$ does not depend on $\sigma$.
Dealing with Heteroscedasticity∗
Heteroscedastic Model:
$$y_i = x_i'\beta_0 + r_i + \epsilon_i, \quad \epsilon_i \sim (0, \sigma_i^2).$$
◮ Heteroscedastic forms of Lasso: Belloni, Chen, Chernozhukov, Hansen (Econometrica, 2012). Fully data-driven.
$$\hat\beta \in \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n [y_i - x_i'\beta]^2 + \lambda\|\Psi\beta\|_1, \qquad \lambda = 2\sqrt{2\log(pn)/n},$$
$$\Psi = \mathrm{diag}\Big[\Big(n^{-1}\sum_{i=1}^n x_{ij}^2\epsilon_i^2\Big)^{1/2} + o_P(1), \; j = 1, \dots, p\Big].$$
◮ Penalty loadings $\Psi$ are estimated iteratively:
1. initialize, e.g., $\hat\epsilon_i = y_i - \bar y$, $\hat\Psi = \mathrm{diag}[(n^{-1}\sum_{i=1}^n x_{ij}^2\hat\epsilon_i^2)^{1/2}, \; j = 1, \dots, p]$;
2. obtain $\hat\beta$, update $\hat\epsilon_i = y_i - x_i'\hat\beta$, $\hat\Psi = \mathrm{diag}[(n^{-1}\sum_{i=1}^n x_{ij}^2\hat\epsilon_i^2)^{1/2}, \; j = 1, \dots, p]$;
3. iterate on the previous step.
◮ For heteroscedastic forms of the √LASSO, see Belloni, Chernozhukov, Wang (Annals of Statistics, R&R, 2011).
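The iterative loading construction can be sketched in a few lines of Python. This is illustrative only (the function name and defaults are ours, not the authors' code); scikit-learn's Lasso is reused by rescaling the columns by the loadings:

```python
# Sketch of the iterative penalty-loading construction above (illustrative only).
import numpy as np
from sklearn.linear_model import Lasso

def hetero_lasso(X, y, n_iter=3):
    n, p = X.shape
    lam = 2 * np.sqrt(2 * np.log(p * n) / n)
    eps = y - y.mean()                                            # initial residuals
    for _ in range(n_iter):
        psi = np.sqrt((X ** 2 * eps[:, None] ** 2).mean(axis=0))  # loadings psi_j
        # absorb loadings by rescaling columns, then rescale coefficients back:
        # minimizing over b with columns X_j/psi_j is the penalty ||Psi beta||_1
        fit = Lasso(alpha=lam / 2, fit_intercept=False).fit(X / psi, y)
        beta = fit.coef_ / psi
        eps = y - X @ beta                                         # update residuals
    return beta, psi
```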
Probabilistic intuition for the latter construction∗
The construction makes the "noise" in the Kuhn-Tucker conditions self-normalized, and $\lambda$ dominates the "noise".
Union bounds and the moderate deviation theory for self-normalized sums (Jing, Shao, Wang, Ann. Probab., 2003) imply that
$$P\Bigg(\underbrace{\max_{1 \le j \le p} \frac{2\left|\frac{1}{n}\sum_{i=1}^n \epsilon_i x_{ij}\right|}{\sqrt{\frac{1}{n}\sum_{i=1}^n \epsilon_i^2 x_{ij}^2}}}_{\text{"max norm of gradient"}} \le \underbrace{\lambda}_{\text{penalty level}}\Bigg) = 1 - O(1/n)$$
under the condition that $\log p = o(n^{1/3})$, provided that for all $i \le n$, $j \le p$,
$$E[x_{ij}^3\epsilon_i^3] \le K.$$
Some properties
◮ Due to kink in the penalty, LASSO (and variants) will successfully
“zero out” lots of irrelevant regressors (but don’t expect it to be
perfect).
◮ Lasso procedures bias/shrink the non-zero coefficient estimates
towards zero.
◮ The latter property motivates the use of Least squares after
Lasso, or Post-Lasso.
Post-Model Selection Estimator, or Post-LASSO
Define the post-selection, e.g., post-LASSO, estimator as follows:
1. In step one, select the model using the LASSO or √LASSO.
2. In step two, apply ordinary LS to the selected model.
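A minimal Python sketch of this two-step procedure (illustrative; the function name and the penalty argument are ours):

```python
# Minimal sketch of the two-step Post-Lasso estimator (illustrative, not the authors' code).
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def post_lasso(X, y, lam):
    # Step 1: model selection by Lasso (sklearn alpha = lam/2, see the earlier note).
    sel = np.flatnonzero(Lasso(alpha=lam / 2, fit_intercept=False).fit(X, y).coef_)
    beta = np.zeros(X.shape[1])
    if sel.size:
        # Step 2: ordinary least squares on the selected regressors only.
        beta[sel] = LinearRegression(fit_intercept=False).fit(X[:, sel], y).coef_
    return beta, sel
```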
Regularity Condition on X∗
◮ A simple sufficient condition is as follows.
Condition RSE. Take any $C > 1$. With probability approaching 1, the matrix
$$M = \frac{1}{n}\sum_{i=1}^n x_i x_i'$$
obeys
$$0 < K \le \min_{\|\delta\|_0 \le sC} \frac{\delta' M \delta}{\delta'\delta} \le \max_{\|\delta\|_0 \le sC} \frac{\delta' M \delta}{\delta'\delta} \le K' < \infty. \quad (1)$$
◮ This holds under i.i.d. sampling if $E[x_i x_i']$ has eigenvalues bounded away from zero and from above, and:
- $x_i$ has light tails (i.e., log-concave) and $s\log p = o(n)$;
- or the regressors are bounded, $\max_{ij}|x_{ij}| \le K$, and $s(\log p)^5 = o(n)$.
Ref. Rudelson and Vershynin (2009).
Result 1: Rates for LASSO/√LASSO
Theorem (Rates)
Under practical regularity conditions, including errors having $4 + \delta$ bounded moments and $\log p = o(n^{1/3})$, with probability approaching 1,
$$\|\hat\beta - \beta_0\|_{2,n} := \sqrt{\frac{1}{n}\sum_{i=1}^n [x_i'\hat\beta - x_i'\beta_0]^2} \lesssim \sigma\sqrt{\frac{s\log(n \vee p)}{n}}.$$
◮ The rate is close to the "oracle" rate $\sqrt{s/n}$, obtainable when we know the "true" model $T$; $p$ shows up only through $\log p$.
◮ References.
- LASSO: Bickel, Ritov, Tsybakov (Annals of Statistics, 2009), Gaussian errors.
- heteroscedastic LASSO: Belloni, Chen, Chernozhukov, Hansen (Econometrica, 2012), non-Gaussian errors.
- √LASSO: Belloni, Chernozhukov, and Wang (Biometrika, 2010), non-Gaussian errors.
- heteroscedastic √LASSO: Belloni, Chernozhukov, and Wang (Annals of Statistics, R&R, 2010), non-Gaussian errors.
Result 2: Post-Model Selection Estimator
In the rest of the talk LASSO means all of its variants, especially their
heteroscedastic versions.
Recall that the post-LASSO estimator is defined as follows:
1. In step one, select the model using the LASSO.
2. In step two, apply ordinary LS to the selected model.
◮ Lasso (or any other method) is not perfect at model selection; it might include "junk" or exclude some relevant regressors.
◮ The analysis of all post-selection methods in this lecture accounts for imperfect model selection.
Result 2: Post-Selection Estimator
Theorem (Rate for the Post-Selection Estimator)
Under practical conditions, with probability approaching 1,
$$\|\hat\beta_{PL} - \beta_0\|_{2,n} = \sqrt{\frac{1}{n}\sum_{i=1}^n [x_i'\hat\beta_{PL} - x_i'\beta_0]^2} \lesssim \sigma\sqrt{\frac{s}{n}\log(n \vee p)},$$
and in some exceptional cases the rate is faster, up to $\sigma\sqrt{s/n}$.
◮ Even though LASSO does not in general perfectly select the relevant regressors, Post-LASSO performs at least as well.
◮ This result was first derived for least squares by
◮ Belloni and Chernozhukov (Bernoulli, 2009).
◮ Extended to the heteroscedastic, non-Gaussian case in
◮ Belloni, Chen, Chernozhukov, Hansen (Econometrica, 2012).
Monte Carlo
◮ In this simulation we used $s = 6$, $p = 500$, $n = 100$:
$$y_i = x_i'\beta_0 + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2),$$
$$\beta_0 = (1, 1, 1/2, 1/3, 1/4, 1/5, 0, \dots, 0)',$$
$$x_i \sim N(0, \Sigma), \quad \Sigma_{ij} = (1/2)^{|i-j|}, \quad \sigma^2 = 1.$$
◮ Ideal benchmark: the Oracle estimator, which runs OLS of $y_i$ on $x_{i1}, \dots, x_{i6}$. This estimator is not feasible outside the Monte Carlo.
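For illustration, one replication of this design can be coded as follows (a sketch, not the authors' simulation code; the plug-in penalty from Part I is reused):

```python
# Sketch of one replication of this Monte Carlo design: compare Lasso, Post-Lasso
# and the infeasible Oracle OLS that knows the first s = 6 regressors.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, s, sigma = 100, 500, 6, 1.0
beta0 = np.r_[1, 1, 1/2, 1/3, 1/4, 1/5, np.zeros(p - s)]
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
y = X @ beta0 + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(2 * np.log(p * n) / n)
b_lasso = Lasso(alpha=lam / 2, fit_intercept=False).fit(X, y).coef_
sel = np.flatnonzero(b_lasso)
b_post = np.zeros(p)
b_post[sel] = LinearRegression(fit_intercept=False).fit(X[:, sel], y).coef_
b_oracle = np.zeros(p)
b_oracle[:s] = LinearRegression(fit_intercept=False).fit(X[:, :s], y).coef_

for name, b in [("Lasso", b_lasso), ("Post-Lasso", b_post), ("Oracle", b_oracle)]:
    print(name, "prediction RMSE:", np.sqrt(np.mean((X @ (b - beta0)) ** 2)))
```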
Monte Carlo Results: Prediction Error
RMSE: $\big(E[(x_i'(\hat\beta - \beta_0))^2]\big)^{1/2}$, $n = 100$, $p = 500$.
[Figure: bar chart of the estimation risk for LASSO, Post-LASSO, and the Oracle.]
Lasso is not perfect at model selection, but it does find good models, allowing Lasso and Post-Lasso to perform at the near-Oracle level.
Monte Carlo Results: Bias
Norm of the bias $\|E\hat\beta - \beta_0\|$, $n = 100$, $p = 500$.
[Figure: bar chart of the bias for LASSO and Post-LASSO.]
Post-Lasso often outperforms Lasso due to the removal of the shrinkage bias.
Part II.
3. Estimation and Inference with Many Instruments
Focus the discussion on a simple IV model:
$$y_i = d_i\alpha + \epsilon_i,$$
$$d_i = g(z_i) + v_i \quad \text{(first stage)},$$
$$\begin{pmatrix}\epsilon_i \\ v_i\end{pmatrix} \Big|\, z_i \sim \left(0, \begin{pmatrix}\sigma^2_\epsilon & \sigma_{\epsilon v} \\ \sigma_{\epsilon v} & \sigma^2_v\end{pmatrix}\right).$$
◮ We can have additional controls $w_i$ entering both equations; assume these have been partialled out. We can also have multiple endogenous variables; see the references for details.
◮ The main target is $\alpha$, and $g$ is the unspecified regression function, the "optimal instrument".
◮ We have either
◮ many instruments, $x_i = z_i$, or
◮ many technical instruments, $x_i = P(z_i)$, e.g. polynomials, trigonometric terms,
◮ where the number of instruments $p$ is large, possibly much larger than $n$.
3. Inference in the Instrumental Variable Model
◮ Assume approximate sparsity:
$$g(z_i) = E[d_i|z_i] = \underbrace{x_i'\beta_0}_{\text{sparse approximation}} + \underbrace{r_i}_{\text{approx. error}},$$
that is, the optimal instrument is approximated by $s$ (unknown) instruments, such that
$$s := \|\beta_0\|_0 \ll n, \qquad \sqrt{\frac{1}{n}\sum_{i=1}^n r_i^2} \le \sigma_v\sqrt{\frac{s}{n}}.$$
◮ We shall find these "effective" instruments amongst $x_i$ by Lasso, and estimate the optimal instrument by Post-Lasso, $\hat g(z_i) = x_i'\hat\beta_{PL}$.
◮ Estimate $\alpha$ using the estimated optimal instrument via 2SLS.
Example 2: Instrument Selection in Angrist-Krueger
Data
◮ yi = wage
◮ di = education (endogenous)
◮ α = returns to schooling
◮ zi = quarter of birth and controls (50 state of birth dummies and 7
year of birth dummies)
◮ xi = P(zi ), includes zi and all interactions
◮ a very large list, p = 1530
Using few instruments (3 quarters of birth) or many instruments (1530) gives big standard errors. So it seems a good idea to use instrument selection to see if we can improve.
AK Example

Estimator           Instruments   Schooling Coef   Rob. Std. Error
2SLS (3 IVs)                  3             .10              .020
2SLS (All IVs)             1530             .10              .042
2SLS (LASSO IVs)             12             .10              .014

Notes:
◮ About 12 constructed instruments contain nearly all information.
◮ Fuller's form of 2SLS is used due to robustness.
◮ The Lasso selection of instruments and the standard errors are fully justified theoretically below.
2SLS with Post-LASSO-Estimated Optimal IV
◮ In step one, estimate the optimal instrument $\hat g(z_i) = x_i'\hat\beta$ using the Post-LASSO estimator.
◮ In step two, compute 2SLS using the estimated optimal instrument as IV:
$$\hat\alpha = \Big[\frac{1}{n}\sum_{i=1}^n d_i\,\hat g(z_i)'\Big]^{-1}\frac{1}{n}\sum_{i=1}^n \hat g(z_i)\,y_i.$$
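A minimal Python sketch of this two-step IV procedure (illustrative only; the first-stage penalty uses a crude plug-in value and the function name is ours):

```python
# Sketch of 2SLS with a Post-Lasso-estimated optimal instrument (illustrative only).
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def iv_post_lasso(y, d, X):
    """y: outcome (n,), d: scalar endogenous regressor (n,), X: instruments (n, p)."""
    n, p = X.shape
    lam = 2 * np.std(d) * np.sqrt(2 * np.log(p * n) / n)   # crude plug-in penalty
    sel = np.flatnonzero(Lasso(alpha=lam / 2, fit_intercept=False).fit(X, d).coef_)
    if sel.size == 0:
        raise ValueError("no instruments selected")
    # Post-Lasso first stage: fitted optimal instrument g_hat(z_i)
    g_hat = LinearRegression(fit_intercept=False).fit(X[:, sel], d).predict(X[:, sel])
    # 2SLS with the single constructed instrument g_hat
    alpha_hat = (g_hat @ y) / (g_hat @ d)
    eps = y - d * alpha_hat
    var = np.sum(g_hat ** 2 * eps ** 2) / (g_hat @ d) ** 2  # White-type robust variance
    return alpha_hat, np.sqrt(var)
```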
IV Selection: Theoretical Justification
Theorem (Result 3: 2SLS with LASSO-selected IV)
Under practical regularity conditions, if the optimal instrument is sufficiently sparse, namely $s^2\log^2 p = o(n)$, and is strong, namely $|E[d_i g(z_i)]|$ is bounded away from zero, we have that
$$\sigma_n^{-1}\sqrt{n}(\hat\alpha - \alpha) \to_d N(0, 1),$$
where $\sigma^2_n$ is the standard White's robust formula for the variance of 2SLS. The estimator is semi-parametrically efficient under homoscedasticity.
◮ Ref: Belloni, Chen, Chernozhukov, and Hansen (Econometrica, 2012) for a general statement.
◮ A weak-instrument robust procedure is also available, the sup-score test; see the reference.
◮ Key point: "selection mistakes" are asymptotically negligible due to the "low-bias" property of the estimating equations, which we shall discuss later.
IV Selection: Monte Carlo Justification
A representative example: everything Gaussian, with
$$d_i = \sum_{j=1}^{100} x_{ij}\,\mu^j + v_i, \quad |\mu| < 1.$$
This is an approximately sparse model where most of the information is contained in a few instruments.

Case 1. $p = 100 < n = 250$, first-stage $E[F] = 40$.

Estimator                    RMSE    5% Rej. Prob.
(Fuller's) 2SLS (All IVs)    0.13    5.6%
2SLS (LASSO IVs)             0.08    6%

Remark. Fuller's 2SLS is consistent under many instruments and is a state-of-the-art method.
IV Selection: Monte Carlo Justification (continued)
Same design as in Case 1.

Case 2. $p = n = 100$, first-stage $E[F] = 40$.

Estimator                    RMSE    5% Rej. Prob.
(Fuller's) 2SLS (All IVs)    5.05    8%
2SLS (LASSO IVs)             0.13    6%

◮ Conclusion. The performance of the new method is quite reassuring.
4. Estimation & Inference on Treatment Effects in a Partially Linear Model
Example 3: (Exogenous) Cross-Country Growth Regression.
◮ Relation between the growth rate and initial per-capita GDP, conditional on covariates describing institutions and technological factors:
$$\underbrace{\text{GrowthRate}_i}_{y_i} = \beta_0 + \underbrace{\alpha}_{\text{ATE}}\underbrace{\log(\text{GDP}_i)}_{d_i} + \sum_{j=1}^p \beta_j x_{ij} + \epsilon_i,$$
where the model is exogenous, $E[\epsilon_i | d_i, x_i] = 0$.
◮ Test the convergence hypothesis, $\alpha < 0$: poor countries catch up with richer countries, conditional on similar institutions etc. This is the prediction from the classical Solow growth model.
◮ In the Barro-Lee data, we have $p = 60$ covariates and $n = 90$ observations. We need to do selection.
How to perform selection?
◮ (Don't do it!) Naive/textbook selection:
1. Drop all $x_{ij}$'s that have small coefficients, using model selection devices (classical, such as t-tests, or modern).
2. Run OLS of $y_i$ on $d_i$ and the selected regressors.
This does not work because it fails to control the omitted variable bias (Leeb and Pötscher, 2009).
◮ We propose the Double Selection approach:
1. Select controls $x_{ij}$'s that predict $y_i$.
2. Select controls $x_{ij}$'s that predict $d_i$.
3. Run OLS of $y_i$ on $d_i$ and the union of controls selected in steps 1 and 2.
◮ The additional selection step controls the omitted variable bias.
◮ We find that the coefficient on lagged GDP is negative, and the confidence intervals exclude zero.
Real GDP per capita (log)

Method                              Effect    Std. Err.
Barro-Lee (Economic Reasoning)      −0.02     0.005
All Controls (n = 90, p = 60)       −0.02     0.031
Post-Naive Selection                −0.01     0.004
Post-Double-Selection               −0.03     0.011

◮ Double Selection finds 8 controls, including trade-openness and several education variables.
◮ Our findings support the conclusions reached in Barro and Lee and Barro and Sala-i-Martin.
◮ Using all controls is very imprecise.
◮ Using naive selection gives a biased estimate for the speed of convergence.
TE in a Partially Linear Model
Partially linear regression model (exogenous):
$$y_i = d_i\alpha_0 + g(z_i) + \zeta_i, \quad E[\zeta_i | z_i, d_i] = 0,$$
◮ $y_i$ is the outcome variable,
◮ $d_i$ is the policy/treatment variable whose impact is $\alpha_0$,
◮ $z_i$ represents confounding factors on which we need to condition.
For us the auxiliary equation will be important:
$$d_i = m(z_i) + v_i, \quad E[v_i | z_i] = 0,$$
◮ $m$ summarizes the confounding effect and creates omitted variable biases.
TE in a Partially Linear Model
Use many control terms $x_i = P(z_i) \in \mathbb{R}^p$ to approximate $g$ and $m$:
$$y_i = d_i\alpha_0 + \underbrace{x_i'\beta_{g0} + r_{gi}}_{g(z_i)} + \zeta_i, \qquad d_i = \underbrace{x_i'\beta_{m0} + r_{mi}}_{m(z_i)} + v_i.$$
◮ Many controls: $x_i = z_i$.
◮ Many technical controls: $x_i = P(z_i)$, e.g. polynomials, trigonometric terms.
Key assumption: $g$ and $m$ are approximately sparse.
The Inference Problem and Caveats
$$y_i = d_i\alpha_0 + x_i'\beta_{g0} + r_i + \zeta_i, \quad E[\zeta_i | z_i, d_i] = 0.$$
Naive/textbook inference:
1. Select control terms by running Lasso (or variants) of $y_i$ on $d_i$ and $x_i$.
2. Estimate $\alpha_0$ by least squares of $y_i$ on $d_i$ and the selected controls, and apply standard inference.
However, this naive approach has caveats:
◮ It relies on perfect model selection and exact sparsity. Extremely unrealistic.
◮ It easily and badly breaks down, both theoretically (Leeb and Pötscher, 2009) and practically.
Monte Carlo
◮ In this simulation we used $p = 200$, $n = 100$, $\alpha_0 = 0.5$:
$$y_i = d_i\alpha_0 + x_i'(c_y\theta_0) + \zeta_i, \quad \zeta_i \sim N(0, 1),$$
$$d_i = x_i'(c_d\theta_0) + v_i, \quad v_i \sim N(0, 1),$$
◮ an approximately sparse model: $\theta_{0j} = 1/j^2$,
◮ $c_y$ and $c_d$ are varied to vary the $R^2$ in each equation,
◮ regressors are correlated Gaussians: $x \sim N(0, \Sigma)$, $\Sigma_{kj} = (0.5)^{|j-k|}$.
Distribution of the Naive Post-Selection Estimator
$R^2_d = 0.5$ and $R^2_y = 0.5$.
[Figure: simulated distribution of the naive post-selection estimator around the true $\alpha_0$.]
⇒ badly biased, misleading confidence intervals, as predicted by the theorems in Leeb and Pötscher (2009).
Inference Quality After Model Selection
Look at the rejection probabilities of a true hypothesis, in the design
$$y_i = d_i\alpha_0 + x_i'(\underbrace{c_y}_{\Rightarrow R^2_y}\theta_0) + \zeta_i, \qquad d_i = x_i'(\underbrace{c_d}_{\Rightarrow R^2_d}\theta_0) + v_i.$$
[Figure: ideal rejection rate of a 5% test, plotted as a surface over the first-stage $R^2$ and second-stage $R^2$.]
Inference Quality of Naive Selection
Look at the rejection probabilities of a true hypothesis.
[Figure: rejection frequencies over the first-stage $R^2$ and second-stage $R^2$; naive/textbook selection (left) vs. the ideal rejection rate (right).]
The actual rejection probability (left) is far off the nominal rejection probability (right), consistent with the results of Leeb and Pötscher (2009).
Our Proposal: the Post-Double-Selection Method
To define the method, write the reduced form (substituting out $d_i$):
$$y_i = x_i'\bar\beta_0 + \bar r_i + \bar\zeta_i,$$
$$d_i = x_i'\beta_{m0} + r_{mi} + v_i.$$
1. (Direct) Let $\hat I_1$ be the controls selected by Lasso of $y_i$ on $x_i$.
2. (Indirect) Let $\hat I_2$ be the controls selected by Lasso of $d_i$ on $x_i$.
3. (Final) Run least squares of $y_i$ on $d_i$ and the union of the selected controls:
$$(\check\alpha, \check\beta) = \arg\min_{\alpha \in \mathbb{R},\, \beta \in \mathbb{R}^p}\Big\{\frac{1}{n}\sum_{i=1}^n (y_i - d_i\alpha - x_i'\beta)^2 : \beta_j = 0 \;\; \forall j \notin \hat I = \hat I_1 \cup \hat I_2\Big\}.$$
This is the post-double-selection estimator; a sketch implementation is given after the references below.
◮ Belloni, Chernozhukov, Hansen (World Congress, 2010).
◮ Belloni, Chernozhukov, Hansen (ReStud, 2011, R&R).
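A minimal Python sketch of the three steps above (illustrative only, not the authors' code; cross-validated Lasso is used here for simplicity in place of the plug-in penalty, and the standard error is the HC0 heteroscedasticity-robust one):

```python
# Sketch of the post-double-selection estimator (illustrative only).
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def double_selection(y, d, X):
    """y: outcome, d: treatment, X: (n, p) controls; returns (alpha_check, std_err)."""
    I1 = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)   # direct: controls predicting y
    I2 = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)   # indirect: controls predicting d
    union = np.union1d(I1, I2)
    W = np.column_stack([d, X[:, union]]) if union.size else d.reshape(-1, 1)
    fit = LinearRegression().fit(W, y)                   # final OLS of y on d and union
    resid = y - fit.predict(W)
    # HC0 robust variance for the coefficient on d (column 1 after the intercept)
    Wc = np.column_stack([np.ones(len(y)), W])
    bread = np.linalg.inv(Wc.T @ Wc)
    meat = Wc.T @ (Wc * resid[:, None] ** 2)
    V = bread @ meat @ bread
    return fit.coef_[0], np.sqrt(V[1, 1])
```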
Distribution of the Post-Double-Selection Estimator
$R^2_d = 0.5$ and $R^2_y = 0.5$.
[Figure: simulated distribution of the post-double-selection estimator around the true $\alpha_0$.]
⇒ low bias, accurate confidence intervals.
Belloni, Chernozhukov, Hansen (2011)
Inference Quality After Model Selection
[Figure: rejection frequencies over the first-stage $R^2$ and second-stage $R^2$; double selection (left) vs. the ideal rejection rate (right).]
The left plot is the rejection frequency of the t-test based on the post-double-selection estimator.
Belloni, Chernozhukov, Hansen (2011)
Intuition
◮ The double selection method is robust to moderate selection
mistakes.
◮ The indirect Lasso step, the selection among the controls $x_i$ that predict $d_i$, creates this robustness. It finds controls whose omission would lead to a "large" omitted variable bias and includes them in the regression.
◮ In essence the procedure is a selection version of the Frisch-Waugh procedure for estimating linear regression.
More Intuition
Think about the omitted variables bias in the case with one treatment ($d$) and one regressor ($x$):
$$y_i = \alpha d_i + \beta x_i + \zeta_i; \qquad d_i = \gamma x_i + v_i.$$
If we drop $x_i$, the short regression of $y_i$ on $d_i$ gives
$$\sqrt{n}(\hat\alpha - \alpha) = \text{good term} + \underbrace{\sqrt{n}\,(D'D/n)^{-1}(X'X/n)(\gamma\beta)}_{\text{OMVB}}.$$
◮ The good term is asymptotically normal, and we want $\sqrt{n}\,\gamma\beta \to 0$.
◮ Naive selection drops $x_i$ if $\beta = O(\sqrt{\log n/n})$, but $\sqrt{n}\,\gamma\sqrt{\log n/n} \to \infty$.
◮ Double selection drops $x_i$ only if both $\beta = O(\sqrt{\log n/n})$ and $\gamma = O(\sqrt{\log n/n})$, that is, if $\sqrt{n}\,\gamma\beta = O((\log n)/\sqrt{n}) \to 0$.
Main Result
Theorem (Result 4: Inference on a Coefficient in Regression)
Uniformly within a rich class of models in which $g$ and $m$ admit a sparse approximation with $s^2\log^2(p \vee n)/n \to 0$ and other practical conditions holding,
$$\sigma_n^{-1}\sqrt{n}(\check\alpha - \alpha_0) \to_d N(0, 1),$$
where $\sigma^2_n$ is Robinson's formula for the variance of LS in a partially linear model. Under homoscedasticity, the estimator is semi-parametrically efficient.
◮ Model selection mistakes are asymptotically negligible due to double selection.
Bonus Track: Generalizations.∗
◮ The double selection (DS) procedure implicitly identifies $\alpha_0$ off the moment condition
$$E[M_i(\alpha_0, g_0, m_0)] = 0, \quad \text{where} \quad M_i(\alpha, g, m) = (y_i - d_i\alpha - g(z_i))(d_i - m(z_i)),$$
and where $g_0$ and $m_0$ are (implicitly) estimated by the post-selection estimators.
◮ The DS procedure works because $M_i$ is "immunized" against perturbations in $g_0$ and $m_0$:
$$\frac{\partial}{\partial g}E[M_i(\alpha_0, g, m_0)]\Big|_{g=g_0} = 0, \qquad \frac{\partial}{\partial m}E[M_i(\alpha_0, g_0, m)]\Big|_{m=m_0} = 0.$$
◮ Moderate selection errors translate into moderate estimation errors, which have asymptotically negligible effects on the large-sample distribution of estimators of $\alpha_0$ based on the sample analogs of the equations above.
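To make the first immunization condition concrete, here is a short directional-derivative calculation (a sketch; it uses only the model assumptions $E[\zeta_i \mid z_i, d_i] = 0$ and $m_0(z_i) = E[d_i \mid z_i]$). Perturbing $g_0$ in a direction $\delta$,
$$\frac{\partial}{\partial t}E\big[(y_i - d_i\alpha_0 - g_0(z_i) - t\,\delta(z_i))(d_i - m_0(z_i))\big]\Big|_{t=0} = -E\big[\delta(z_i)(d_i - m_0(z_i))\big] = -E\big[\delta(z_i)\,\underbrace{E[d_i - m_0(z_i) \mid z_i]}_{=0}\big] = 0,$$
and symmetrically, perturbing $m_0$ in a direction $\delta$ gives $-E[(y_i - d_i\alpha_0 - g_0(z_i))\,\delta(z_i)] = -E[\zeta_i\,\delta(z_i)] = 0$.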
Can this be generalized? Yes. Generally we want to create moment equations such that the target parameter $\alpha_0$ is identified via the moment condition
$$E[M_i(\alpha_0, h_0)] = 0,$$
where $\alpha_0$ is the main parameter and $h_0$ is a nuisance function (e.g. $h_0 = (g_0, m_0)$), with $M_i$ "immunized" against perturbations in $h_0$:
$$\frac{\partial}{\partial h}E[M_i(\alpha_0, h)]\Big|_{h=h_0} = 0.$$
◮ This property allows for "non-regular" estimation of $h$, via post-selection or other regularization methods, with rates that are slower than $1/\sqrt{n}$.
◮ It allows for moderate selection mistakes in estimation.
◮ In the absence of the immunization property, post-selection inference breaks down.
Bonus Track: Generalizations.∗
Examples in this framework:
1. IV model:
$$M_i(\alpha, g) = (y_i - d_i\alpha)\,g(z_i)$$
has the immunization property (since $E[(y_i - d_i\alpha_0)\tilde g(z_i)] = 0$ for any $\tilde g$), and this implies the validity of inference after selection-based estimation of $g$.
2. Partially linear model:
$$M_i(\alpha, g, m) = (y_i - d_i\alpha - g(z_i))(d_i - m(z_i))$$
has the immunization property, which implies the validity of post-selection inference, where we do double selection: controls that explain $g$ and $m$.
3. Logistic, quantile regression:
◮ Belloni, Chernozhukov, Kato (2013, ArXiv)
◮ Belloni, Chernozhukov, Wei (2013, ArXiv)
5. Heterogeneous Treatment Effects∗
◮ Here $d_i$ is binary, indicating the receipt of the treatment.
◮ Drop the partially linear structure; instead assume $d_i$ is fully interacted with all other control variables:
$$y_i = \underbrace{d_i g(1, z_i) + (1 - d_i) g(0, z_i)}_{g(d_i, z_i)} + \zeta_i, \quad E[\zeta_i | d_i, z_i] = 0,$$
$$d_i = m(z_i) + u_i, \quad E[u_i | z_i] = 0 \quad \text{(as before)}.$$
◮ Target parameter. The average treatment effect:
$$\alpha_0 = E[g(1, z_i) - g(0, z_i)].$$
◮ Example. $d_i$ = 401(k) eligibility, $z_i$ = characteristics of the worker/firm, $y_i$ = net savings or total wealth, $\alpha_0$ = the average impact of 401(k) eligibility on savings.
5. Heterogeneous Treatment Effects∗
An appropriate $M_i$ is given by Hahn's (1998) efficient score
$$M_i(\alpha, g, m) = \frac{d_i(y_i - g(1, z_i))}{m(z_i)} - \frac{(1 - d_i)(y_i - g(0, z_i))}{1 - m(z_i)} + g(1, z_i) - g(0, z_i) - \alpha,$$
which is "immunized" against perturbations in $g_0$ and $m_0$:
$$\frac{\partial}{\partial g}E[M_i(\alpha_0, g, m_0)]\Big|_{g=g_0} = 0, \qquad \frac{\partial}{\partial m}E[M_i(\alpha_0, g_0, m)]\Big|_{m=m_0} = 0.$$
Hence the post-double-selection estimator for $\alpha$ is given by
$$\check\alpha = \frac{1}{N}\sum_{i=1}^N\Big[\frac{d_i(y_i - \hat g(1, z_i))}{\hat m(z_i)} - \frac{(1 - d_i)(y_i - \hat g(0, z_i))}{1 - \hat m(z_i)} + \hat g(1, z_i) - \hat g(0, z_i)\Big],$$
where we estimate $g$ and $m$ via post-selection (Post-Lasso) methods.
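A minimal Python sketch of this estimator (illustrative only; cross-validated Lasso and ℓ1-penalized logistic regression stand in for the post-selection fits of $g$ and $m$, the trimming constant is ours, and $d$ is assumed to be a 0/1 array):

```python
# Sketch of the ATE estimator based on the efficient score above (illustrative only).
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV

def ate_efficient_score(y, d, X, trim=0.01):
    """y: outcome, d: binary (0/1) treatment, X: (n, p) controls."""
    g1 = LassoCV(cv=5).fit(X[d == 1], y[d == 1]).predict(X)   # g_hat(1, z)
    g0 = LassoCV(cv=5).fit(X[d == 0], y[d == 0]).predict(X)   # g_hat(0, z)
    m = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5) \
            .fit(X, d).predict_proba(X)[:, 1]                  # m_hat(z) = P(d=1|z)
    m = np.clip(m, trim, 1 - trim)                             # trim extreme propensities
    score = d * (y - g1) / m - (1 - d) * (y - g0) / (1 - m) + g1 - g0
    alpha = score.mean()
    se = np.sqrt(np.mean((score - alpha) ** 2) / len(y))       # sigma_n^2 = E[M_i^2]
    return alpha, se
```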
Theorem (Result 5: Inference on the ATE)
Uniformly within a rich class of models in which $g$ and $m$ admit a sparse approximation with $s^2\log^2(p \vee n)/n \to 0$ and other practical conditions holding,
$$\sigma_n^{-1}\sqrt{n}(\check\alpha - \alpha_0) \to_d N(0, 1), \quad \text{where} \quad \sigma^2_n = E[M_i^2(\alpha_0, g_0, m_0)].$$
Moreover, $\check\alpha$ is semi-parametrically efficient for $\alpha_0$.
◮ Model selection mistakes are asymptotically negligible due to the use of "immunizing" moment equations.
◮ Ref. Belloni, Chernozhukov, Hansen, "Inference on TE after selection amongst high-dimensional controls" (2013 version, available via CEMMAP).
Conclusion
◮ Approximately sparse model
◮ Corresponds to the usual "parsimonious" approach, but specification searches are put on a rigorous footing
◮ Useful for predicting regression functions
◮ Useful for selection of instruments
◮ Useful for selection of controls, but avoid naive/textbook selection
◮ Use double selection, which protects against omitted variable bias
◮ Use "immunized" moment equations more generally
References
◮ Bickel, P., Y. Ritov, and A. Tsybakov, "Simultaneous analysis of Lasso and Dantzig selector", Annals of Statistics, 2009.
◮ Candes, E., and T. Tao, "The Dantzig selector: statistical estimation when p is much larger than n", Annals of Statistics, 2007.
◮ Donald, S., and W. Newey, "Series estimation of semilinear models", Journal of Multivariate Analysis, 1994.
◮ Tibshirani, R., "Regression shrinkage and selection via the Lasso", J. Roy. Statist. Soc. Ser. B, 1996.
◮ Frank, I. E., and J. H. Friedman (1993): "A Statistical View of Some Chemometrics Regression Tools", Technometrics, 35(2), 109-135.
◮ Gautier, E., and A. Tsybakov (2011): "High-dimensional Instrumental Variables Regression and Confidence Sets", arXiv:1105.2454v2.
◮ Hahn, J. (1998): "On the role of the propensity score in efficient semiparametric estimation of average treatment effects", Econometrica, pp. 315-331.
◮ Heckman, J., R. LaLonde, and J. Smith (1999): "The economics and econometrics of active labor market programs", Handbook of Labor Economics, 3, 1865-2097.
◮ Imbens, G. W. (2004): "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review", The Review of Economics and Statistics, 86(1), 4-29.
◮ Leeb, H., and B. M. Pötscher (2008): "Can one estimate the unconditional distribution of post-model-selection estimators?", Econometric Theory, 24(2), 338-376.
◮ Robinson, P. M. (1988): "Root-N-consistent semiparametric regression", Econometrica, 56(4), 931-954.
◮ Rudelson, M., and R. Vershynin (2008): "On sparse reconstruction from Fourier and Gaussian measurements", Comm. Pure Appl. Math., 61, 1024-1045.
◮ Jing, B.-Y., Q.-M. Shao, and Q. Wang (2003): "Self-normalized Cramér-type large deviations for independent random variables", Ann. Probab., 31(4), 2167-2215.
All references below are downloadable via www.arXiv.org:
◮ Belloni, A., V. Chernozhukov, and C. Hansen (2010) “Inference for High-Dimensional Sparse Econometric Models,” Advances in
Economics and Econometrics. 10th World Congress of Econometric Society, Shanghai, 2010. (ArXiv, 2011).
◮ Belloni, Chernozhukov, Hansen (2011) “Inference on TE after selection amongst high-dimensional controls”, CEMMAP working
paper (revised June, 2013). R&R Review of Economic Studies. (ArXiv, 2011)
◮ Belloni, A., D. Chen, V. Chernozhukov, and C. Hansen (2012): “Sparse Models and Methods for Optimal Instruments with an
Application to Eminent Domain,” Econometrica, 80, 2369–2429. (ArXiv, 2010)
◮ Belloni, A., and V. Chernozhukov (2011a): “ℓ1-penalized quantile regression in high-dimensional sparse models,” Ann. Statist.,
39(1), 82–130. (ArXiv, 2009)
◮ Belloni, A., and V. Chernozhukov (2013): “Least Squares After Model Selection in High-dimensional Sparse Models,” Bernoulli,
19(2), 521–547. (ArXiv, 2009)
◮ Belloni, A., V. Chernozhukov, L. Wang (2011): “Square-Root-LASSO: Pivotal Recovery of Sparse Signals via Conic Programming,”
Biometrika, 98(4), 791–806. (ArXiv, 2010).
◮ Belloni, A., V. Chernozhukov, K. Kato (2013): “Uniform Post Selection Inference for LAD Regression Models,” arXiv preprint
arXiv:1304.0282. (ArXiv, 2013)
◮ Belloni, A., V. Chernozhukov, L. Wang (2011): “Square-Root-LASSO: Pivotal Recovery of Nonparametric Regression Functions
via Conic Programming,” (ArXiv, 2011)
◮ Belloni, A., V. Chernozhukov, Y. Wei (2013): “Honest Confidence Regions for Logistic Regression with a Large Number of
Controls,” arXiv preprint arXiv:1304.3969 (ArXiv, 2013).
	Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm	Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
Study on Evaluation of Venture Capital Based onInteractive Projection Algorithminventionjournals
 
German credit score shivaram prakash
German credit score shivaram prakashGerman credit score shivaram prakash
German credit score shivaram prakashShivaram Prakash
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.pptTanyaWadhwani4
 
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...Mengxi Jiang
 
Chapter III.pptx
Chapter III.pptxChapter III.pptx
Chapter III.pptxBeamlak5
 
Diagnostic methods for Building the regression model
Diagnostic methods for Building the regression modelDiagnostic methods for Building the regression model
Diagnostic methods for Building the regression modelMehdi Shayegani
 
Bel ventutorial hetero
Bel ventutorial heteroBel ventutorial hetero
Bel ventutorial heteroEdda Kang
 
Degree presentation: Indirect Inference Applied to Financial Econometrics
Degree presentation: Indirect Inference Applied to Financial EconometricsDegree presentation: Indirect Inference Applied to Financial Econometrics
Degree presentation: Indirect Inference Applied to Financial EconometricsJean Duchesne
 
Linear Regression With One or More Variables
Linear Regression With One or More VariablesLinear Regression With One or More Variables
Linear Regression With One or More Variablestadeuferreirajr
 
Lecture - 8 MLR.pptx
Lecture - 8 MLR.pptxLecture - 8 MLR.pptx
Lecture - 8 MLR.pptxiris765749
 
Operations Management VTU BE Mechanical 2015 Solved paper
Operations Management VTU BE Mechanical 2015 Solved paperOperations Management VTU BE Mechanical 2015 Solved paper
Operations Management VTU BE Mechanical 2015 Solved paperSomashekar S.M
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdfgadissaassefa
 
Machine learning algorithms and business use cases
Machine learning algorithms and business use casesMachine learning algorithms and business use cases
Machine learning algorithms and business use casesSridhar Ratakonda
 
Data Analysison Regression
Data Analysison RegressionData Analysison Regression
Data Analysison Regressionjamuga gitulho
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...SSA KPI
 

Semelhante a Econometrics of High-Dimensional Sparse Models (20)

Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
	Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm	Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
 
report
reportreport
report
 
German credit score shivaram prakash
German credit score shivaram prakashGerman credit score shivaram prakash
German credit score shivaram prakash
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.ppt
 
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
 
Chapter III.pptx
Chapter III.pptxChapter III.pptx
Chapter III.pptx
 
Diagnostic methods for Building the regression model
Diagnostic methods for Building the regression modelDiagnostic methods for Building the regression model
Diagnostic methods for Building the regression model
 
Bel ventutorial hetero
Bel ventutorial heteroBel ventutorial hetero
Bel ventutorial hetero
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Regression
RegressionRegression
Regression
 
Degree presentation: Indirect Inference Applied to Financial Econometrics
Degree presentation: Indirect Inference Applied to Financial EconometricsDegree presentation: Indirect Inference Applied to Financial Econometrics
Degree presentation: Indirect Inference Applied to Financial Econometrics
 
Linear Regression With One or More Variables
Linear Regression With One or More VariablesLinear Regression With One or More Variables
Linear Regression With One or More Variables
 
Lecture - 8 MLR.pptx
Lecture - 8 MLR.pptxLecture - 8 MLR.pptx
Lecture - 8 MLR.pptx
 
Operations Management VTU BE Mechanical 2015 Solved paper
Operations Management VTU BE Mechanical 2015 Solved paperOperations Management VTU BE Mechanical 2015 Solved paper
Operations Management VTU BE Mechanical 2015 Solved paper
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
 
Machine learning algorithms and business use cases
Machine learning algorithms and business use casesMachine learning algorithms and business use cases
Machine learning algorithms and business use cases
 
Data Analysison Regression
Data Analysison RegressionData Analysison Regression
Data Analysison Regression
 
Corrleation and regression
Corrleation and regressionCorrleation and regression
Corrleation and regression
 
1624.pptx
1624.pptx1624.pptx
1624.pptx
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...
 

Mais de NBER

FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATESFISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATESNBER
 
Business in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They PayBusiness in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They PayNBER
 
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...NBER
 
The Distributional E ffects of U.S. Clean Energy Tax Credits
The Distributional Effects of U.S. Clean Energy Tax CreditsThe Distributional Effects of U.S. Clean Energy Tax Credits
The Distributional E ffects of U.S. Clean Energy Tax CreditsNBER
 
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...NBER
 
Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1NBER
 
Nber slides11 lecture2
Nber slides11 lecture2Nber slides11 lecture2
Nber slides11 lecture2NBER
 
Recommenders, Topics, and Text
Recommenders, Topics, and TextRecommenders, Topics, and Text
Recommenders, Topics, and TextNBER
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsNBER
 
Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3NBER
 
Jackson nber-slides2014 lecture1
Jackson nber-slides2014 lecture1Jackson nber-slides2014 lecture1
Jackson nber-slides2014 lecture1NBER
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2NBER
 
Acemoglu lecture4
Acemoglu lecture4Acemoglu lecture4
Acemoglu lecture4NBER
 
The NBER Working Paper Series at 20,000 - Joshua Gans
The NBER Working Paper Series at 20,000 - Joshua GansThe NBER Working Paper Series at 20,000 - Joshua Gans
The NBER Working Paper Series at 20,000 - Joshua GansNBER
 
The NBER Working Paper Series at 20,000 - Claudia Goldin
The NBER Working Paper Series at 20,000 - Claudia GoldinThe NBER Working Paper Series at 20,000 - Claudia Goldin
The NBER Working Paper Series at 20,000 - Claudia GoldinNBER
 
The NBER Working Paper Series at 20,000 - James Poterba
The NBER Working Paper Series at 20,000 - James PoterbaThe NBER Working Paper Series at 20,000 - James Poterba
The NBER Working Paper Series at 20,000 - James PoterbaNBER
 
The NBER Working Paper Series at 20,000 - Scott Stern
The NBER Working Paper Series at 20,000 - Scott SternThe NBER Working Paper Series at 20,000 - Scott Stern
The NBER Working Paper Series at 20,000 - Scott SternNBER
 
The NBER Working Paper Series at 20,000 - Glenn Ellison
The NBER Working Paper Series at 20,000 - Glenn EllisonThe NBER Working Paper Series at 20,000 - Glenn Ellison
The NBER Working Paper Series at 20,000 - Glenn EllisonNBER
 
L3 1b
L3 1bL3 1b
L3 1bNBER
 
Nuts and bolts
Nuts and boltsNuts and bolts
Nuts and boltsNBER
 

Mais de NBER (20)

FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATESFISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
 
Business in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They PayBusiness in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They Pay
 
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
 
The Distributional E ffects of U.S. Clean Energy Tax Credits
The Distributional Effects of U.S. Clean Energy Tax CreditsThe Distributional Effects of U.S. Clean Energy Tax Credits
The Distributional E ffects of U.S. Clean Energy Tax Credits
 
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
 
Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1
 
Nber slides11 lecture2
Nber slides11 lecture2Nber slides11 lecture2
Nber slides11 lecture2
 
Recommenders, Topics, and Text
Recommenders, Topics, and TextRecommenders, Topics, and Text
Recommenders, Topics, and Text
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and Algorithms
 
Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3
 
Jackson nber-slides2014 lecture1
Jackson nber-slides2014 lecture1Jackson nber-slides2014 lecture1
Jackson nber-slides2014 lecture1
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2
 
Acemoglu lecture4
Acemoglu lecture4Acemoglu lecture4
Acemoglu lecture4
 
The NBER Working Paper Series at 20,000 - Joshua Gans
The NBER Working Paper Series at 20,000 - Joshua GansThe NBER Working Paper Series at 20,000 - Joshua Gans
The NBER Working Paper Series at 20,000 - Joshua Gans
 
The NBER Working Paper Series at 20,000 - Claudia Goldin
The NBER Working Paper Series at 20,000 - Claudia GoldinThe NBER Working Paper Series at 20,000 - Claudia Goldin
The NBER Working Paper Series at 20,000 - Claudia Goldin
 
The NBER Working Paper Series at 20,000 - James Poterba
The NBER Working Paper Series at 20,000 - James PoterbaThe NBER Working Paper Series at 20,000 - James Poterba
The NBER Working Paper Series at 20,000 - James Poterba
 
The NBER Working Paper Series at 20,000 - Scott Stern
The NBER Working Paper Series at 20,000 - Scott SternThe NBER Working Paper Series at 20,000 - Scott Stern
The NBER Working Paper Series at 20,000 - Scott Stern
 
The NBER Working Paper Series at 20,000 - Glenn Ellison
The NBER Working Paper Series at 20,000 - Glenn EllisonThe NBER Working Paper Series at 20,000 - Glenn Ellison
The NBER Working Paper Series at 20,000 - Glenn Ellison
 
L3 1b
L3 1bL3 1b
L3 1b
 
Nuts and bolts
Nuts and boltsNuts and bolts
Nuts and bolts
 

Último

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Último (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Econometrics of High-Dimensional Sparse Models

• 8. 1. High-Dimensional Sparse Econometric Model (continued)
The key assumption is sparsity: the number of relevant regressors is much smaller than the sample size,
s := ||β0||_0 = Σ_{j=1}^p 1{β0j ≠ 0} ≪ n.
This generalizes the traditional parametric framework used in empirical economics by allowing the identity T = {j ∈ {1, ..., p} : β0j ≠ 0} of the relevant s regressors to be unknown.
• 9. Motivation for high p
◮ transformations of basic regressors zi: xi = (P1(zi), ..., Pp(zi))′,
  ◮ for example, in wage regressions, the Pj's are polynomials or B-splines in education and experience;
◮ and/or simply a very large list of regressors:
  ◮ a list of country characteristics in cross-country growth regressions (Barro & Lee),
  ◮ housing characteristics in hedonic regressions (American Housing Survey),
  ◮ price and product characteristics at the point of purchase (scanner data, TNS),
  ◮ judge characteristics in the analysis of economic impacts of eminent domain.
• 10. From Sparsity to Approximate Sparsity
◮ The key assumption is that the number of non-zero regression coefficients is smaller than the sample size:
s := ||β0||_0 = Σ_{j=1}^p 1{β0j ≠ 0} ≪ n.
◮ The idea is that a low-dimensional (s-dimensional) submodel accurately approximates the full p-dimensional model; in the exactly sparse case the approximation error is in fact zero.
◮ The approximately sparse model allows for a non-zero approximation error,
yi = x′i β0 + ri + ǫi,  where x′i β0 + ri is the regression function,
that is not bigger than the size of the estimation error; namely, as n → ∞,
s log p / n → 0  and  √((1/n) Σ_{i=1}^n r_i²) ≲ σ √(s/n) → 0.
• 11. From Sparsity to Approximate Sparsity (continued)
◮ Example: yi = Σ_{j=1}^∞ θj x_{ij} + ǫi with |θj| ≲ j^{-a}, a > 1/2, has s ∝ n^{1/(2a)}, because we need only the s regressors with the largest coefficients to have √((1/n) Σ_{i=1}^n r_i²) ≲ σ √(s/n).
◮ The approximately sparse model generalizes the exact sparse model by letting in approximation error.
◮ This model also generalizes the traditional series/sieve regression model by letting the identity T = {j ∈ {1, ..., p} : β0j ≠ 0} of the most important s series terms be unknown.
◮ All results we present are for the approximately sparse model.
• 12. Example 1: Series Models of the Wage Function
◮ In this example we abstract away from estimation questions by using population/census data; to visualize the idea of approximate sparsity, consider a contrived example.
◮ Consider a series expansion of the conditional expectation E[yi | zi] of wage yi given education zi.
◮ A conventional series approximation to the regression function is, for example,
E[yi | zi] = β1 + β2 P1(zi) + β3 P2(zi) + β4 P3(zi) + ri,
where P1, ..., P3 are low-order polynomials (or other terms).
• 13. [Figure: Traditional approximation of the expected wage function using polynomials; horizontal axis: education (8–20), vertical axis: wage (6.0–7.0).]
◮ In the figure, the true regression function E[yi | zi] is computed using U.S. Census data, year 2000, prime-age white men.
• 14. A better approximation with the same number of parameters?
◮ Can we find a much better series approximation with the same number of parameters?
◮ Yes, if we can capture the oscillatory behavior of E[yi | zi] in some regions.
◮ We consider a "very long" expansion
E[yi | zi] = Σ_{j=1}^p β0j Pj(zi) + r′i,
with polynomials and dummy variables, and shop around for just a few terms that capture the "oscillations".
◮ We do this using the LASSO – which finds a parsimonious model by minimizing squared errors while penalizing the size of the model through the sum of absolute values of the coefficients. In this example we could also find the "right" terms by "eye-balling".
• 15. [Figure: Lasso approximation of the expected wage function using polynomials and dummy variables; education vs. wage.]
• 16. [Figure: Traditional vs. Lasso approximations of the expected wage function with an equal number of parameters; education vs. wage.]
• 17. Errors of Traditional and Lasso-Based Sparse Approximations

                                        RMSE    Max Error
  Conventional Series Approximation     0.135     0.290
  Lasso-Based Series Approximation      0.031     0.063

Notes. 1. The conventional approximation relies on a low-order polynomial with 4 parameters. 2. The sparse approximation relies on a combination of polynomials and dummy variables and also has 4 parameters.
Conclusion. The examples show how the new framework nests and expands the traditional parsimonious modelling framework used in empirical economics.
• 18. Outline (repeated) – next: 2. Estimation of Regression Functions via Penalization and Selection.
• 19. 2. Estimation of Regression Functions via L1-Penalization and Selection
◮ When p is large, it is a good idea to do selection or penalization to prevent overfitting. Ideally, we would like to minimize a BIC-type criterion function
(1/n) Σ_{i=1}^n [yi − x′i β]² + λ ||β||_0,   ||β||_0 = Σ_{j=1}^p 1{βj ≠ 0},
but this is not computationally feasible – it is NP-hard.
◮ A solution (Frank and Friedman, 1994; Tibshirani, 1996) is to replace the ℓ0 "norm" by the closest convex function – the ℓ1-norm. The LASSO estimator β̂ then minimizes
(1/n) Σ_{i=1}^n [yi − x′i β]² + λ ||β||_1,   ||β||_1 = Σ_{j=1}^p |βj|.
This problem is globally convex and computable in polynomial time. The kink in the penalty induces the solution β̂ to have lots of zeroes, so it is often used as a model selection device.
• 20–21. The LASSO: choosing the penalty level
◮ The rate-optimal choice of the penalty level is
λ = σ · 2 √(2 log(pn)/n)
(Bickel, Ritov, and Tsybakov, Annals of Statistics, 2009).
◮ This choice relies on knowing σ, which may be a priori hard to estimate when p ≫ n.
◮ One can estimate σ by iterating from a conservative starting value (the standard deviation around the sample mean); see Belloni and Chernozhukov (2009, Bernoulli). Very simple.
◮ Cross-validation is often used as well and performs well in Monte Carlo experiments, but its theoretical validity is an open question in settings with p ≫ n.
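The following is a minimal sketch, in Python, of the plug-in penalty with an iterated estimate of σ; it is not the authors' Stata/Matlab code, and the function name plug_in_lasso is illustrative. It uses scikit-learn's Lasso, whose objective is (1/(2n))||y − Xβ||² + α||β||_1, so α = λ/2 corresponds to the penalty above; the regressors are assumed already normalized to unit second moment as in the model setup.

# Minimal sketch (assumptions noted above): Lasso with the plug-in penalty
# lambda = 2 * sigma_hat * sqrt(2 * log(p*n) / n), iterating sigma_hat from a
# conservative starting value (the sd of y around its mean).
import numpy as np
from sklearn.linear_model import Lasso

def plug_in_lasso(X, y, n_iter=5):
    n, p = X.shape
    sigma_hat = np.std(y)                    # conservative starting value
    for _ in range(n_iter):
        lam = 2.0 * sigma_hat * np.sqrt(2.0 * np.log(p * n) / n)
        fit = Lasso(alpha=lam / 2.0, fit_intercept=False, max_iter=10000).fit(X, y)
        beta_hat = fit.coef_
        resid = y - X @ beta_hat
        sigma_hat = np.sqrt(np.mean(resid ** 2))   # update the noise-level estimate
    return beta_hat, sigma_hat, lam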
• 22. The √LASSO
◮ A way around not knowing σ is the √LASSO estimator, which minimizes (Belloni, Chernozhukov, and Wang, 2010, Biometrika)
√((1/n) Σ_{i=1}^n [yi − x′i β]²) + λ ||β||_1.
◮ The rate-optimal penalty level is pivotal – independent of σ:
λ = √(2 log(pn)/n).
◮ Tuning-free. Globally convex, computable in polynomial time via conic programming.
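As a small illustration of the "conic programming" remark, here is a sketch of the √LASSO as a second-order cone program. It assumes the cvxpy package is available and is not the authors' implementation; sqrt_lasso is an illustrative name.

# Minimal sketch (assuming cvxpy): square-root LASSO with the pivotal penalty.
import numpy as np
import cvxpy as cp

def sqrt_lasso(X, y):
    n, p = X.shape
    lam = np.sqrt(2.0 * np.log(p * n) / n)        # pivotal: does not involve sigma
    beta = cp.Variable(p)
    loss = cp.norm(y - X @ beta, 2) / np.sqrt(n)  # sqrt of the average squared residual
    problem = cp.Problem(cp.Minimize(loss + lam * cp.norm1(beta)))
    problem.solve()
    return beta.value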
• 23–27. Heuristics via Convex Geometry
A simple case: yi = x′i β0 + ǫi with β0 = 0.
◮ Q(β) = (1/n) Σ_{i=1}^n [yi − x′i β]² for LASSO; Q(β) = √((1/n) Σ_{i=1}^n [yi − x′i β]²) for √LASSO.
◮ [Figure sequence over β ∈ [−5, 5]: the criterion Q(β) around β0 = 0; its linearization Q(β0) + ∇Q(β0)′β; the penalty λ||β||_1 with λ = ||∇Q(β0)||_∞ and with λ > ||∇Q(β0)||_∞; and the penalized criterion Q(β) + λ||β||_1. When λ > ||∇Q(β0)||_∞ the kink of the penalty at zero dominates the slope of the criterion, so the penalized minimizer is exactly zero.]
• 28–29. Heuristics
◮ LASSO (and variants) will successfully "zero out" lots of irrelevant regressors, but it won't be perfect (no procedure can distinguish β0j = C/√n from 0, and so model selection mistakes are bound to happen).
◮ λ is chosen to dominate the norm of the subgradient:
P(λ > ||∇Q(β0)||_∞) → 1,
and the choices of λ mentioned above precisely implement that.
◮ In the case of √LASSO,
||∇Q(β0)||_∞ = max_{1≤j≤p} |(1/n) Σ_{i=1}^n ǫi xij| / √((1/n) Σ_{i=1}^n ǫi²),
which does not depend on σ.
◮ Hence for √LASSO the penalty λ does not depend on σ.
• 30–31. Dealing with Heteroscedasticity∗
Heteroscedastic model: yi = x′i β0 + ri + ǫi, ǫi ∼ (0, σi²).
◮ Heteroscedastic forms of Lasso – Belloni, Chen, Chernozhukov, and Hansen (Econometrica, 2012). Fully data-driven:
β̂ ∈ arg min_{β ∈ R^p} (1/n) Σ_{i=1}^n [yi − x′i β]² + λ ||Ψβ||_1,
λ = 2 √(2 log(pn)/n),   Ψ = diag[ ((1/n) Σ_{i=1}^n xij² ǫi²)^{1/2} + o_p(1), j = 1, ..., p ].
◮ The penalty loadings Ψ are estimated iteratively:
1. Initialize, e.g., ǫ̂i = yi − ȳ, and set Ψ = diag[ ((1/n) Σ_{i=1}^n xij² ǫ̂i²)^{1/2}, j = 1, ..., p ].
2. Obtain β̂; update ǫ̂i = yi − x′i β̂ and Ψ = diag[ ((1/n) Σ_{i=1}^n xij² ǫ̂i²)^{1/2}, j = 1, ..., p ].
3. Iterate on the previous step.
◮ For heteroscedastic forms of √LASSO, see Belloni, Chernozhukov, and Wang (Annals of Statistics, R&R, 2011).
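A minimal sketch of the penalty-loading iteration, assuming scikit-learn's Lasso and not the authors' code. Per-coefficient loadings ψj are absorbed by rescaling: penalizing Σj ψj |βj| with design X is the same as a plain ℓ1 penalty with columns Xj/ψj; hetero_lasso is an illustrative name.

# Minimal sketch: heteroscedasticity-robust Lasso with iterated penalty loadings.
import numpy as np
from sklearn.linear_model import Lasso

def hetero_lasso(X, y, n_iter=5):
    n, p = X.shape
    lam = 2.0 * np.sqrt(2.0 * np.log(p * n) / n)
    eps = y - y.mean()                              # step 1: initial residuals
    for _ in range(n_iter):
        psi = np.sqrt(np.mean((X ** 2) * (eps ** 2)[:, None], axis=0))  # loadings
        X_tilde = X / psi                           # rescale so a plain l1 penalty applies
        fit = Lasso(alpha=lam / 2.0, fit_intercept=False, max_iter=10000).fit(X_tilde, y)
        beta_hat = fit.coef_ / psi                  # map back to the original scale
        eps = y - X @ beta_hat                      # step 2: update residuals, then iterate
    return beta_hat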
• 32. Probabilistic intuition for the latter construction∗
The construction makes the "noise" in the Kuhn–Tucker conditions self-normalized, and λ dominates that noise. Union bounds and the moderate-deviation theory for self-normalized sums (Jing, Shao, and Wang, Annals of Probability, 2005) imply that
P( max_{1≤j≤p} 2 |(1/n) Σ_{i=1}^n ǫi xij| / √((1/n) Σ_{i=1}^n ǫi² xij²) ≤ λ ) = 1 − O(1/n),
where the left-hand maximum is the max-norm of the gradient and λ is the penalty level, under the condition that log p = o(n^{1/3}) and, for all i ≤ n and j ≤ p, E[|xij ǫi|³] ≤ K.
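An illustrative numerical check of this domination property, under assumed Gaussian regressors and a made-up heteroscedastic error design (not from the slides): the simulated fraction of replications in which λ exceeds the self-normalized max score should be close to one.

# Illustrative check: lambda = 2*sqrt(2*log(p*n)/n) dominates the self-normalized score.
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 100, 500, 200
lam = 2.0 * np.sqrt(2.0 * np.log(p * n) / n)
covered = 0
for _ in range(reps):
    X = rng.normal(size=(n, p))
    eps = rng.normal(size=n) * (1.0 + 0.5 * np.abs(X[:, 0]))   # heteroscedastic errors
    score = 2.0 * np.abs(X.T @ eps / n) / np.sqrt((X ** 2).T @ (eps ** 2) / n)
    covered += (score.max() <= lam)
print("fraction of replications with max score <= lambda:", covered / reps)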
• 33. Some properties
◮ Due to the kink in the penalty, LASSO (and variants) will successfully "zero out" lots of irrelevant regressors (but don't expect it to be perfect).
◮ Lasso procedures bias/shrink the non-zero coefficient estimates towards zero.
◮ The latter property motivates the use of least squares after Lasso, or Post-Lasso.
• 34. Post-Model Selection Estimator, or Post-LASSO
Define the post-selection (e.g., post-LASSO) estimator as follows:
1. In step one, select the model using the LASSO or √LASSO.
2. In step two, apply ordinary least squares to the selected model.
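A minimal sketch of step two, given a first-step Lasso coefficient vector; post_lasso is an illustrative name and not the authors' code.

# Minimal sketch: re-fit OLS on the support selected by Lasso to remove shrinkage bias.
import numpy as np

def post_lasso(X, y, beta_lasso):
    support = np.flatnonzero(beta_lasso)            # regressors selected in step one
    beta_post = np.zeros(X.shape[1])
    if support.size > 0:
        beta_post[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)  # step two: OLS
    return beta_post, support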
• 35. Regularity Condition on X∗
◮ A simple sufficient condition is as follows.
Condition RSE. Take any C > 1. With probability approaching 1, the matrix M = (1/n) Σ_{i=1}^n xi xi′ obeys
0 < K ≤ min_{||δ||_0 ≤ sC} δ′Mδ / δ′δ ≤ max_{||δ||_0 ≤ sC} δ′Mδ / δ′δ ≤ K′ < ∞.   (1)
◮ This holds under i.i.d. sampling if E[xi xi′] has eigenvalues bounded away from zero and from above, and:
– xi has light tails (i.e., log-concave) and s log p = o(n); or
– the regressors are bounded, max_{i,j} |xij| ≤ K, and s (log p)^5 = o(n).
Ref.: Rudelson and Vershynin (2009).
• 36. Result 1: Rates for LASSO/√LASSO
Theorem (Rates). Under practical regularity conditions – including errors having 4 + δ bounded moments and log p = o(n^{1/3}) – with probability approaching 1,
√((1/n) Σ_{i=1}^n [x′i β̂ − x′i β0]²) ≲ σ √(s log(n ∨ p) / n).
◮ The rate is close to the "oracle" rate √(s/n), obtainable when we know the "true" model T; p shows up only through log p.
◮ References:
– LASSO – Bickel, Ritov, and Tsybakov (Annals of Statistics, 2009), Gaussian errors.
– Heteroscedastic LASSO – Belloni, Chen, Chernozhukov, and Hansen (Econometrica, 2012), non-Gaussian errors.
– √LASSO – Belloni, Chernozhukov, and Wang (Biometrika, 2010), non-Gaussian errors.
– Heteroscedastic √LASSO – Belloni, Chernozhukov, and Wang (Annals of Statistics, R&R, 2010), non-Gaussian errors.
• 37. Result 2: Post-Model Selection Estimator
In the rest of the talk, LASSO means all of its variants, especially their heteroscedastic versions. Recall that the post-LASSO estimator is defined as follows:
1. In step one, select the model using the LASSO.
2. In step two, apply ordinary least squares to the selected model.
◮ Lasso (or any other method) is not perfect at model selection – it might include "junk" and exclude some relevant regressors.
◮ The analysis of all post-selection methods in this lecture accounts for imperfect model selection.
• 38. Result 2: Post-Selection Estimator
Theorem (Rate for the Post-Selection Estimator). Under practical conditions, with probability approaching 1,
√((1/n) Σ_{i=1}^n [x′i β̂_PL − x′i β0]²) ≲ σ √((s/n) log(n ∨ p)),
and in some further, exceptional cases the rate is faster, up to σ √(s/n).
◮ Even though LASSO does not in general perfectly select the relevant regressors, Post-LASSO performs at least as well.
◮ This result was first derived for least squares by Belloni and Chernozhukov (Bernoulli, 2009).
◮ It was extended to the heteroscedastic, non-Gaussian case in Belloni, Chen, Chernozhukov, and Hansen (Econometrica, 2012).
• 39. Monte Carlo
◮ In this simulation we used s = 6, p = 500, n = 100:
yi = x′i β0 + ǫi, ǫi ∼ N(0, σ²),
β0 = (1, 1, 1/2, 1/3, 1/4, 1/5, 0, . . . , 0)′,
xi ∼ N(0, Σ), Σij = (1/2)^{|i−j|}, σ² = 1.
◮ Ideal benchmark: the Oracle estimator, which runs OLS of yi on xi1, ..., xi6. This estimator is not feasible outside the Monte Carlo.
• 40. Monte Carlo Results: Prediction Error
RMSE: (E[(x′i(β̂ − β0))²])^{1/2}, n = 100, p = 500.
[Figure: bar chart of estimation risk for LASSO, Post-LASSO, and the Oracle.]
Lasso is not perfect at model selection, but does find good models, allowing Lasso and Post-Lasso to perform at the near-oracle level.
• 41. Monte Carlo Results: Bias
Norm of the bias ||E[β̂] − β0||, n = 100, p = 500.
[Figure: bar chart of the bias for LASSO and Post-LASSO.]
Post-Lasso often outperforms Lasso due to removal of the shrinkage bias.
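A minimal, self-contained sketch of this Monte Carlo design (simplified: σ = 1 is treated as known in the plug-in penalty, few replications, scikit-learn's Lasso is used), not the authors' code and not intended to reproduce the exact figures above.

# Minimal sketch: compare prediction risk of Lasso, Post-Lasso, and the infeasible Oracle.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s = 100, 500, 6
beta0 = np.zeros(p)
beta0[:s] = [1, 1, 1/2, 1/3, 1/4, 1/5]
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
L = np.linalg.cholesky(Sigma)
lam = 2.0 * np.sqrt(2.0 * np.log(p * n) / n)        # plug-in penalty with sigma = 1

def rmse(X, b):
    return np.sqrt(np.mean((X @ (b - beta0)) ** 2))  # empirical prediction error

risks = {"Lasso": [], "Post-Lasso": [], "Oracle": []}
for _ in range(50):
    X = rng.normal(size=(n, p)) @ L.T
    y = X @ beta0 + rng.normal(size=n)
    b_l = Lasso(alpha=lam / 2.0, fit_intercept=False, max_iter=10000).fit(X, y).coef_
    S = np.flatnonzero(b_l)                          # Lasso-selected support
    b_pl = np.zeros(p)
    if S.size:
        b_pl[S], *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
    b_or = np.zeros(p)
    b_or[:s], *_ = np.linalg.lstsq(X[:, :s], y, rcond=None)   # infeasible benchmark
    for name, b in [("Lasso", b_l), ("Post-Lasso", b_pl), ("Oracle", b_or)]:
        risks[name].append(rmse(X, b))
print({k: round(float(np.mean(v)), 3) for k, v in risks.items()})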
• 42. Part II.
• 43. Outline (repeated) – next: 3. Estimation and Inference with Many Instruments.
• 44. 3. Estimation and Inference with Many Instruments
Focus the discussion on a simple IV model:
yi = di α + ǫi,
di = g(zi) + vi  (first stage),
(ǫi, vi)′ | zi ∼ (0, Σ), Σ = [ σǫ²  σǫv ; σǫv  σv² ].
◮ One can have additional controls wi entering both equations – assume these have been partialled out; one can also have multiple endogenous variables; see the references for details.
◮ The main target is α, and g is the unspecified regression function = the "optimal instrument".
◮ We have either:
  ◮ many instruments, xi = zi, or
  ◮ many technical instruments, xi = P(zi), e.g. polynomials, trigonometric terms,
◮ where the number of instruments p is large, possibly much larger than n.
• 45. 3. Inference in the Instrumental Variable Model
◮ Assume approximate sparsity:
g(zi) = E[di | zi] = x′i β0 + ri,
where x′i β0 is a sparse approximation and ri the approximation error; that is, the optimal instrument is approximated by s (unknown) instruments, such that
s := ||β0||_0 ≪ n,   √((1/n) Σ_{i=1}^n r_i²) ≤ σv √(s/n).
◮ We shall find these "effective" instruments amongst xi by Lasso, and estimate the optimal instrument by Post-Lasso, ĝ(zi) = x′i β̂_PL.
◮ Estimate α using the estimated optimal instrument via 2SLS.
• 46. Example 2: Instrument Selection in Angrist–Krueger Data
◮ yi = wage
◮ di = education (endogenous)
◮ α = returns to schooling
◮ zi = quarter of birth and controls (50 state-of-birth dummies and 7 year-of-birth dummies)
◮ xi = P(zi), includes zi and all interactions
◮ a very large list, p = 1530.
Using few instruments (3 quarters of birth) or many instruments (1530) gives big standard errors. So it seems a good idea to use instrument selection to see if we can improve.
• 47. AK Example

  Estimator            Instruments   Schooling Coef   Rob Std Error
  2SLS (3 IVs)               3             .10             .020
  2SLS (All IVs)          1530             .10             .042
  2SLS (LASSO IVs)          12             .10             .014

Notes:
◮ About 12 constructed instruments contain nearly all the information.
◮ Fuller's form of 2SLS is used due to its robustness.
◮ The Lasso selection of instruments and the standard errors are fully justified theoretically below.
• 48. 2SLS with Post-LASSO-Estimated Optimal IV
◮ In step one, estimate the optimal instrument ĝ(zi) = x′i β̂ using the Post-LASSO estimator.
◮ In step two, compute 2SLS using the estimated optimal instrument as the IV:
α̂ = [ (1/n) Σ_{i=1}^n di ĝ(zi)′ ]^{-1} (1/n) Σ_{i=1}^n ĝ(zi) yi.
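A minimal sketch of the two steps for a single endogenous variable with controls already partialled out, together with a White-type robust standard error; it is not the authors' code, iv_post_lasso is an illustrative name, the penalty choice is a crude plug-in, and it assumes at least one instrument gets selected.

# Minimal sketch: Lasso/Post-Lasso first stage, then 2SLS with the fitted optimal IV.
import numpy as np
from sklearn.linear_model import Lasso

def iv_post_lasso(y, d, X):
    n, p = X.shape
    lam = 2.0 * np.sqrt(2.0 * np.log(p * n) / n) * np.std(d)   # crude plug-in penalty
    b = Lasso(alpha=lam / 2.0, fit_intercept=False, max_iter=10000).fit(X, d).coef_
    S = np.flatnonzero(b)                                       # selected instruments
    g_hat = X[:, S] @ np.linalg.lstsq(X[:, S], d, rcond=None)[0]  # Post-Lasso first stage
    alpha_hat = (g_hat @ y) / (g_hat @ d)                       # 2SLS with one constructed IV
    eps_hat = y - d * alpha_hat
    var = np.sum((g_hat ** 2) * (eps_hat ** 2)) / (g_hat @ d) ** 2   # robust variance
    return alpha_hat, np.sqrt(var), S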
• 49. IV Selection: Theoretical Justification
Theorem (Result 3: 2SLS with LASSO-Selected IV). Under practical regularity conditions, if the optimal instrument is sufficiently sparse, namely s² log² p = o(n), and is strong, namely |E[di g(zi)]| is bounded away from zero, we have that
σn^{-1} √n (α̂ − α) →d N(0, 1),
where σn² is given by the standard White robust formula for the variance of 2SLS. The estimator is semi-parametrically efficient under homoscedasticity.
◮ Ref.: Belloni, Chen, Chernozhukov, and Hansen (Econometrica, 2012) for a general statement.
◮ A weak-instrument-robust procedure is also available – the sup-score test; see the reference.
◮ Key point: "selection mistakes" are asymptotically negligible due to the "low-bias" property of the estimating equations, which we shall discuss later.
• 50. IV Selection: Monte Carlo Justification
A representative example: everything Gaussian, with
di = Σ_{j=1}^{100} xij · μ^j + vi, |μ| < 1.
This is an approximately sparse model where most of the information is contained in a few instruments.
Case 1: p = 100 < n = 250, first-stage E[F] = 40.

  Estimator                        RMSE    5% Rej Prob
  (Fuller's) 2SLS (All IVs)        0.13       5.6%
  2SLS (LASSO IVs)                 0.08       6%

Remark. Fuller's 2SLS is consistent under many instruments and is a state-of-the-art method.
• 51. IV Selection: Monte Carlo Justification (continued)
Same approximately sparse design as on the previous slide.
Case 2: p = 100 = n = 100, first-stage E[F] = 40.

  Estimator                        RMSE    5% Rej Prob
  (Fuller's) 2SLS (All IVs)        5.05       8%
  2SLS (LASSO IVs)                 0.13       6%

◮ Conclusion. The performance of the new method is quite reassuring.
• 52. Outline (repeated) – next: 4. Estimation & Inference on Treatment Effects in a Partially Linear Model.
• 53. 4. Estimation & Inference on Treatment Effects in a Partially Linear Model

Example 3: (Exogenous) Cross-Country Growth Regression.

◮ Relation between the growth rate and initial per-capita GDP, conditional on covariates describing institutional and technological factors:

    GrowthRate_i = y_i = β_0 + α d_i + Σ_{j=1}^p β_j x_{ij} + ǫ_i,   with d_i = log(GDP)_i and α the ATE,

  where the model is exogenous: E[ǫ_i | d_i, x_i] = 0.
◮ Test the convergence hypothesis, α < 0: poor countries catch up with richer countries, conditional on similar institutions etc. This is the prediction of the classical Solow growth model.
◮ In the Barro–Lee data we have p = 60 covariates and n = 90 observations, so we need to do selection.
• 54. How to perform selection?

◮ (Don’t do it!) Naive/textbook selection:
  1. Drop all x_{ij}’s that have small coefficients, using model-selection devices (classical, such as t-tests, or modern).
  2. Run OLS of y_i on d_i and the selected regressors.
  This does not work because it fails to control the omitted-variable bias (Leeb and Pötscher, 2009).
◮ We propose the Double Selection approach:
  1. Select controls x_{ij} that predict y_i.
  2. Select controls x_{ij} that predict d_i.
  3. Run OLS of y_i on d_i and the union of the controls selected in steps 1 and 2.
◮ The additional selection step controls the omitted-variable bias.
◮ We find that the coefficient on lagged GDP is negative, and the confidence intervals exclude zero.
• 55. Real GDP per capita (log)

Method                              Effect   Std. Err.
Barro–Lee (Economic Reasoning)      −0.02    0.005
All Controls (n = 90, p = 60)       −0.02    0.031
Post-Naive Selection                −0.01    0.004
Post-Double-Selection               −0.03    0.011

◮ Double Selection finds 8 controls, including trade openness and several education variables.
◮ Our findings support the conclusions reached in Barro and Lee and in Barro and Sala-i-Martin.
◮ Using all controls is very imprecise.
◮ Using naive selection gives a biased estimate of the speed of convergence.
• 56. TE in a Partially Linear Model

Partially linear regression model (exogenous):

    y_i = d_i α_0 + g(z_i) + ζ_i,   E[ζ_i | z_i, d_i] = 0,

◮ y_i is the outcome variable,
◮ d_i is the policy/treatment variable whose impact is α_0,
◮ z_i represents confounding factors on which we need to condition.

For us the auxiliary equation will be important:

    d_i = m(z_i) + v_i,   E[v_i | z_i] = 0,

◮ m summarizes the confounding effect and creates omitted-variable biases.
• 57. TE in a Partially Linear Model

Use many control terms x_i = P(z_i) ∈ R^p to approximate g and m:

    y_i = d_i α_0 + x_i′β_{g0} + r_{gi} + ζ_i,   where g(z_i) = x_i′β_{g0} + r_{gi},
    d_i = x_i′β_{m0} + r_{mi} + v_i,             where m(z_i) = x_i′β_{m0} + r_{mi}.

◮ Many controls: x_i = z_i.
◮ Many technical controls: x_i = P(z_i), e.g. polynomials, trigonometric terms.

Key assumption: g and m are approximately sparse.
• 58. The Inference Problem and Caveats

    y_i = d_i α_0 + x_i′β_{g0} + r_i + ζ_i,   E[ζ_i | z_i, d_i] = 0.

Naive/textbook inference:
1. Select control terms by running Lasso (or variants) of y_i on d_i and x_i.
2. Estimate α_0 by least squares of y_i on d_i and the selected controls; apply standard inference.

However, this naive approach has caveats:
◮ It relies on perfect model selection and exact sparsity, which is extremely unrealistic.
◮ It easily and badly breaks down, both theoretically (Leeb and Pötscher, 2009) and in practice.
• 59. Monte Carlo

◮ In this simulation we used p = 200, n = 100, α_0 = .5:

    y_i = d_i α_0 + x_i′(c_y θ_0) + ζ_i,   ζ_i ∼ N(0, 1),
    d_i = x_i′(c_d θ_0) + v_i,             v_i ∼ N(0, 1).

◮ Approximately sparse model: θ_{0j} = 1/j².
◮ We let c_y and c_d vary in order to vary the R² in each equation.
◮ Regressors are correlated Gaussians: x ∼ N(0, Σ), Σ_{kj} = (0.5)^{|j−k|}.
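This design is easy to simulate. The sketch below (Python; the Lasso penalty and the number of replications are illustrative assumptions, not the settings used for the slides' figures) generates the data and computes the naive post-single-selection estimator whose bias the next figures illustrate.

```python
# Generate the simulation design above and show the bias of the naive
# post-single-selection estimator of alpha0 = 0.5.
import numpy as np
from sklearn.linear_model import Lasso

def simulate(rng, n=100, p=200, alpha0=0.5, cy=0.5, cd=0.5):
    idx = np.arange(p)
    Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])    # Sigma_kj = 0.5^{|j-k|}
    x = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    theta = 1.0 / (idx + 1.0) ** 2                         # theta_{0j} = 1/j^2
    d = x @ (cd * theta) + rng.standard_normal(n)
    y = alpha0 * d + x @ (cy * theta) + rng.standard_normal(n)
    return y, d, x

def naive_alpha(y, d, x):
    # Lasso of y on (d, x); refit OLS of y on d and the selected x's only.
    w = np.column_stack([d, x])
    sel = np.flatnonzero(Lasso(alpha=0.05).fit(w, y).coef_[1:])
    z = np.column_stack([np.ones_like(d), d, x[:, sel]])
    return np.linalg.lstsq(z, y, rcond=None)[0][1]

rng = np.random.default_rng(2)
est = [naive_alpha(*simulate(rng)) for _ in range(200)]
print("mean naive estimate:", np.mean(est), "(true alpha0 = 0.5)")
```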
• 60. Distribution of the Naive Post-Selection Estimator

[Figure: simulated distribution of the naive post-selection estimator of α_0 for R²_d = .5 and R²_y = .5.]

⇒ badly biased, misleading confidence intervals; predicted by the theorems in Leeb and Pötscher (2009).
• 61. Inference Quality After Model Selection

Look at the rejection probabilities of a true hypothesis in the design

    y_i = d_i α_0 + x_i′(c_y θ_0) + ζ_i,   where c_y determines the second-stage R²_y,
    d_i = x_i′(c_d θ_0) + v_i,             where c_d determines the first-stage R²_d.

[Figure: ideal rejection rate of a 5%-level test, plotted as a surface over the first-stage R² and the second-stage R².]
• 62. Inference Quality of Naive Selection

Look at the rejection probabilities of a true hypothesis.

[Figures: rejection rates over the first-stage R² and second-stage R² for naive/textbook selection (left) and the ideal benchmark (right).]

The actual rejection probability (left) is far off the nominal rejection probability (right), consistent with the results of Leeb and Pötscher (2009).
• 63. Our Proposal: the Post-Double-Selection Method

To define the method, write the reduced form (substitute out d_i):

    y_i = x_i′β̄_0 + r̄_i + ζ̄_i,
    d_i = x_i′β_{m0} + r_{mi} + v_i.

1. (Direct) Let Î_1 be the controls selected by Lasso of y_i on x_i.
2. (Indirect) Let Î_2 be the controls selected by Lasso of d_i on x_i.
3. (Final) Run least squares of y_i on d_i and the union of the selected controls:

    (α̌, β̌) = argmin_{α∈R, β∈R^p} { (1/n) Σ_{i=1}^n (y_i − d_i α − x_i′β)² :  β_j = 0 ∀ j ∉ Î = Î_1 ∪ Î_2 }.

This is the post-double-selection estimator.

◮ Belloni, Chernozhukov, Hansen (World Congress, 2010).
◮ Belloni, Chernozhukov, Hansen (ReStud, R&R, 2011).
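A minimal sketch of the three steps follows, again in Python with scikit-learn's Lasso standing in for a Lasso with data-driven penalty; `lam` is an illustrative penalty level, and the standard error is plain White's heteroscedasticity-robust formula, so this is not the authors' implementation.

```python
# Post-double-selection: select controls for y and for d, run OLS on the union.
import numpy as np
from sklearn.linear_model import Lasso

def post_double_selection(y, d, x, lam=0.05):
    i1 = np.flatnonzero(Lasso(alpha=lam).fit(x, y).coef_)   # direct step: y on x
    i2 = np.flatnonzero(Lasso(alpha=lam).fit(x, d).coef_)   # indirect step: d on x
    sel = np.union1d(i1, i2)                                 # union of selected controls
    z = np.column_stack([np.ones_like(d), d, x[:, sel]])     # final OLS design
    beta, *_ = np.linalg.lstsq(z, y, rcond=None)
    resid = y - z @ beta
    zz_inv = np.linalg.inv(z.T @ z)
    cov = zz_inv @ (z.T * resid**2) @ z @ zz_inv              # White covariance
    return beta[1], np.sqrt(cov[1, 1]), sel                   # alpha_hat, se, selected set

# Usage with the simulated design from the previous sketch:
# alpha_hat, se, selected = post_double_selection(y, d, x)
```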
• 64. Distribution of the Post-Double-Selection Estimator

[Figure: simulated distribution of the post-double-selection estimator of α_0 for R²_d = .5 and R²_y = .5.]

⇒ low bias, accurate confidence intervals. Belloni, Chernozhukov, Hansen (2011).
• 65. Inference Quality After Model Selection

[Figures: rejection rates over the first-stage R² and second-stage R² for double selection (left) and the ideal benchmark (right).]

The left plot shows the rejection frequency of the t-test based on the post-double-selection estimator. Belloni, Chernozhukov, Hansen (2011).
• 66. Intuition

◮ The double-selection method is robust to moderate selection mistakes.
◮ The indirect Lasso step, the selection among the controls x_i that predict d_i, creates this robustness. It finds controls whose omission would lead to a “large” omitted-variable bias and includes them in the regression.
◮ In essence, the procedure is a selection version of the Frisch–Waugh procedure for estimating a linear regression.
• 67. More Intuition

Think about omitted-variable bias in the case with one treatment (d) and one regressor (x):

    y_i = α d_i + β x_i + ζ_i;   d_i = γ x_i + v_i.

If we drop x_i, the short regression of y_i on d_i gives

    √n (α̂ − α) = good term + √n (D′D/n)^{-1} (X′X/n) (γβ),

where the second summand is the omitted-variable bias (OMVB) term.

◮ The good term is asymptotically normal, and we want √n γβ → 0.
• 68. More Intuition

With y_i = α d_i + β x_i + ζ_i and d_i = γ x_i + v_i, dropping x_i gives

    √n (α̂ − α) = good term + √n (D′D/n)^{-1} (X′X/n) (γβ),

where the second summand is the omitted-variable bias (OMVB) term.

◮ The good term is asymptotically normal, and we want √n γβ → 0.
◮ Naive selection drops x_i if β = O(√(log n / n)), but then √n γ √(log n / n) → ∞ when γ stays bounded away from zero.
◮ Double selection drops x_i only if both β = O(√(log n / n)) and γ = O(√(log n / n)), that is, only if √n γβ = O((log n)/√n) → 0.
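A quick back-of-the-envelope check of these rates; the choices n = 10,000 and γ = 0.5 are purely illustrative assumptions.

```python
# Pure arithmetic, not a simulation: compare the scaled OMVB under naive and double selection.
import numpy as np

n = 10_000
thr = np.sqrt(np.log(n) / n)          # ~ 0.03: coefficient size a selector can miss
print(np.sqrt(n) * 0.5 * thr)         # naive: gamma fixed at 0.5 -> about 1.5, non-negligible bias
print(np.sqrt(n) * thr * thr)         # double selection: both below threshold -> log(n)/sqrt(n), about 0.09
```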
• 69. Main Result

Theorem (Result 4: Inference on a Coefficient in Regression)
Uniformly within a rich class of models, in which g and m admit a sparse approximation with s² log²(p ∨ n)/n → 0 and other practical conditions hold,

    σ_n^{-1} √n (α̌ − α_0) →_d N(0, 1),

where σ_n² is Robinson’s formula for the variance of least squares in a partially linear model. Under homoscedasticity, the estimator is semi-parametrically efficient.

◮ Model-selection mistakes are asymptotically negligible due to double selection.
• 70. Bonus Track: Generalizations∗

◮ The double-selection (DS) procedure implicitly identifies α_0 off the moment condition

    E[M_i(α_0, g_0, m_0)] = 0,   where  M_i(α, g, m) = (y_i − d_i α − g(z_i))(d_i − m(z_i)),

  and g_0 and m_0 are (implicitly) estimated by the post-selection estimators.
• 71. Bonus Track: Generalizations∗

◮ The DS procedure implicitly identifies α_0 off the moment condition E[M_i(α_0, g_0, m_0)] = 0, where M_i(α, g, m) = (y_i − d_i α − g(z_i))(d_i − m(z_i)) and g_0 and m_0 are (implicitly) estimated by the post-selection estimators.
◮ The DS procedure works because M_i is “immunized” against perturbations in g_0 and m_0:

    ∂/∂g E[M_i(α_0, g, m_0)]|_{g=g_0} = 0,    ∂/∂m E[M_i(α_0, g_0, m)]|_{m=m_0} = 0.

◮ Moderate selection errors translate into moderate estimation errors, which have asymptotically negligible effects on the large-sample distribution of estimators of α_0 based on the sample analogs of the equations above.
• 72. Can this be generalized? Yes.

Generally we want to create moment equations such that the target parameter α_0 is identified via the moment condition

    E[M_i(α_0, h_0)] = 0,

where α_0 is the main parameter and h_0 is a nuisance function (e.g. h_0 = (g_0, m_0)), with M_i “immunized” against perturbations in h_0:

    ∂/∂h E[M_i(α_0, h)]|_{h=h_0} = 0.

◮ This property allows for “non-regular” estimation of h, via post-selection or other regularization methods, with rates slower than 1/√n.
◮ It allows for moderate selection mistakes in the estimation.
◮ In the absence of the immunization property, post-selection inference breaks down.
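The immunization property can be checked numerically for the partially linear moment function. The sketch below (Python; the data-generating process and the perturbation direction are illustrative assumptions) shows that the sample average of the immunized moment barely moves when g_0 is perturbed, while a non-orthogonal moment, here (y_i − d_i α_0 − g(z_i)) d_i, moves roughly linearly in the size of the perturbation.

```python
# Numerical illustration of the orthogonality ("immunization") property.
import numpy as np

rng = np.random.default_rng(3)
n, p, alpha0 = 5_000, 50, 0.5
x = rng.standard_normal((n, p))
theta = 1.0 / (np.arange(1, p + 1) ** 2)
g0 = x @ theta                      # g0(z)
m0 = x @ (0.5 * theta)              # m0(z)
d = m0 + rng.standard_normal(n)
y = alpha0 * d + g0 + rng.standard_normal(n)

delta = x[:, 0]                     # perturbation direction: evaluate moments at g0 + t*delta
for t in (0.0, 0.1, 0.2):
    M_orth = (y - d * alpha0 - (g0 + t * delta)) * (d - m0)   # immunized moment
    M_naive = (y - d * alpha0 - (g0 + t * delta)) * d         # non-orthogonal moment
    print(f"t={t:.1f}  orthogonal mean: {M_orth.mean():+.4f}   non-orthogonal mean: {M_naive.mean():+.4f}")
```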
• 73. Bonus Track: Generalizations∗

Examples in this framework:
1. IV model: M_i(α, g) = (y_i − d_i α) g(z_i) has the immunization property (since E[(y_i − d_i α_0) g̃(z_i)] = 0 for any g̃), and this implies the validity of inference after selection-based estimation of g.
2. Partially linear model: M_i(α, g, m) = (y_i − d_i α − g(z_i))(d_i − m(z_i)) has the immunization property, which implies the validity of post-selection inference when we do double selection, i.e. select the controls that explain g and m.
3. Logistic and quantile regression:
   ◮ Belloni, Chernozhukov, Kato (2013, ArXiv)
   ◮ Belloni, Chernozhukov, Wei (2013, ArXiv)
• 74. Outline

Plan
1. High-Dimensional Sparse Framework
   The Framework
   Two Examples
2. Estimation of Regression Functions via Penalization and Selection
3. Estimation and Inference with Many Instruments
4. Estimation & Inference on Treatment Effects in a Partially Linear Model
5. Estimation and Inference on TE in a General Model
Conclusion
• 75. 5. Heterogeneous Treatment Effects∗

◮ Here d_i is binary, indicating receipt of the treatment.
◮ Drop the partially linear structure; instead assume d_i is fully interacted with all other control variables:

    y_i = d_i g(1, z_i) + (1 − d_i) g(0, z_i) + ζ_i  =  g(d_i, z_i) + ζ_i,   E[ζ_i | d_i, z_i] = 0,
    d_i = m(z_i) + u_i,   E[u_i | z_i] = 0   (as before).

◮ Target parameter. The average treatment effect:

    α_0 = E[g(1, z_i) − g(0, z_i)].

◮ Example. d_i = 401(k) eligibility, z_i = characteristics of the worker/firm, y_i = net savings or total wealth, α_0 = the average impact of 401(k) eligibility on savings.
• 76. 5. Heterogeneous Treatment Effects∗

An appropriate M_i is given by Hahn’s (1998) efficient score:

    M_i(α, g, m) = d_i (y_i − g(1, z_i)) / m(z_i) − (1 − d_i)(y_i − g(0, z_i)) / (1 − m(z_i)) + g(1, z_i) − g(0, z_i) − α,

which is “immunized” against perturbations in g_0 and m_0:

    ∂/∂g E[M_i(α_0, g, m_0)]|_{g=g_0} = 0,    ∂/∂m E[M_i(α_0, g_0, m)]|_{m=m_0} = 0.

Hence the post-double-selection estimator for α is given by

    α̌ = (1/N) Σ_{i=1}^N [ d_i (y_i − ĝ(1, z_i)) / m̂(z_i) − (1 − d_i)(y_i − ĝ(0, z_i)) / (1 − m̂(z_i)) + ĝ(1, z_i) − ĝ(0, z_i) ],

where we estimate g and m via post-selection (Post-Lasso) methods.
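A minimal Python sketch of this estimator follows. It is not the authors' implementation: the nuisance functions are fit by cross-validated Lasso and ℓ1-penalized logistic regression on the full sample rather than by the plug-in Post-Lasso described in the paper, the propensity score is clipped to avoid extreme weights, and the standard error is the sample analogue of σ_n/√n.

```python
# ATE via the efficient score, with penalized estimates of g(0,z), g(1,z), and m(z).
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV

def ate_efficient_score(y, d, x):
    """y: outcome, d: binary treatment indicator (0/1), x: control matrix."""
    g1 = LassoCV(cv=5).fit(x[d == 1], y[d == 1]).predict(x)    # g_hat(1, z)
    g0 = LassoCV(cv=5).fit(x[d == 0], y[d == 0]).predict(x)    # g_hat(0, z)
    m = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5) \
            .fit(x, d).predict_proba(x)[:, 1]                   # m_hat(z)
    m = np.clip(m, 0.01, 0.99)                                  # avoid extreme weights
    score = d * (y - g1) / m - (1 - d) * (y - g0) / (1 - m) + g1 - g0
    alpha = score.mean()                                        # ATE estimate
    se = score.std(ddof=1) / np.sqrt(len(y))                    # sigma_n / sqrt(n)
    return alpha, se
```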
• 77. Theorem (Result 5: Inference on ATE)

Uniformly within a rich class of models, in which g and m admit a sparse approximation with s² log²(p ∨ n)/n → 0 and other practical conditions hold,

    σ_n^{-1} √n (α̌ − α_0) →_d N(0, 1),   where σ_n² = E[M_i²(α_0, g_0, m_0)].

Moreover, α̌ is semi-parametrically efficient for α_0.

◮ Model-selection mistakes are asymptotically negligible due to the use of “immunizing” moment equations.
◮ Ref.: Belloni, Chernozhukov, Hansen, “Inference on treatment effects after selection amongst high-dimensional controls” (2013 version, available via CEMMAP).
• 78. Outline

Plan
1. High-Dimensional Sparse Framework
   The Framework
   Two Examples
2. Estimation of Regression Functions via Penalization and Selection
3. Estimation and Inference with Many Instruments
4. Estimation & Inference on Treatment Effects in a Partially Linear Model
5. Estimation and Inference on TE in a General Model
Conclusion
• 79. Conclusion

◮ Approximately sparse model:
  ◮ corresponds to the usual “parsimonious” approach, but specification searches are put on a rigorous footing;
  ◮ useful for predicting regression functions;
  ◮ useful for selection of instruments;
  ◮ useful for selection of controls, but avoid naive/textbook selection:
    ◮ use double selection, which protects against omitted-variable bias;
    ◮ use “immunized” moment equations more generally.
• 80. References

◮ Bickel, P., Y. Ritov, and A. Tsybakov, “Simultaneous analysis of Lasso and Dantzig selector,” Annals of Statistics, 2009.
◮ Candes, E., and T. Tao, “The Dantzig selector: statistical estimation when p is much larger than n,” Annals of Statistics, 2007.
◮ Donald, S., and W. Newey, “Series estimation of semilinear models,” Journal of Multivariate Analysis, 1994.
◮ Tibshirani, R., “Regression shrinkage and selection via the Lasso,” J. Roy. Statist. Soc. Ser. B, 1996.
◮ Frank, I. E., and J. H. Friedman (1993): “A Statistical View of Some Chemometrics Regression Tools,” Technometrics, 35(2), 109–135.
◮ Gautier, E., and A. Tsybakov (2011): “High-dimensional Instrumental Variables Regression and Confidence Sets,” arXiv:1105.2454v2.
◮ Hahn, J. (1998): “On the role of the propensity score in efficient semiparametric estimation of average treatment effects,” Econometrica, pp. 315–331.
◮ Heckman, J., R. LaLonde, and J. Smith (1999): “The economics and econometrics of active labor market programs,” Handbook of Labor Economics, 3, 1865–2097.
◮ Imbens, G. W. (2004): “Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review,” The Review of Economics and Statistics, 86(1), 4–29.
◮ Leeb, H., and B. M. Pötscher (2008): “Can one estimate the unconditional distribution of post-model-selection estimators?,” Econometric Theory, 24(2), 338–376.
◮ Robinson, P. M. (1988): “Root-N-consistent semiparametric regression,” Econometrica, 56(4), 931–954.
◮ Rudelson, M., and R. Vershynin (2008): “On sparse reconstruction from Fourier and Gaussian measurements,” Comm. Pure Appl. Math., 61, 1024–1045.
◮ Jing, B.-Y., Q.-M. Shao, and Q. Wang (2003): “Self-normalized Cramér-type large deviations for independent random variables,” Ann. Probab., 31(4), 2167–2215.
• 81. All references below are downloadable via www.arXiv.org:

◮ Belloni, A., V. Chernozhukov, and C. Hansen (2010): “Inference for High-Dimensional Sparse Econometric Models,” Advances in Economics and Econometrics, 10th World Congress of the Econometric Society, Shanghai, 2010. (ArXiv, 2011)
◮ Belloni, A., V. Chernozhukov, and C. Hansen (2011): “Inference on treatment effects after selection amongst high-dimensional controls,” CEMMAP working paper (revised June 2013); R&R at the Review of Economic Studies. (ArXiv, 2011)
◮ Belloni, A., D. Chen, V. Chernozhukov, and C. Hansen (2012): “Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain,” Econometrica, 80, 2369–2429. (ArXiv, 2010)
◮ Belloni, A., and V. Chernozhukov (2011a): “ℓ1-penalized quantile regression in high-dimensional sparse models,” Ann. Statist., 39(1), 82–130. (ArXiv, 2009)
◮ Belloni, A., and V. Chernozhukov (2013): “Least Squares After Model Selection in High-dimensional Sparse Models,” Bernoulli, 19(2), 521–547. (ArXiv, 2009)
◮ Belloni, A., V. Chernozhukov, and L. Wang (2011): “Square-Root LASSO: Pivotal Recovery of Sparse Signals via Conic Programming,” Biometrika, 98(4), 791–806. (ArXiv, 2010)
◮ Belloni, A., V. Chernozhukov, and K. Kato (2013): “Uniform Post Selection Inference for LAD Regression Models,” arXiv preprint arXiv:1304.0282. (ArXiv, 2013)
◮ Belloni, A., V. Chernozhukov, and L. Wang (2011): “Square-Root LASSO: Pivotal Recovery of Nonparametric Regression Functions via Conic Programming.” (ArXiv, 2011)
◮ Belloni, A., V. Chernozhukov, and Y. Wei (2013): “Honest Confidence Regions for Logistic Regression with a Large Number of Controls,” arXiv preprint arXiv:1304.3969. (ArXiv, 2013)