6. Koenker and Bassett
Define the loss function for τ ∈ (0, 1):
ρτ(x) = −(1 − τ)x if x < 0, and τx if x ≥ 0
Then for a sample (x1, x2, . . . , xn) from X, the solution to
arg min_θ Σ_{i=1}^n ρτ(xi − θ)
is θ = the τth quantile of the sample.
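This claim can be checked numerically; a minimal sketch (the name `rho` and the simulated data are ours, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rho(u, tau):
    # Koenker-Bassett check loss: -(1 - tau)*u for u < 0, tau*u for u >= 0.
    return np.where(u < 0, -(1 - tau) * u, tau * u)

rng = np.random.default_rng(0)
sample = rng.normal(size=1001)
tau = 0.75

# Minimize the empirical check loss over theta.
res = minimize_scalar(lambda theta: rho(sample - theta, tau).sum(),
                      bounds=(sample.min(), sample.max()), method="bounded")

# The minimizer agrees with the sample 0.75-quantile.
print(res.x, np.quantile(sample, tau))
```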
7. Koenker and Bassett
We can use this loss function to perform quantile regression. For simple linear regression in two dimensions with a sample ((x1, y1), (x2, y2), . . . , (xn, yn)) from (X, Y), we solve the minimization problem
arg min_{β0,β1} Σ_{i=1}^n ρτ(yi − β0 − β1xi)
to obtain our lines for conditional quantile estimation. That is, we can estimate the τth quantile of Y for a given value of X by Ŷ = β0 + β1X.
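A sketch of this minimization on simulated data (the setup and helper names are ours; with N(0, 1) residuals the true 0.75-quantile line shifts the mean line up by z_0.75 ≈ 0.674):

```python
import numpy as np
from scipy.optimize import minimize

def rho(u, tau):
    # Koenker-Bassett check loss.
    return np.where(u < 0, -(1 - tau) * u, tau * u)

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 500)
y = 2.0 + 0.5 * x + rng.normal(size=500)   # N(0, 1) residuals

tau = 0.75
obj = lambda b: rho(y - b[0] - b[1] * x, tau).sum()

# Start from the least squares fit, then minimize the total check loss.
slope0, inter0 = np.polyfit(x, y, 1)
beta = minimize(obj, x0=[inter0, slope0], method="Powell").x

# Expect roughly intercept ~ 2 + 0.674 and slope ~ 0.5.
print(beta)
```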
8. MLE
We note that if we take a, b > 0, we can reparameterize the loss function as:
ρa,b(x) = −ax if x < 0, and bx if x ≥ 0
This gives a minimization equivalent to ρτ with τ = b/(a + b). We can also create a double exponential distribution by:
fa,b(x) = c · e^(−ρa,b(x))
where c = ab/(a + b), so that this function integrates to 1.
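A quick numerical check of both claims, using the a = 0.5, b = 1.5 values from the plots that follow: the density integrates to 1 and places mass τ = b/(a + b) = 0.75 to the left of its mode at 0.

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.5, 1.5                 # tau = b / (a + b) = 0.75
c = a * b / (a + b)

def f(x):
    # Double exponential density built from the loss rho_{a,b}.
    rho = -a * x if x < 0 else b * x
    return c * np.exp(-rho)

# Split the integral at the kink at 0.
mass_left, _ = quad(f, -np.inf, 0)
mass_right, _ = quad(f, 0, np.inf)

print(mass_left + mass_right)   # integrates to 1
print(mass_left)                # P(X < 0) = b / (a + b) = 0.75
```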
9. Plot of Koenker’s Constant Values for τ = .75
[Plot: the piecewise-constant derivative of ρτ, taking the value −a = −0.5 for x < 0 and b = 1.5 for x ≥ 0.]
10. Plot of Koenker’s ρ Function for τ = .75
[Plot of the check function ρ with slopes −a = −0.5 and b = 1.5.]
11. Plot of Corresponding Double Exponential Distribution
[Plot of the double exponential density fa,b with a = 0.5 and b = 1.5.]
12. MLE
Note that minimizing Koenker's loss function is equivalent to fitting a double exponential distribution to data using MLE. That is:
arg min_θ Σ_{i=1}^n ρa,b(xi − θ) = arg max_θ Π_{i=1}^n fa,b(xi − θ)
13. MLE
Classical quantile regression, like L1 error minimization, does not have an analytic solution for MLE, as the derivative of ρa,b is discontinuous
A version of the simplex method is used instead to solve this minimization efficiently
In an attempt to find an analytic solution, we create a continuous version of the derivative
Some possible choices are "S-curves", such as the cdf of a normal distribution or the cdf of a logistic distribution
14. S-Curve with Scaling Factor c = 1
[Plot of the S-curve for c = 1, passing through the origin with horizontal asymptotes at −a = −0.5 and b = 1.5.]
For this example, we use a modified version of the logistic distribution's cdf, shifted and scaled so that it passes through the origin and its horizontal asymptotes occur at −a and b.
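One way to build such a curve (this particular formula is our construction, not given in the slides) is to rescale the logistic cdf to the range (−a, b) and shift it to pass through the origin:

```python
import numpy as np

def s_curve(x, a=0.5, b=1.5, c=1.0):
    # Logistic cdf rescaled to asymptotes -a (x -> -inf) and b (x -> +inf),
    # shifted so s_curve(0) = 0; c controls the steepness at the origin.
    return (a + b) / (1.0 + (b / a) * np.exp(-c * x)) - a

print(s_curve(0.0))     # 0.0: passes through the origin
print(s_curve(50.0))    # approaches b = 1.5
print(s_curve(-50.0))   # approaches -a = -0.5
```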
15. S-Curves with Various Scaling Factors
[Plot of S-curves for several values of the scaling factor c.]
A tuning parameter, denoted by c, adjusts the slope through the
origin. The higher the value of c, the steeper the slope.
16. Smooth ρ Function with Scaling Factor c = 1
[Plot of the smooth ρ function with −a = −0.5, b = 1.5, and c = 1.]
17. Smooth ρ Functions with Various Scaling Factors
[Plot of smooth ρ functions for several values of the scaling factor c.]
21. Theoretic MLE Quantiles For c = 1 and N(0, 1) data
[Contour plot over log10(a) and log10(b), each ranging from −2 to 2, showing the quantile attained by the MLE, with contour levels from 0.001 to 0.999.]
22. MLE
Using a smooth loss function allows us to perform quantile estimation while admitting analytic solutions
Parametric - an explicit parametric assumption for the distribution must be made
No added robustness
23. L2E
L2 estimation, or L2E, was developed by Scott (2001) as a robust, parametric density estimator. To estimate a density g(x) from a sample (x1, x2, . . . , xn) by a family of distributions f(x; θ), we find the value of θ solving:
arg min_θ [ ∫ f(x; θ)² dx − (2/n) Σ_{i=1}^n f(xi; θ) ]
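A sketch of this criterion for a normal family, for which ∫ f(x; θ)² dx = 1/(2σ√π), fit to contaminated data; the data and names are ours, chosen to show the robustness:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
# 90% N(0, 1) plus 10% contamination centered at 8.
x = np.concatenate([rng.normal(size=900), rng.normal(8, 0.5, size=100)])

def l2e(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    # Integral of f^2 for a normal density is 1 / (2 sigma sqrt(pi)).
    return 1 / (2 * sigma * np.sqrt(np.pi)) - 2 * norm.pdf(x, mu, sigma).mean()

mu_hat, sigma_hat = minimize(l2e, x0=[np.median(x), 1.0],
                             method="Nelder-Mead").x
print(mu_hat, sigma_hat)   # close to 0 and 1 despite the contamination
```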
24. L2E Extension
We can apply this method to quantile regression by trying to fit a double exponential distribution to a sampling density g(x). For given values of a and b, we can find the theoretic value of θ for the given density g(x) by taking
arg min_θ [ ∫ f(x; θ)² dx − 2 ∫ f(x; θ)g(x) dx ]
which, because f(x; θ) = fa,b(x − θ) and hence ∫ f(x; θ)² dx does not depend on θ, reduces to:
arg min_θ −2 ∫ f(x; θ)g(x) dx
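On a sample, this amounts to maximizing the average density over the data (a sum, unlike MLE's product), which is what makes it robust. A sketch on our own simulated data, using a grid search for θ:

```python
import numpy as np

a, b = 0.5, 1.5
c = a * b / (a + b)

def f(x):
    # Double exponential density from the loss rho_{a,b}.
    return c * np.exp(-np.where(x < 0, -a * x, b * x))

rng = np.random.default_rng(4)
x_clean = rng.normal(size=900)
x_cont = np.concatenate([x_clean, np.full(100, 10.0)])  # 10% contamination

def l2e_theta(data):
    # Minimize -2 * mean f(x_i - theta) over a grid of theta values.
    ts = np.linspace(-3, 3, 1201)
    crit = np.array([-2 * f(data - t).mean() for t in ts])
    return ts[np.argmin(crit)]

print(l2e_theta(x_clean), l2e_theta(x_cont))  # contamination barely moves it
```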
25. L2E Extension
The theoretic quantile achieved for most distributions is not b/(a + b)
(Although this is true for Unif(0, 1))
An assumption about the distribution of the residuals must be made
We examine the assumption that the residuals are N(0, 1)
32. Regression Results
We can use the L2E loss function to perform quantile
regression
Robust
The following plots have 900 points of uncontaminated
multivariate normal data where the residuals around the least
squares regression line are distributed N(0, 1).
100 points of contamination are placed above the
uncontaminated points
L2E quantile regression and classical quantile regression are
then compared
44. Dealing with Unknown Sigma
Though we might be able to assume normal residuals about the mean regression line, assuming N(0, 1) is a stretch
To obtain a robust estimate of the standard deviation of the
residuals, we can use regular L2E regression
Scale the data so that the residuals are N(0, 1)
Perform method from before
Rescale slope and intercept parameters by the standard
deviation estimate to obtain parameters for original data
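The steps above can be sketched as follows (simulated data; the L2E regression objective assumes N(0, σ) residuals, using ∫ f² dx = 1/(2σ√π) for the normal density):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 500)
y = 1.0 + 2.0 * x + rng.normal(0, 3.0, 500)   # residual sd sigma = 3, unknown

def l2e(params):
    # L2E criterion for a regression line with N(0, sigma) residuals.
    b0, b1, sigma = params
    if sigma <= 0:
        return np.inf
    r = y - b0 - b1 * x
    return 1 / (2 * sigma * np.sqrt(np.pi)) - 2 * norm.pdf(r, 0, sigma).mean()

# Step 1: robust estimate of (intercept, slope, sigma) via L2E regression,
# started from the least squares fit.
slope0, inter0 = np.polyfit(x, y, 1)
r0 = y - inter0 - slope0 * x
b0, b1, sigma_hat = minimize(l2e, x0=[inter0, slope0, r0.std()],
                             method="Nelder-Mead").x

# Step 2: rescale y by sigma_hat so its residuals are ~ N(0, 1), run the
# quantile method from before on (x, y / sigma_hat), then multiply the fitted
# intercept and slope by sigma_hat to recover parameters on the original scale.
print(b0, b1, sigma_hat)
```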
47. Summary and Future Research
If parametric assumptions are valid, L2E quantile regression
provides a robust method to estimate the conditional
quantiles of data, ignoring the contamination in the data
Extends to higher dimensions
Smooth versions of the double exponential distribution may
lead us to analytic results
From here, we will examine non-linear regression,
semi-parametric methods, and M-estimation methods