7.3 Exclusion Restriction
While (7.8) is a general formula for the CLS estimator, in most cases the estimator can be found by applying least-squares to a reparameterized equation. To illustrate, let us return to the first example presented at the beginning of the chapter, a simple exclusion restriction. Recall that the unconstrained model is
$$y_i = x_{1i}'\beta_1 + x_{2i}'\beta_2 + e_i \qquad (7.9)$$
the exclusion restriction is $\beta_2 = 0$, and the constrained equation is
$$y_i = x_{1i}'\beta_1 + e_i. \qquad (7.10)$$
In this setting the CLS estimator is OLS of $y_i$ on $x_{1i}$. (See Exercise 7.1.) We can write this as
$$\tilde\beta_1 = \left(\sum_{i=1}^{n} x_{1i}x_{1i}'\right)^{-1}\left(\sum_{i=1}^{n} x_{1i}y_i\right).$$
The CLS estimator of the entire vector $\beta' = \left(\beta_1', \beta_2'\right)$ is
$$\tilde\beta = \begin{pmatrix} \tilde\beta_1 \\ 0 \end{pmatrix}. \qquad (7.11)$$
It is not immediately obvious, but (7.8) and (7.11) are algebraically (and numerically) equivalent. To see this, the first component of (7.8) with (7.2) is
$$\tilde\beta_1 = \begin{pmatrix} I & 0 \end{pmatrix}\left[\hat\beta - \hat{Q}_{xx}^{-1}\begin{pmatrix} 0 \\ I \end{pmatrix}\left(\begin{pmatrix} 0 & I \end{pmatrix}\hat{Q}_{xx}^{-1}\begin{pmatrix} 0 \\ I \end{pmatrix}\right)^{-1}\begin{pmatrix} 0 & I \end{pmatrix}\hat\beta\right].$$
Using (4.28) this equals
$$\begin{aligned}
\tilde\beta_1 &= \hat\beta_1 - \hat{Q}^{12}\left(\hat{Q}^{22}\right)^{-1}\hat\beta_2 \\
&= \hat\beta_1 + \hat{Q}_{11\cdot 2}^{-1}\hat{Q}_{12}\hat{Q}_{22}^{-1}\hat{Q}_{22\cdot 1}\hat\beta_2 \\
&= \hat{Q}_{11\cdot 2}^{-1}\left(\hat{Q}_{1y} - \hat{Q}_{12}\hat{Q}_{22}^{-1}\hat{Q}_{2y}\right)
 + \hat{Q}_{11\cdot 2}^{-1}\hat{Q}_{12}\hat{Q}_{22}^{-1}\hat{Q}_{22\cdot 1}\hat{Q}_{22\cdot 1}^{-1}\left(\hat{Q}_{2y} - \hat{Q}_{21}\hat{Q}_{11}^{-1}\hat{Q}_{1y}\right) \\
&= \hat{Q}_{11\cdot 2}^{-1}\left(\hat{Q}_{1y} - \hat{Q}_{12}\hat{Q}_{22}^{-1}\hat{Q}_{21}\hat{Q}_{11}^{-1}\hat{Q}_{1y}\right) \\
&= \hat{Q}_{11\cdot 2}^{-1}\hat{Q}_{11\cdot 2}\hat{Q}_{11}^{-1}\hat{Q}_{1y} \\
&= \hat{Q}_{11}^{-1}\hat{Q}_{1y}
\end{aligned}$$
which is (7.11) as originally claimed.
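This equivalence is also easy to confirm numerically. The following sketch is an illustration added here, not part of the text: it assumes the closed form (7.8) is $\tilde\beta = \hat\beta - \hat{Q}_{xx}^{-1}R\left(R'\hat{Q}_{xx}^{-1}R\right)^{-1}\left(R'\hat\beta - c\right)$, uses simulated data with arbitrary parameter values, and compares the result of the general formula (with the exclusion restriction imposed) against least-squares of $y_i$ on $x_{1i}$ alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 500, 3, 2
x1 = rng.normal(size=(n, k1))
x2 = rng.normal(size=(n, k2)) + 0.5 * x1[:, :1]   # x2 correlated with x1
beta1 = np.array([1.0, -2.0, 0.5])
y = x1 @ beta1 + rng.normal(size=n)               # beta2 = 0 holds in this design

x = np.hstack([x1, x2])
Qxx = x.T @ x / n
beta_hat = np.linalg.solve(Qxx, x.T @ y / n)      # unconstrained OLS

# Exclusion restriction R'beta = 0, with R selecting beta2 as in (7.2)
R = np.vstack([np.zeros((k1, k2)), np.eye(k2)])
c = np.zeros(k2)

# General CLS formula: beta_hat - Qxx^{-1} R (R' Qxx^{-1} R)^{-1} (R' beta_hat - c)
A = np.linalg.solve(Qxx, R)                       # Qxx^{-1} R
beta_cls = beta_hat - A @ np.linalg.solve(R.T @ A, R.T @ beta_hat - c)

# Short regression (7.11): OLS of y on x1 alone
beta_short = np.linalg.solve(x1.T @ x1, x1.T @ y)

print(np.allclose(beta_cls[:k1], beta_short))     # first block matches OLS on x1
print(np.allclose(beta_cls[k1:], 0.0))            # excluded block is numerically zero
```

Both comparisons agree up to floating-point error, reflecting the algebra above.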
7.4 Minimum Distance
The CLS estimator is a special case of a more general class of constrained estimators. To see this, rewrite the least-squares criterion as follows. Let $\hat\beta$ be the unconstrained least-squares estimator, and write the unconstrained least-squares fitted equation as $y_i = x_i'\hat\beta + \hat e_i$. Substitute
this equation into $SSE_n(\beta)$ to obtain
$$\begin{aligned}
SSE_n(\beta) &= \sum_{i=1}^{n}\left(y_i - x_i'\beta\right)^2 \\
&= \sum_{i=1}^{n}\left(x_i'\hat\beta + \hat e_i - x_i'\beta\right)^2 \\
&= \sum_{i=1}^{n}\hat e_i^2 + \left(\hat\beta - \beta\right)'\left(\sum_{i=1}^{n}x_ix_i'\right)\left(\hat\beta - \beta\right) \\
&= n\hat\sigma^2 + n\left(\hat\beta - \beta\right)'\hat{Q}_{xx}\left(\hat\beta - \beta\right) \qquad (7.12)
\end{aligned}$$
where the third equality uses the fact that $\sum_{i=1}^{n} x_i\hat e_i = 0$. Since the first term on the last line does not depend on $\beta$, it follows that the CLS estimator minimizes the quadratic on the right side of (7.12). This is a (squared) weighted Euclidean distance between $\hat\beta$ and $\beta$. It is a special case of the general weighted distance
$$J_n\left(\beta; W_n\right) = n\left(\hat\beta - \beta\right)' W_n^{-1}\left(\hat\beta - \beta\right)$$
for $W_n > 0$ a $k \times k$ positive definite weight matrix. In summary, we have found that the CLS estimator can be written as
$$\tilde\beta_{cls} = \underset{R'\beta = c}{\mathrm{argmin}}\; J_n\left(\beta; \hat{Q}_{xx}^{-1}\right).$$
More generally, a minimum distance estimator for $\beta$ is
$$\tilde\beta_{md}\left(W_n\right) = \underset{R'\beta = c}{\mathrm{argmin}}\; J_n\left(\beta; W_n\right) \qquad (7.13)$$
where $W_n > 0$. We have written the estimator as $\tilde\beta_{md}(W_n)$ as it depends upon the weight matrix $W_n$.
An obvious question is which weight matrix $W_n$ is appropriate. We will address this question after we derive the asymptotic distribution for a general weight matrix.
7.5 Computation
A general method to solve the algebraic problem (7.13) is by the method of Lagrange multipliers.
The Lagrangian is
$$\mathcal{L}(\beta, \lambda) = \frac{1}{2} J_n\left(\beta; W_n\right) + \lambda'\left(R'\beta - c\right)$$
which is minimized over $(\beta, \lambda)$. The solution is
$$\tilde\beta_{md}\left(W_n\right) = \hat\beta - W_n R\left(R'W_n R\right)^{-1}\left(R'\hat\beta - c\right). \qquad (7.14)$$
(See Exercise 7.5.)
If we set $W_n = \hat{Q}_{xx}^{-1}$ then (7.14) specializes to the CLS estimator:
$$\tilde\beta_{md}\left(\hat{Q}_{xx}^{-1}\right) = \tilde\beta_{cls}.$$
In this sense the minimum distance estimator generalizes constrained least-squares.
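Computationally, (7.14) is a single linear-algebra expression. The helper below is a minimal sketch, not from the text; the simulated data, dimensions, and variable names are illustrative assumptions. It evaluates (7.14) for an arbitrary positive definite weight matrix, and calling it with $W_n = \hat{Q}_{xx}^{-1}$ reproduces the CLS estimate.

```python
import numpy as np

def md_estimator(beta_hat, W, R, c):
    """Minimum distance estimator (7.14): beta_hat - W R (R'W R)^{-1} (R'beta_hat - c)."""
    WR = W @ R
    return beta_hat - WR @ np.linalg.solve(R.T @ WR, R.T @ beta_hat - c)

# Example usage (assumed simulated data) with an exclusion restriction beta2 = 0
rng = np.random.default_rng(1)
n, k1, k2 = 400, 2, 2
x = rng.normal(size=(n, k1 + k2))
y = x[:, :k1] @ np.array([1.0, -1.0]) + rng.normal(size=n)

Qxx = x.T @ x / n
beta_hat = np.linalg.solve(Qxx, x.T @ y / n)
R = np.vstack([np.zeros((k1, k2)), np.eye(k2)])
c = np.zeros(k2)

beta_cls = md_estimator(beta_hat, np.linalg.inv(Qxx), R, c)   # W_n = Qxx^{-1} yields CLS
print(beta_cls)
```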
7.6 Asymptotic Distribution
We first show that the class of minimum distance estimators is consistent for the population parameters when the constraints are valid.
Assumption 7.6.1 $R'\beta = c$ where $R$ is $k \times q$ with $\mathrm{rank}(R) = q$.
Theorem 7.6.1 Consistency
Under Assumption 1.5.1, Assumption 3.16.1, Assumption 7.6.1, and $W_n \overset{p}{\longrightarrow} W > 0$, $\tilde\beta_{md}(W_n) \overset{p}{\longrightarrow} \beta$ as $n \to \infty$.
Theorem 7.6.1 shows that consistency holds for any weight matrix, so the result includes the CLS estimator.
Similarly, the constrained estimators are asymptotically normally distributed.
Theorem 7.6.2 Asymptotic Normality
Under Assumption 1.5.1, Assumption 6.4.1, Assumption 7.6.1, and $W_n \overset{p}{\longrightarrow} W > 0$,
$$\sqrt{n}\left(\tilde\beta_{md}(W_n) - \beta\right) \overset{d}{\longrightarrow} \mathrm{N}\left(0, V_\beta(W)\right) \qquad (7.15)$$
as $n \to \infty$, where
$$\begin{aligned}
V_\beta(W) = V_\beta &- W R\left(R'W R\right)^{-1}R'V_\beta - V_\beta R\left(R'W R\right)^{-1}R'W \\
&+ W R\left(R'W R\right)^{-1}R'V_\beta R\left(R'W R\right)^{-1}R'W \qquad (7.16)
\end{aligned}$$
and
$$V_\beta = Q_{xx}^{-1}\Omega Q_{xx}^{-1}.$$
Theorem 7.6.2 shows that the minimum distance estimator is asymptotically normal for all positive definite weight matrices. The asymptotic variance depends on $W$. The theorem includes the CLS estimator as a special case by setting $W = Q_{xx}^{-1}$.
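Formula (7.16) is mechanical to evaluate once $V_\beta$, $W$, and $R$ are in hand. The sketch below is illustrative only (the matrices are arbitrary assumed values, not estimates from any data set); as a check, calling it with $W = V_\beta$ matches the efficient-variance formula given in Theorem 7.7.1 below.

```python
import numpy as np

def md_avar(V, W, R):
    """Asymptotic variance V_beta(W) of the minimum distance estimator, formula (7.16)."""
    B = W @ R @ np.linalg.inv(R.T @ W @ R) @ R.T   # B = W R (R'W R)^{-1} R'
    return V - B @ V - V @ B.T + B @ V @ B.T

# Illustrative (assumed) inputs: a 3x3 V_beta and one restriction on the last coefficient
V = np.array([[2.0, 0.5, 0.2],
              [0.5, 1.0, 0.1],
              [0.2, 0.1, 1.5]])
R = np.array([[0.0], [0.0], [1.0]])

print(md_avar(V, np.eye(3), R))                    # an arbitrary weight matrix W = I
V_star = V - V @ R @ np.linalg.inv(R.T @ V @ R) @ R.T @ V
print(np.allclose(md_avar(V, V, R), V_star))       # True: W = V_beta gives the efficient variance
```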
Theorem 7.6.3 Asymptotic Distribution of CLS Estimator
Under Assumption 1.5.1, Assumption 6.4.1, and Assumption 7.6.1, as $n \to \infty$
$$\sqrt{n}\left(\tilde\beta_{cls} - \beta\right) \overset{d}{\longrightarrow} \mathrm{N}\left(0, V_{cls}\right)$$
where
$$\begin{aligned}
V_{cls} = V_\beta &- Q_{xx}^{-1}R\left(R'Q_{xx}^{-1}R\right)^{-1}R'V_\beta - V_\beta R\left(R'Q_{xx}^{-1}R\right)^{-1}R'Q_{xx}^{-1} \\
&+ Q_{xx}^{-1}R\left(R'Q_{xx}^{-1}R\right)^{-1}R'V_\beta R\left(R'Q_{xx}^{-1}R\right)^{-1}R'Q_{xx}^{-1}.
\end{aligned}$$
7.7 Efficient Minimum Distance Estimator
Theorem 7.6.2 shows that the minimum distance estimators, which include CLS as a special case, are asymptotically normal with an asymptotic covariance matrix which depends on the weight matrix $W$. The asymptotically optimal weight matrix is the one which minimizes the asymptotic variance $V_\beta(W)$. This turns out to be $W = V_\beta$, as shown in Theorem 7.7.1 below. Since $V_\beta$ is unknown this weight matrix cannot be used for a feasible estimator, but we can replace $V_\beta$ with
a consistent estimate $\hat V_\beta$ and the asymptotic distribution (and efficiency) are unchanged. We call the minimum distance estimator with $W_n = \hat V_\beta$ the efficient minimum distance estimator. It takes the form
$$\tilde\beta_{emd} = \hat\beta - \hat V_\beta R\left(R'\hat V_\beta R\right)^{-1}\left(R'\hat\beta - c\right). \qquad (7.17)$$
This estimator has the smallest asymptotic variance in the class of minimum distance estimators. The asymptotic distribution of (7.17) can be deduced from Theorem 7.6.2.
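For concreteness, here is a minimal sketch of the feasible estimator (7.17) for an exclusion restriction. It is not from the text; the data-generating design, the sample size, and the use of the standard heteroskedasticity-robust (White) sandwich estimate of $V_\beta$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k1, k2 = 600, 2, 2
x = rng.normal(size=(n, k1 + k2))
e = rng.normal(size=n) * (0.5 + np.abs(x[:, 0]))   # heteroskedastic errors
y = x[:, :k1] @ np.array([1.0, 2.0]) + e           # beta2 = 0 in this design

Qxx = x.T @ x / n
beta_hat = np.linalg.solve(Qxx, x.T @ y / n)
ehat = y - x @ beta_hat

# Heteroskedasticity-robust estimate of V_beta = Qxx^{-1} Omega Qxx^{-1}
Omega_hat = (x * ehat[:, None] ** 2).T @ x / n
Qxx_inv = np.linalg.inv(Qxx)
V_hat = Qxx_inv @ Omega_hat @ Qxx_inv

# Efficient minimum distance estimator (7.17)
R = np.vstack([np.zeros((k1, k2)), np.eye(k2)])
c = np.zeros(k2)
VR = V_hat @ R
beta_emd = beta_hat - VR @ np.linalg.solve(R.T @ VR, R.T @ beta_hat - c)
print(beta_emd)
```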
Theorem 7.7.1 Efficient Minimum Distance Estimator
Under Assumption 1.5.1, Assumption 6.4.1, and Assumption 7.6.1, for $\tilde\beta_{emd}$ defined in (7.17),
$$\sqrt{n}\left(\tilde\beta_{emd} - \beta\right) \overset{d}{\longrightarrow} \mathrm{N}\left(0, V_\beta^{*}\right)$$
as $n \to \infty$, where
$$V_\beta^{*} = V_\beta - V_\beta R\left(R'V_\beta R\right)^{-1}R'V_\beta. \qquad (7.18)$$
Since
$$V_\beta^{*} \leq V_\beta \qquad (7.19)$$
the estimator (7.17) has lower asymptotic variance than the unrestricted estimator. Furthermore, for any $W$,
$$V_\beta^{*} \leq V_\beta(W) \qquad (7.20)$$
so (7.17) is asymptotically efficient in the class of minimum distance estimators.
Theorem 7.7.1 shows that the minimum distance estimator with the smallest asymptotic variance is (7.17). One implication is that the constrained least-squares estimator is generally inefficient. The interesting exception is the case of conditional homoskedasticity, in which case the optimal weight matrix is $W = V_\beta = \sigma^2 Q_{xx}^{-1}$, which is proportional to the CLS weight matrix $Q_{xx}^{-1}$, so in this case CLS is an efficient minimum distance estimator. Otherwise, when the error is conditionally heteroskedastic, there are asymptotic efficiency gains from using minimum distance rather than least squares.
The fact that CLS is generally inefficient is counter-intuitive and requires some reflection to understand. Standard intuition suggests applying the same estimation method (least squares) to the unconstrained and constrained models, and this is the most common empirical practice. But our statistical analysis has shown that this is not the efficient estimation method. Instead, the efficient minimum distance estimator has a smaller asymptotic variance. Why? The reason is that the least-squares estimator does not make use of the regressor $x_{2i}$. It ignores the information $E(x_{2i}e_i) = 0$. This information is relevant when the error is heteroskedastic and the excluded regressors are correlated with the included regressors.
Inequality (7.19) shows that the efficient minimum distance estimator $\tilde\beta_{emd}$ has a smaller asymptotic variance than the unrestricted least-squares estimator $\hat\beta$. This means that estimation is more efficient by imposing correct restrictions when we use the minimum distance method.
7.8 Exclusion Restriction Revisited
We return to the example of estimation with a simple exclusion restriction. The model is
$$y_i = x_{1i}'\beta_1 + x_{2i}'\beta_2 + e_i$$
with the exclusion restriction $\beta_2 = 0$. We have introduced three estimators of $\beta_1$. The first is unconstrained least-squares applied to (7.9), which can be written as
$$\hat\beta_1 = \hat{Q}_{11\cdot 2}^{-1}\hat{Q}_{1y\cdot 2}.$$
From Theorem 6.28 and equation (6.19) its asymptotic variance is
$$\mathrm{avar}(\hat\beta_1) = Q_{11\cdot 2}^{-1}\left(\Omega_{11} - Q_{12}Q_{22}^{-1}\Omega_{21} - \Omega_{12}Q_{22}^{-1}Q_{21} + Q_{12}Q_{22}^{-1}\Omega_{22}Q_{22}^{-1}Q_{21}\right)Q_{11\cdot 2}^{-1}.$$
The second estimator of $\beta_1$ is the CLS estimator, which can be written as
$$\tilde\beta_{1,cls} = \hat{Q}_{11}^{-1}\hat{Q}_{1y}.$$
Its asymptotic variance can be deduced from Theorem 7.6.3, but it is simpler to apply the CLT directly to show that
$$\mathrm{avar}(\tilde\beta_{1,cls}) = Q_{11}^{-1}\Omega_{11}Q_{11}^{-1}. \qquad (7.21)$$
The third estimator of $\beta_1$ is the efficient minimum distance estimator. Applying (7.17), it equals
$$\tilde\beta_{1,md} = \hat\beta_1 - \hat V_{12}\hat V_{22}^{-1}\hat\beta_2 \qquad (7.22)$$
where we have partitioned
$$\hat V_\beta = \begin{bmatrix} \hat V_{11} & \hat V_{12} \\ \hat V_{21} & \hat V_{22} \end{bmatrix}.$$
From Theorem 7.7.1 its asymptotic variance is
$$\mathrm{avar}(\tilde\beta_{1,md}) = V_{11} - V_{12}V_{22}^{-1}V_{21}. \qquad (7.23)$$
In general, the three estimators are different, and they have different asymptotic variances.
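These variance formulas can be compared on any data set by plugging in sample moment matrices. The sketch below is illustrative only: the simulated heteroskedastic design, the parameter values, and the use of residuals from the unconstrained fit to form $\hat\Omega$ are all assumptions made for the demonstration. It estimates $\mathrm{avar}(\hat\beta_1)$ as the upper-left block of $\hat V_\beta$, $\mathrm{avar}(\tilde\beta_{1,cls})$ from (7.21), and $\mathrm{avar}(\tilde\beta_{1,md})$ from (7.23).

```python
import numpy as np

rng = np.random.default_rng(3)
n, k1, k2 = 2000, 1, 1
x1 = rng.normal(size=(n, k1))
x2 = 0.7 * x1 + rng.normal(size=(n, k2))           # x2 correlated with x1
x = np.hstack([x1, x2])
e = rng.normal(size=n) * (0.2 + x2[:, 0] ** 2)     # heteroskedastic errors
y = x1[:, 0] * 1.0 + e                             # beta2 = 0 in this design

Qxx = x.T @ x / n
beta_hat = np.linalg.solve(Qxx, x.T @ y / n)
ehat = y - x @ beta_hat
Omega = (x * ehat[:, None] ** 2).T @ x / n
Qinv = np.linalg.inv(Qxx)
V = Qinv @ Omega @ Qinv                            # estimate of V_beta

V11, V12, V21, V22 = V[:k1, :k1], V[:k1, k1:], V[k1:, :k1], V[k1:, k1:]
Q11, O11 = Qxx[:k1, :k1], Omega[:k1, :k1]

avar_ols1 = V11                                            # unconstrained OLS: upper-left block of V_beta
avar_cls1 = np.linalg.inv(Q11) @ O11 @ np.linalg.inv(Q11)  # formula (7.21)
avar_emd1 = V11 - V12 @ np.linalg.solve(V22, V21)          # formula (7.23)

print(avar_ols1, avar_cls1, avar_emd1)
```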
It is quite instructive to compare the asymptotic variances of the CLS and unconstrained least-squares estimators to assess whether or not the constrained estimator is necessarily more efficient than the unconstrained estimator.
First, consider the case of conditional homoskedasticity. In this case the two covariance matrices simplify to
$$\mathrm{avar}(\hat\beta_1) = \sigma^2 Q_{11\cdot 2}^{-1}$$
and
$$\mathrm{avar}(\tilde\beta_{1,cls}) = \sigma^2 Q_{11}^{-1}.$$
If $Q_{12} = 0$ (so $x_{1i}$ and $x_{2i}$ are orthogonal) then the two variance matrices are equal and the two estimators have equal asymptotic efficiency. Otherwise, since $Q_{12}Q_{22}^{-1}Q_{21} \geq 0$, then $Q_{11} \geq Q_{11} - Q_{12}Q_{22}^{-1}Q_{21}$, and consequently
$$Q_{11}^{-1}\sigma^2 \leq \left(Q_{11} - Q_{12}Q_{22}^{-1}Q_{21}\right)^{-1}\sigma^2.$$
This means that under conditional homoskedasticity, $\tilde\beta_{1,cls}$ has a lower asymptotic variance matrix than $\hat\beta_1$. Therefore in this context, constrained least-squares is more efficient than unconstrained least-squares. This is consistent with our intuition that imposing a correct restriction (excluding an irrelevant regressor) improves estimation efficiency.
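As a quick numeric illustration of this ordering (the partitioned matrix below is an arbitrary assumed example, used purely for demonstration), the difference $Q_{11\cdot 2}^{-1} - Q_{11}^{-1}$ is positive semi-definite, and strictly positive here since $Q_{12} \neq 0$.

```python
import numpy as np

# Assumed example second-moment matrix Q_xx, partitioned with k1 = k2 = 1
Qxx = np.array([[1.0, 0.6],
                [0.6, 1.0]])
Q11, Q12, Q21, Q22 = Qxx[:1, :1], Qxx[:1, 1:], Qxx[1:, :1], Qxx[1:, 1:]

Q11_dot2 = Q11 - Q12 @ np.linalg.solve(Q22, Q21)           # Q_{11.2} = Q11 - Q12 Q22^{-1} Q21
diff = np.linalg.inv(Q11_dot2) - np.linalg.inv(Q11)
print(np.all(np.linalg.eigvalsh(diff) >= -1e-12))          # True: Q_{11.2}^{-1} - Q_{11}^{-1} is PSD
```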