CHAPTER 7. RESTRICTED ESTIMATION
However, in the general case of conditional heteroskedasticity this ranking is not guaranteed; in fact, what is really amazing is that the variance ranking can be reversed. The CLS estimator can have a larger asymptotic variance than the unconstrained least squares estimator.
To see this let's use the simple heteroskedastic example from Section 6.5. In that example, $Q_{11} = Q_{22} = 1$, $Q_{12} = \frac{1}{2}$, $\Omega_{11} = \Omega_{22} = 1$, and $\Omega_{12} = \frac{7}{8}$. We can calculate that $Q_{11 \cdot 2} = \frac{3}{4}$ and
\[
\mathrm{avar}(\hat\beta_1) = \frac{2}{3} \tag{7.24}
\]
\[
\mathrm{avar}(\tilde\beta_{1,\mathrm{cls}}) = 1 \tag{7.25}
\]
\[
\mathrm{avar}(\tilde\beta_{1,\mathrm{md}}) = \frac{5}{8}. \tag{7.26}
\]
Thus the restricted least-squares estimator $\tilde\beta_1$ has a larger variance than the unrestricted least-squares estimator $\hat\beta_1$! The minimum distance estimator has the smallest variance of the three, as expected.

What we have found is that when the estimation method is least-squares, deleting the irrelevant variable $x_{2i}$ can actually decrease the precision of estimation of $\beta_1$; or equivalently, adding the irrelevant variable $x_{2i}$ can actually improve the precision of the estimation.
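As a numerical check on (7.24)-(7.26), the variance formulas can be evaluated directly. The sketch below is ours (variable names are our own, not from the text); it plugs the example's $Q$ and $\Omega$ into the unconstrained, CLS, and efficient minimum distance variance formulas:

```python
import numpy as np

# Moments of the example: Q = E[x x'], Omega = E[x x' e^2]
Q = np.array([[1.0, 0.5],
              [0.5, 1.0]])
Omega = np.array([[1.0, 7/8],
                  [7/8, 1.0]])
Qinv = np.linalg.inv(Q)

# Unconstrained least-squares: V = Q^{-1} Omega Q^{-1}
V = Qinv @ Omega @ Qinv
avar_ols = V[0, 0]

# CLS imposing beta_2 = 0 is OLS of y on x_1 alone
avar_cls = Omega[0, 0] / Q[0, 0] ** 2

# Efficient minimum distance: V - V R (R'V R)^{-1} R'V with R = (0, 1)'
R = np.array([[0.0], [1.0]])
V_md = V - V @ R @ np.linalg.inv(R.T @ V @ R) @ R.T @ V
avar_md = V_md[0, 0]

print(avar_ols, avar_cls, avar_md)   # 2/3, 1, 5/8
```

The three printed values reproduce the variance ranking in the text: the CLS variance (1) exceeds the unconstrained variance (2/3), and the minimum distance variance (5/8) is smallest.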
To repeat this unexpected finding: we have shown in a very simple example that it is possible for least-squares applied to the short regression (7.10) to be less efficient for estimation of $\beta_1$ than least-squares applied to the long regression (7.9), even though the constraint $\beta_2 = 0$ is valid! This result is strongly counter-intuitive. It seems to contradict our initial motivation for pursuing constrained estimation, namely to improve estimation efficiency.
It turns out that a more refined answer is appropriate. Constrained estimation is desirable, but not constrained least-squares estimation. While least-squares is asymptotically efficient for estimation of the unconstrained projection model, it is not an efficient estimator of the constrained projection model.
7.9 Variance and Standard Error Estimation
The asymptotic covariance matrix (7.18) may be estimated by replacing $V_\beta$ with a consistent estimate such as $\hat V_\beta$. This variance estimator is then
\[
\hat V_{\tilde\beta} = \hat V_\beta - \hat V_\beta R \left(R' \hat V_\beta R\right)^{-1} R' \hat V_\beta. \tag{7.27}
\]
We can calculate standard errors for any linear combination $h'\tilde\beta$ so long as $h$ does not lie in the range space of $R$. A standard error for $h'\tilde\beta$ is
\[
s(h'\tilde\beta) = \left(n^{-1} h' \hat V_{\tilde\beta} h\right)^{1/2}.
\]
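The estimator (7.27) and this standard error formula translate directly into code. The following sketch is ours, with hypothetical inputs chosen only for illustration:

```python
import numpy as np

def restricted_variance(V_hat, R):
    """Estimated covariance matrix (7.27): V - V R (R'V R)^{-1} R'V."""
    return V_hat - V_hat @ R @ np.linalg.inv(R.T @ V_hat @ R) @ R.T @ V_hat

def se_linear_combination(h, V_tilde, n):
    """Standard error of h'beta_tilde: (n^{-1} h' V_tilde h)^{1/2}."""
    return np.sqrt(h @ V_tilde @ h / n)

# Hypothetical inputs (not from the text)
V_hat = np.array([[2/3, 1/6],
                  [1/6, 2/3]])
R = np.array([[0.0], [1.0]])     # linear constraint beta_2 = 0
V_tilde = restricted_variance(V_hat, R)

h = np.array([1.0, 0.0])         # h must not lie in the range space of R
print(se_linear_combination(h, V_tilde, n=100))
```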
7.10 Nonlinear Constraints
In some cases it is desirable to impose nonlinear constraints on the parameter vector $\beta$. They can be written as
\[
r(\beta) = 0 \tag{7.28}
\]
where $r : \mathbb{R}^k \to \mathbb{R}^q$. This includes the linear constraints (7.1) as a special case. An example of (7.28) which cannot be written as (7.1) is $\beta_1 \beta_2 = 1$, or $r(\beta) = \beta_1 \beta_2 - 1$.
The minimum distance estimator of $\beta$ subject to (7.28) solves the minimization problem
\[
\tilde\beta = \operatorname*{argmin}_{r(\beta) = 0} J_n(\beta) \tag{7.29}
\]
where
\[
J_n(\beta) = n \left(\hat\beta - \beta\right)' \hat V_\beta^{-1} \left(\hat\beta - \beta\right).
\]
The solution minimizes the Lagrangian
\[
\mathcal{L}(\beta, \lambda) = \frac{1}{2} J_n(\beta) + \lambda' r(\beta) \tag{7.30}
\]
over $(\beta, \lambda)$.
Computationally, there is no explicit expression for the solution, so it must be found numerically. Computational methods are based on the method of quadratic programming and are not reviewed here.
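As an illustration of such a numerical solution, the sketch below is ours: it assumes hypothetical values for $\hat\beta$ and $\hat V_\beta$, and uses SciPy's SLSQP routine (one sequential quadratic programming implementation) to minimize $J_n(\beta)$ subject to the example constraint $\beta_1 \beta_2 = 1$ from (7.28):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical unrestricted estimate and covariance estimate (illustration only)
beta_hat = np.array([1.3, 0.9])
V_hat = np.array([[0.5, 0.1],
                  [0.1, 0.4]])
V_inv = np.linalg.inv(V_hat)
n = 200

def J_n(beta):
    """Minimum distance criterion: n (beta_hat - beta)' V^{-1} (beta_hat - beta)."""
    d = beta_hat - beta
    return n * d @ V_inv @ d

# Nonlinear constraint from the example in (7.28): r(beta) = beta_1 beta_2 - 1 = 0
constraint = {"type": "eq", "fun": lambda b: b[0] * b[1] - 1.0}

res = minimize(J_n, x0=beta_hat, method="SLSQP", constraints=[constraint])
beta_tilde = res.x
print(beta_tilde, beta_tilde[0] * beta_tilde[1])   # the product is ~1 at the solution
```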
Assumption 7.10.1 $r(\beta) = 0$ with $\operatorname{rank}(R) = q$, where $R = \dfrac{\partial}{\partial \beta} r(\beta)'$.
The asymptotic distribution is a simple generalization of the case of a linear constraint, but the proof is more delicate.
Theorem 7.10.1 Under Assumption 1.5.1, Assumption 6.4.1, and Assumption 7.10.1, for $\tilde\beta$ defined in (7.29),
\[
\sqrt{n} \left(\tilde\beta - \beta\right) \xrightarrow{d} N\left(0, V_{\tilde\beta}\right)
\]
as $n \to \infty$, where
\[
V_{\tilde\beta} = V_\beta - V_\beta R \left(R' V_\beta R\right)^{-1} R' V_\beta.
\]
The asymptotic variance matrix can be estimated by
\[
\hat V_{\tilde\beta} = \hat V - \hat V \tilde R \left(\tilde R' \hat V \tilde R\right)^{-1} \tilde R' \hat V
\]
where
\[
\tilde R = \frac{\partial}{\partial \beta} r(\tilde\beta)'.
\]
Standard errors for the elements of $\tilde\beta$ are the square roots of the diagonal elements of $n^{-1} \hat V_{\tilde\beta}$.
7.11 Technical Proofs*
Proof of Theorem 7.7.1, Equation (7.20). Let $R_\perp$ be a full rank $k \times (k - q)$ matrix satisfying $R_\perp' V_\beta R = 0$, and then set $C = [R, R_\perp]$, which is full rank and invertible. Then we can calculate that
\[
C' V_{\tilde\beta} C =
\begin{pmatrix}
R' V_{\tilde\beta} R & R' V_{\tilde\beta} R_\perp \\
R_\perp' V_{\tilde\beta} R & R_\perp' V_{\tilde\beta} R_\perp
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 \\
0 & R_\perp' V_\beta R_\perp
\end{pmatrix}
\]
and
\[
C' V_\beta(W) C =
\begin{pmatrix}
R' V_\beta(W) R & R' V_\beta(W) R_\perp \\
R_\perp' V_\beta(W) R & R_\perp' V_\beta(W) R_\perp
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 \\
0 & R_\perp' V_\beta R_\perp + R_\perp' W R \left(R' W R\right)^{-1} R' V_\beta R \left(R' W R\right)^{-1} R' W R_\perp
\end{pmatrix}.
\]
Thus
\[
C' \left(V_\beta(W) - V_{\tilde\beta}\right) C
= C' V_\beta(W) C - C' V_{\tilde\beta} C
=
\begin{pmatrix}
0 & 0 \\
0 & R_\perp' W R \left(R' W R\right)^{-1} R' V_\beta R \left(R' W R\right)^{-1} R' W R_\perp
\end{pmatrix}
\geq 0.
\]
Since $C$ is invertible it follows that $V_\beta(W) - V_{\tilde\beta} \geq 0$, which is (7.20). $\blacksquare$
Proof of Theorem 7.10.1. For simplicity, we assume that the constrained estimator is consistent, $\tilde\beta \xrightarrow{p} \beta$. This can be shown with more effort, but requires a deeper treatment than appropriate for this textbook.
For each element $r_j(\beta)$ of the $q$-vector $r(\beta)$, by the mean value theorem there exists a $\beta_j^*$ on the line segment joining $\tilde\beta$ and $\beta$ such that
\[
r_j(\tilde\beta) = r_j(\beta) + \frac{\partial}{\partial \beta} r_j(\beta_j^*)' \left(\tilde\beta - \beta\right). \tag{7.31}
\]
Let $R_n^*$ be the $k \times q$ matrix
\[
R_n^* = \begin{pmatrix}
\dfrac{\partial}{\partial \beta} r_1(\beta_1^*) & \dfrac{\partial}{\partial \beta} r_2(\beta_2^*) & \cdots & \dfrac{\partial}{\partial \beta} r_q(\beta_q^*)
\end{pmatrix}.
\]
Since $\tilde\beta \xrightarrow{p} \beta$ it follows that $\beta_j^* \xrightarrow{p} \beta$ and, by the CMT, $R_n^* \xrightarrow{p} R$. Stacking the equations (7.31), we obtain
\[
r(\tilde\beta) = r(\beta) + R_n^{*\prime} \left(\tilde\beta - \beta\right).
\]
Since $r(\tilde\beta) = 0$ by construction and $r(\beta) = 0$ by Assumption 7.10.1, this implies
\[
0 = R_n^{*\prime} \left(\tilde\beta - \beta\right). \tag{7.32}
\]
The first-order condition for (7.30) is
\[
\hat V^{-1} \left(\hat\beta - \tilde\beta\right) = \tilde R \tilde\lambda.
\]
Premultiplying by $R_n^{*\prime} \hat V$, inverting, and using (7.32), we find
\[
\tilde\lambda = \left(R_n^{*\prime} \hat V \tilde R\right)^{-1} R_n^{*\prime} \left(\hat\beta - \tilde\beta\right)
= \left(R_n^{*\prime} \hat V \tilde R\right)^{-1} R_n^{*\prime} \left(\hat\beta - \beta\right).
\]
Thus
\[
\tilde\beta - \beta = \left(I - \hat V \tilde R \left(R_n^{*\prime} \hat V \tilde R\right)^{-1} R_n^{*\prime}\right) \left(\hat\beta - \beta\right).
\]
From Theorem 6.4.2 and Theorem 6.8.2 we find
\[
\sqrt{n} \left(\tilde\beta - \beta\right)
= \left(I - \hat V \tilde R \left(R_n^{*\prime} \hat V \tilde R\right)^{-1} R_n^{*\prime}\right) \sqrt{n} \left(\hat\beta - \beta\right)
\xrightarrow{d} \left(I - V_\beta R \left(R' V_\beta R\right)^{-1} R'\right) N\left(0, V_\beta\right)
= N\left(0, V_{\tilde\beta}\right). \qquad \blacksquare
\]
Exercises
Exercise 7.1 In the model $y = X_1\beta_1 + X_2\beta_2 + e$, show directly from definition (7.3) that the CLS estimate of $\beta = (\beta_1, \beta_2)$ subject to the constraint that $\beta_2 = 0$ is the OLS regression of $y$ on $X_1$.
Exercise 7.2 In the model $y = X_1\beta_1 + X_2\beta_2 + e$, show directly from definition (7.3) that the CLS estimate of $\beta = (\beta_1, \beta_2)$, subject to the constraint that $\beta_1 = c$ (where $c$ is some given vector), is the OLS regression of $y - X_1 c$ on $X_2$.
Exercise 7.3 In the model $y = X_1\beta_1 + X_2\beta_2 + e$, with $X_1$ and $X_2$ each $n \times k$, find the CLS estimate of $\beta = (\beta_1, \beta_2)$ subject to the constraint that $\beta_1 = -\beta_2$.
Exercise 7.4 Verify that for $\tilde\beta$ defined in (7.8), $R'\tilde\beta = c$.

Exercise 7.5 Verify (7.14).
Exercise 7.6 Verify that the minimum distance estimator with $W_n = \hat Q_{xx}$ equals the CLS estimator.

Exercise 7.7 Prove Theorem 7.6.1.

Exercise 7.8 Prove Theorem 7.6.2.
Exercise 7.9 Prove Theorem 7.6.3. (Hint: Use that CLS is a special case of Theorem 7.6.2.)
Exercise 7.10 Verify that (7.18) is $V_\beta(W)$ with $W = V_\beta^{-1}$.
Exercise 7.11 Prove (7.19). Hint: Use (7.18).
Exercise 7.12 Verify (7.21), (7.22), and (7.23).
Exercise 7.13 Verify (7.24), (7.25), and (7.26).
CHAPTER 8. TESTING
An equivalent statement of a Neyman-Pearson test is to reject at the $\alpha\%$ level if and only if $p_n < \alpha$. Significance tests can be deduced directly from the p-value since for any $\alpha$, $p_n < \alpha$ if and only if $|t_n| > z_{\alpha/2}$. The p-value is more general, however, in that the reader is allowed to pick the level of significance $\alpha$, in contrast to Neyman-Pearson rejection/acceptance reporting, where the researcher picks the significance level. (However, the Neyman-Pearson approach requires the reader to select the significance level before observing the p-value.)
Another helpful observation is that the p-value function is a unit-free transformation of the t statistic. That is, under $H_0$, $p_n \xrightarrow{d} U[0,1]$, so the "unusualness" of the test statistic can be compared to the easy-to-understand uniform distribution, regardless of the complication of the distribution of the original test statistic. To see this fact, note that the asymptotic distribution of $|t_n|$ is $F(x) = 1 - p(x)$. Thus
\[
\Pr\left(1 - p_n \leq u\right)
= \Pr\left(1 - p(t_n) \leq u\right)
= \Pr\left(F(t_n) \leq u\right)
= \Pr\left(|t_n| \leq F^{-1}(u)\right)
\to F\left(F^{-1}(u)\right) = u,
\]
establishing that $1 - p_n \xrightarrow{d} U[0,1]$, from which it follows that $p_n \xrightarrow{d} U[0,1]$.
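This uniformity is easy to see in simulation. The sketch below is ours: it computes two-sided asymptotic p-values for a mean-zero test on data where the null is true, and checks that the p-values spread roughly evenly over $[0, 1]$:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

rng = np.random.default_rng(0)
n, reps = 100, 2000
pvals = np.empty(reps)

for r in range(reps):
    y = rng.normal(size=n)                     # H0 (zero mean) is true
    t = y.mean() / (y.std(ddof=1) / sqrt(n))   # t-statistic for H0: mean = 0
    pvals[r] = 2 * (1 - norm_cdf(abs(t)))      # two-sided asymptotic p-value

# Under H0 the p-values are approximately U[0,1]:
# each decile of [0,1] should contain roughly 10% of them.
hist, _ = np.histogram(pvals, bins=10, range=(0.0, 1.0))
print(hist / reps)
```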
8.2 t-ratios
Some applied papers (especially older ones) report "t-ratios" for each estimated coefficient. For a coefficient $\beta$ these are
\[
t_n = t_n(0) = \frac{\hat\beta}{s(\hat\beta)},
\]
the ratio of the coefficient estimate to its standard error, and equal the t-statistic for the test of the hypothesis $H_0 : \beta = 0$. Such papers often discuss the "significance" of certain variables or coefficients, or describe "which regressors have a significant effect on $y$" by noting which t-ratios exceed 2 in absolute value.
This is very poor econometric practice, and should be studiously avoided. It is a recipe for banishment of your work to lower-tier economics journals.
Fundamentally, the common t-ratio is a test of the hypothesis that a coefficient equals zero. This should be reported and discussed when it is an interesting economic hypothesis. But if this is not the case, it is distracting.
Instead, when a coefficient $\beta$ is of interest, it is constructive to focus on the point estimate, its standard error, and its confidence interval. The point estimate gives our "best guess" for the value. The standard error is a measure of precision. The confidence interval gives us the range of values consistent with the data. If the standard error is large then the point estimate is not a good summary about $\beta$. The endpoints of the confidence interval describe the bounds on the likely possibilities. If the confidence interval embraces too broad a set of values for $\beta$, then the dataset is not sufficiently informative to render inferences about $\beta$. On the other hand, if the confidence interval is tight, then the data have produced an accurate estimate, and the focus should be on the value and interpretation of this estimate. In contrast, the widely seen statement "the t-ratio is highly significant" has little interpretive value.
The above discussion requires that the researcher knows what the coefficient $\beta$ means (in terms of the economic problem) and can interpret values and magnitudes, not just signs. This is critical for good applied econometric practice.
8.3 Wald Tests
Sometimes $\theta = h(\beta)$ is a $q \times 1$ vector, and it is desired to test the joint restrictions simultaneously. We have the null and alternative
\[
H_0 : \theta = \theta_0 \qquad H_1 : \theta \neq \theta_0.
\]
A commonly used test of H0 against H1 is the Wald statistic (6.34) evaluated at the null hypothesis
\[
W_n = n \left(\hat\theta - \theta_0\right)' \hat V_\theta^{-1} \left(\hat\theta - \theta_0\right). \tag{8.1}
\]
Typically, we have $\hat\theta = h(\hat\beta)$ with asymptotic covariance matrix estimate
\[
\hat V_\theta = \hat H_\beta' \hat V_\beta \hat H_\beta
\]
where
\[
\hat H_\beta = \frac{\partial}{\partial \beta} h(\hat\beta)'.
\]
Then
\[
W_n = n \left(h(\hat\beta) - \theta_0\right)' \left(\hat H_\beta' \hat V_\beta \hat H_\beta\right)^{-1} \left(h(\hat\beta) - \theta_0\right).
\]
When $h$ is a linear function of $\beta$, $h(\beta) = R'\beta$, then the Wald statistic simplifies to
\[
W_n = n \left(R'\hat\beta - \theta_0\right)' \left(R' \hat V_\beta R\right)^{-1} \left(R'\hat\beta - \theta_0\right).
\]
As shown in Theorem 6.14.2, when $\theta = \theta_0$ then $W_n \xrightarrow{d} \chi^2_q$, a chi-square random variable with $q$ degrees of freedom.
Theorem 8.3.1 Under Assumption 1.5.1, Assumption 6.4.1, $\operatorname{rank}(H_\beta) = q$, and $H_0$, then $W_n \xrightarrow{d} \chi^2_q$.
An asymptotic Wald test rejects $H_0$ in favor of $H_1$ if $W_n$ exceeds $\chi^2_q(\alpha)$, the upper-$\alpha$ quantile of the $\chi^2_q$ distribution. For example, $\chi^2_1(.05) = 3.84 = z_{.025}^2$. The Wald test fails to reject if $W_n$ is less than $\chi^2_q(\alpha)$. As with t-tests, it is conventional to describe a Wald test as "significant" if $W_n$ exceeds the 5% critical value.
Notice that the asymptotic distribution in Theorem 8.3.1 depends solely on $q$, the number of restrictions being tested. It does not depend on $k$, the number of parameters estimated.
The asymptotic p-value for $W_n$ is $p_n = p(W_n)$, where $p(x) = \Pr\left(\chi^2_q \geq x\right)$ is the tail probability function of the $\chi^2_q$ distribution. The Wald test rejects at the $\alpha\%$ level if and only if $p_n < \alpha$, and $p_n$ is asymptotically $U[0,1]$ under $H_0$. In applied work it is good practice to report the p-value of a Wald statistic, as it helps readers interpret the magnitude of the statistic.
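For a linear restriction the statistic is a simple quadratic form. The sketch below is ours, with hypothetical estimates chosen for illustration; it computes $W_n$ for a joint test of two zero restrictions and its asymptotic p-value, using the closed form $\Pr(\chi^2_2 \geq x) = e^{-x/2}$, which holds only for $q = 2$:

```python
import numpy as np

def wald_test(beta_hat, V_hat, R, theta0, n):
    """Wald statistic (8.1) for H0: R'beta = theta0, with V_hat estimating V_beta."""
    diff = R.T @ beta_hat - theta0
    return float(n * diff @ np.linalg.inv(R.T @ V_hat @ R) @ diff)

# Hypothetical estimates: jointly test beta_2 = 0 and beta_3 = 0 (q = 2)
beta_hat = np.array([1.2, 0.15, -0.08])
V_hat = np.diag([0.9, 0.6, 0.5])
R = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
W = wald_test(beta_hat, V_hat, R, np.zeros(2), n=400)

# For q = 2 the chi-square tail probability is Pr(chi2_2 >= x) = exp(-x/2)
p_value = np.exp(-W / 2)
print(W, p_value)   # compare W with the chi2(2) 5% critical value 5.99
```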
8.4 Minimum Distance Tests
The Wald test (8.1) measures the distance between the unrestricted estimate $\hat\theta$ and the null hypothesis $\theta_0$. A minimum distance test measures the distance between $\hat\beta$ and the restricted estimate $\tilde\beta$ of the previous chapter. Recall that under the restriction
\[
h(\beta) = \theta_0
\]
the efficient minimum distance estimate solves the minimization problem
\[
\tilde\beta = \operatorname*{argmin}_{h(\beta) = \theta_0} J_n(\beta)
\]
where
\[
J_n(\beta) = n \left(\hat\beta - \beta\right)' \hat V_\beta^{-1} \left(\hat\beta - \beta\right).
\]
The minimum distance test statistic of $H_0$ against $H_1$ is
\[
J_n = J_n(\tilde\beta) = \min_{h(\beta) = \theta_0} J_n(\beta),
\]
or more simply
\[
J_n = n \left(\hat\beta - \tilde\beta\right)' \hat V_\beta^{-1} \left(\hat\beta - \tilde\beta\right).
\]
An asymptotic test rejects $H_0$ in favor of $H_1$ if $J_n$ exceeds $\chi^2_q(\alpha)$, the upper-$\alpha$ quantile of the $\chi^2_q$ distribution. Otherwise the test does not reject $H_0$.
When $h(\beta)$ is linear it turns out that $J_n = W_n$, so the Wald and minimum distance tests are equal. When $h(\beta)$ is nonlinear then the two tests are different.
The chi-square critical value is justified by the following theorem.

Theorem 8.4.1 Under Assumption 1.5.1, Assumption 6.4.1, $\operatorname{rank}(H_\beta) = q$, and $H_0$, then $J_n \xrightarrow{d} \chi^2_q$.
8.5 F Tests
Take the linear model
\[
y = X_1 \beta_1 + X_2 \beta_2 + e
\]
where $X_1$ is $n \times k_1$, $X_2$ is $n \times k_2$, $k = k_1 + k_2$, and the null hypothesis is
\[
H_0 : \beta_2 = 0.
\]
In this case, $\theta = \beta_2$, and there are $q = k_2$ restrictions. Also $h(\beta) = R'\beta$ is linear with
\[
R = \begin{pmatrix} 0 \\ I \end{pmatrix}
\]
a selector matrix. We know that the Wald statistic takes the form
\[
W_n = n \hat\theta' \hat V_\theta^{-1} \hat\theta
= n \hat\beta_2' \left(R' \hat V_\beta R\right)^{-1} \hat\beta_2.
\]
Now suppose that the covariance matrix is computed under the assumption of homoskedasticity, so that $\hat V_\beta$ is replaced with $\hat V_\beta^0 = s^2 \left(n^{-1} X'X\right)^{-1}$. We define the "homoskedastic" Wald statistic
\[
W_n^0 = n \hat\theta' \left(\hat V_\theta^0\right)^{-1} \hat\theta
= n \hat\beta_2' \left(R' \hat V_\beta^0 R\right)^{-1} \hat\beta_2.
\]
What we show in this section is that this Wald statistic can be written very simply using the formula
\[
W_n^0 = (n - k) \, \frac{\tilde e'\tilde e - \hat e'\hat e}{\hat e'\hat e} \tag{8.2}
\]
where
\[
\tilde e = y - X_1 \tilde\beta_1, \qquad \tilde\beta_1 = \left(X_1'X_1\right)^{-1} X_1' y
\]
are from OLS of $y$ on $X_1$, and
\[
\hat e = y - X \hat\beta, \qquad \hat\beta = \left(X'X\right)^{-1} X' y
\]
are from OLS of $y$ on $X = (X_1, X_2)$.
The elegant feature about (8.2) is that it is directly computable from the standard output from two simple OLS regressions, as the sum of squared errors is a typical output from statistical packages. This statistic is typically reported as an "F-statistic" which is defined as
\[
F_n = \frac{W_n^0}{k_2} = \frac{\left(\tilde e'\tilde e - \hat e'\hat e\right)/k_2}{\hat e'\hat e / (n - k)}.
\]
While it should be emphasized that equality (8.2) only holds if $\hat V_\beta^0 = s^2 \left(n^{-1} X'X\right)^{-1}$, still this formula often finds good use in reading applied papers. Because of this connection we call (8.2) the F form of the Wald statistic. (We can also call $W_n^0$ a homoskedastic form of the Wald statistic.)
We now derive expression (8.2). First, note that by partitioned matrix inversion (A.4)
\[
R' \left(X'X\right)^{-1} R
= R' \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1} R
= \left(X_2' M_1 X_2\right)^{-1}
\]
where $M_1 = I - X_1 \left(X_1'X_1\right)^{-1} X_1'$. Thus
\[
\left(R' \hat V_\beta^0 R\right)^{-1}
= s^{-2} n^{-1} \left(R' \left(X'X\right)^{-1} R\right)^{-1}
= s^{-2} n^{-1} X_2' M_1 X_2
\]
and
\[
W_n^0 = n \hat\beta_2' \left(R' \hat V_\beta^0 R\right)^{-1} \hat\beta_2
= \frac{\hat\beta_2' \left(X_2' M_1 X_2\right) \hat\beta_2}{s^2}.
\]
To simplify this expression further, note that if we regress $y$ on $X_1$ alone, the residual is $\tilde e = M_1 y$. Now consider the residual regression of $\tilde e$ on $\tilde X_2 = M_1 X_2$. By the FWL theorem, $\tilde e = \tilde X_2 \hat\beta_2 + \hat e$ and $\tilde X_2' \hat e = 0$. Thus
\[
\tilde e'\tilde e
= \left(\tilde X_2 \hat\beta_2 + \hat e\right)' \left(\tilde X_2 \hat\beta_2 + \hat e\right)
= \hat\beta_2' \tilde X_2' \tilde X_2 \hat\beta_2 + \hat e'\hat e
= \hat\beta_2' X_2' M_1 X_2 \hat\beta_2 + \hat e'\hat e,
\]
or alternatively,
\[
\hat\beta_2' X_2' M_1 X_2 \hat\beta_2 = \tilde e'\tilde e - \hat e'\hat e.
\]
Also, since
\[
s^2 = (n - k)^{-1} \hat e'\hat e,
\]
we conclude that
\[
W_n^0 = (n - k) \, \frac{\tilde e'\tilde e - \hat e'\hat e}{\hat e'\hat e}
\]
as claimed.
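The equality (8.2) can also be verified numerically. The sketch below is ours, on simulated data: it computes $W_n^0$ both from the two sums of squared residuals and from the direct quadratic form:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1, k2 = 200, 3, 2
k = k1 + k2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k1 - 1))])
X2 = rng.normal(size=(n, k2))
X = np.column_stack([X1, X2])
y = X1 @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n)   # H0: beta_2 = 0 is true

# Long and short regressions
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
beta_tilde1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
e_tilde = y - X1 @ beta_tilde1

# F form (8.2)
W0_form = (n - k) * (e_tilde @ e_tilde - e_hat @ e_hat) / (e_hat @ e_hat)

# Direct quadratic form: n b2'(R'V0 R)^{-1} b2 with V0 = s^2 (n^{-1} X'X)^{-1}
s2 = e_hat @ e_hat / (n - k)
V0 = s2 * np.linalg.inv(X.T @ X / n)
R = np.vstack([np.zeros((k1, k2)), np.eye(k2)])
b2 = beta_hat[k1:]
W0_direct = n * b2 @ np.linalg.inv(R.T @ V0 @ R) @ b2

print(W0_form, W0_direct)   # identical up to rounding error
```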
In many statistical packages, when an OLS regression is estimated, an "F-statistic" is reported. This is $F_n$ when $X_1$ is a vector of ones, so $H_0$ is an intercept-only model. This special F statistic is
testing the hypothesis that all slope coefficients (all coefficients other than the intercept) are zero. This was a popular statistic in the early days of econometric reporting, when sample sizes were very small and researchers wanted to know if there was "any explanatory power" to their regression. This is rarely an issue today, as sample sizes are typically sufficiently large that this F statistic is nearly always highly significant. While there are special cases where this F statistic is useful, these cases are atypical. As a general rule, there is no reason to report this F statistic.
8.6 Normal Regression Model
Now let us partition $\beta = (\beta_1, \beta_2)$ and consider tests of the linear restriction
\[
H_0 : \beta_2 = 0 \qquad H_1 : \beta_2 \neq 0
\]
in the normal regression model. In parametric models, a good test statistic is the likelihood ratio, which is twice the difference in the log-likelihood function evaluated under the null and alternative
hypotheses. The estimator under the alternative is the unrestricted estimator $(\hat\beta_1, \hat\beta_2, \hat\sigma^2)$ discussed above. The Gaussian log-likelihood at these estimates is
\[
\log L(\hat\beta_1, \hat\beta_2, \hat\sigma^2)
= -\frac{n}{2} \log\left(2\pi\hat\sigma^2\right) - \frac{1}{2\hat\sigma^2} \hat e'\hat e
= -\frac{n}{2} \log\hat\sigma^2 - \frac{n}{2} \log\left(2\pi\right) - \frac{n}{2}.
\]
The MLE under the null hypothesis is the restricted estimate $(\tilde\beta_1, 0, \tilde\sigma^2)$, where $\tilde\beta_1$ is the OLS estimate from a regression of $y_i$ on $x_{1i}$ only, with residual variance $\tilde\sigma^2$. The log-likelihood of this model is
\[
\log L(\tilde\beta_1, 0, \tilde\sigma^2)
= -\frac{n}{2} \log\tilde\sigma^2 - \frac{n}{2} \log\left(2\pi\right) - \frac{n}{2}.
\]
The LR statistic for $H_0$ against $H_1$ is
\[
LR_n = 2\left(\log L(\hat\beta_1, \hat\beta_2, \hat\sigma^2) - \log L(\tilde\beta_1, 0, \tilde\sigma^2)\right)
= n \log\tilde\sigma^2 - n \log\hat\sigma^2
= n \log\left(\frac{\tilde\sigma^2}{\hat\sigma^2}\right).
\]
By a first-order Taylor series approximation
\[
LR_n = n \log\left(1 + \left(\frac{\tilde\sigma^2}{\hat\sigma^2} - 1\right)\right)
\simeq n \left(\frac{\tilde\sigma^2}{\hat\sigma^2} - 1\right)
\simeq W_n^0,
\]
the homoskedastic Wald statistic. This shows that the two statistics ($LR_n$ and $W_n^0$) can be numerically close. It also shows that the homoskedastic Wald statistic for linear hypotheses can also be interpreted as an appropriate likelihood ratio statistic under normality.
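A small simulation (ours, with arbitrary design choices) illustrates how close $LR_n$ and $W_n^0$ are in practice:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k1, k2 = 500, 2, 2
k = k1 + k2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k1 - 1))])
X2 = rng.normal(size=(n, k2))
X = np.column_stack([X1, X2])
y = X1 @ np.array([1.0, -0.5]) + rng.normal(size=n)   # H0: beta_2 = 0 is true

e_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
e_tilde = y - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ y)
sig2_hat = e_hat @ e_hat / n        # MLE residual variance, long regression
sig2_tilde = e_tilde @ e_tilde / n  # MLE residual variance, short regression

LR = n * np.log(sig2_tilde / sig2_hat)
W0 = (n - k) * (e_tilde @ e_tilde - e_hat @ e_hat) / (e_hat @ e_hat)
print(LR, W0)   # numerically close under H0 when n is large
```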
8.7 Problems with Tests of Nonlinear Hypotheses
While the t and Wald tests work well when the hypothesis is a linear restriction on $\beta$, they can work quite poorly when the restrictions are nonlinear. This can be seen by a simple example introduced by Lafontaine and White (1986). Take the model
\[
y_i = \beta + e_i, \qquad e_i \sim N(0, \sigma^2)
\]