
CHAPTER 7. RESTRICTED ESTIMATION


However, in the general case of conditional heteroskedasticity this ranking is not guaranteed. In fact, what is really amazing is that the variance ranking can be reversed: the CLS estimator can have a larger asymptotic variance than the unconstrained least-squares estimator.

To see this let’s use the simple heteroskedastic example from Section 6.5. In that example,

$Q_{11} = Q_{22} = 1$, $Q_{12} = \frac{1}{2}$, $\Omega_{11} = \Omega_{22} = 1$, and $\Omega_{12} = \frac{7}{8}$. We can calculate that $Q_{11\cdot 2} = \frac{3}{4}$ and

$$\mathrm{avar}(\hat\beta_1) = \frac{2}{3} \qquad (7.24)$$

$$\mathrm{avar}(\tilde\beta_{1,\mathrm{cls}}) = 1 \qquad (7.25)$$

$$\mathrm{avar}(\tilde\beta_{1,\mathrm{md}}) = \frac{5}{8}. \qquad (7.26)$$
Thus the restricted least-squares estimator $\tilde\beta_1$ has a larger variance than the unrestricted least-squares estimator $\hat\beta_1$! The minimum distance estimator has the smallest variance of the three, as expected.

What we have found is that when the estimation method is least-squares, deleting the irrelevant variable $x_{2i}$ can actually decrease the precision of estimation of $\beta_1$, or equivalently, adding the irrelevant variable $x_{2i}$ can actually improve the precision of the estimation.
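As a quick numerical check of (7.24)-(7.26), the following sketch (added here for illustration; it is not part of the original text) plugs the example values of $Q$ and $\Omega$ into the three asymptotic variance formulas: $Q^{-1}\Omega Q^{-1}$ for unconstrained least-squares, $\Omega_{11}/Q_{11}^2$ for CLS on the short regression, and $V_{11} - V_{12}V_{22}^{-1}V_{21}$ for efficient minimum distance.

```python
import numpy as np

# Example values from Section 6.5 used in (7.24)-(7.26)
Q = np.array([[1.0, 0.5], [0.5, 1.0]])        # E[x x']
Omega = np.array([[1.0, 7/8], [7/8, 1.0]])    # E[x x' e^2]

# Unconstrained least-squares: V = Q^{-1} Omega Q^{-1}
Qinv = np.linalg.inv(Q)
V = Qinv @ Omega @ Qinv
avar_ols = V[0, 0]                            # 2/3

# CLS imposing beta_2 = 0 is OLS of y on x_1 alone
avar_cls = Omega[0, 0] / Q[0, 0] ** 2         # 1

# Efficient minimum distance: V11 - V12 V22^{-1} V21
avar_md = V[0, 0] - V[0, 1] ** 2 / V[1, 1]    # 5/8

print(avar_ols, avar_cls, avar_md)            # approximately 0.667, 1.0, 0.625
```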

To repeat this unexpected finding, we have shown in a very simple example that it is possible for least-squares applied to the short regression (7.10) to be less efficient for estimation of $\beta_1$ than least-squares applied to the long regression (7.9), even though the constraint $\beta_2 = 0$ is valid! This result is strongly counter-intuitive. It seems to contradict our initial motivation for pursuing constrained estimation: to improve estimation efficiency.

It turns out that a more refined answer is appropriate. Constrained estimation is desirable, but not constrained least-squares estimation. While least-squares is asymptotically efficient for estimation of the unconstrained projection model, it is not an efficient estimator of the constrained projection model.

7.9 Variance and Standard Error Estimation

The asymptotic covariance matrix (7.18) may be estimated by replacing $V$ with a consistent estimate such as $\hat V$. This variance estimator is then

$$\tilde V = \hat V - \hat V R\left(R'\hat V R\right)^{-1}R'\hat V. \qquad (7.27)$$

We can calculate standard errors for any linear combination $h'\tilde\beta$ so long as $h$ does not lie in the range space of $R$. A standard error for $h'\tilde\beta$ is

$$s(h'\tilde\beta) = \left(n^{-1}\,h'\tilde V h\right)^{1/2}.$$
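The following sketch shows how (7.27) and the standard error formula above translate into code; the matrices and the sample size are made-up placeholders, not values from the text.

```python
import numpy as np

def restricted_variance(V_hat, R):
    """V_tilde = V_hat - V_hat R (R' V_hat R)^{-1} R' V_hat, as in (7.27)."""
    A = V_hat @ R
    return V_hat - A @ np.linalg.solve(R.T @ V_hat @ R, A.T)

def standard_error(h, V_tilde, n):
    """s(h' beta_tilde) = (n^{-1} h' V_tilde h)^{1/2}."""
    return np.sqrt(h @ V_tilde @ h / n)

# hypothetical inputs: three coefficients, one linear constraint
V_hat = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.5, 0.2],
                  [0.1, 0.2, 1.0]])
R = np.array([[0.0], [0.0], [1.0]])   # constraint on the third coefficient
V_tilde = restricted_variance(V_hat, R)
print(standard_error(np.array([1.0, 0.0, 0.0]), V_tilde, n=100))
```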

 

 

7.10 Nonlinear Constraints

In some cases it is desirable to impose nonlinear constraints on the parameter vector $\beta$. They can be written as

$$r(\beta) = 0 \qquad (7.28)$$

where $r : \mathbb{R}^k \to \mathbb{R}^q$. This includes the linear constraints (7.1) as a special case. An example of (7.28) which cannot be written as (7.1) is $\beta_1\beta_2 = 1$, or $r(\beta) = \beta_1\beta_2 - 1$.

The minimum distance estimator of $\beta$ subject to (7.28) solves the minimization problem

$$\tilde\beta = \operatorname*{argmin}_{r(\beta)=0}\,J_n(\beta) \qquad (7.29)$$

where

$$J_n(\beta) = n\left(\hat\beta - \beta\right)'\hat V^{-1}\left(\hat\beta - \beta\right).$$

The solution minimizes the Lagrangian

$$\mathcal{L}(\beta, \lambda) = \frac{1}{2}J_n(\beta) + \lambda' r(\beta) \qquad (7.30)$$

over $(\beta, \lambda)$.

Computationally, there is no explicit expression for the solution so it must be found numerically. Computational methods are based on the method of quadratic programming and are not reviewed here.
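As an illustration of the preceding paragraph, here is a minimal sketch of solving (7.29)-(7.30) numerically with a general-purpose constrained optimizer (SciPy's SLSQP) rather than a dedicated quadratic programming routine; the unconstrained estimate, its variance estimate, and the sample size are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical unconstrained estimate and its variance estimate
beta_hat = np.array([1.3, 0.9])
V_hat = np.array([[0.5, 0.1], [0.1, 0.4]])
V_inv = np.linalg.inv(V_hat)
n = 200

def J_n(beta):
    """Minimum distance criterion J_n(beta) = n (beta_hat - beta)' V_hat^{-1} (beta_hat - beta)."""
    d = beta_hat - beta
    return n * d @ V_inv @ d

# nonlinear constraint r(beta) = beta_1 * beta_2 - 1 = 0, as in the example after (7.28)
constraint = {"type": "eq", "fun": lambda b: b[0] * b[1] - 1.0}

result = minimize(J_n, x0=beta_hat, method="SLSQP", constraints=[constraint])
beta_tilde = result.x
print(beta_tilde, beta_tilde[0] * beta_tilde[1])   # constraint holds up to solver tolerance
```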

Assumption 7.10.1 $r(\beta) = 0$ with $\mathrm{rank}(R) = q$, where $R = \frac{\partial}{\partial\beta}r(\beta)'$.

The asymptotic distribution is a simple generalization of the case of a linear constraint, but the proof is more delicate.

Theorem 7.10.1 Under Assumption 1.5.1, Assumption 6.4.1, and Assumption 7.10.1, for $\tilde\beta$ defined in (7.29),

$$\sqrt{n}\left(\tilde\beta - \beta\right) \xrightarrow{d} N\left(0, V^*\right)$$

as $n \to \infty$, where

$$V^* = V - VR\left(R'VR\right)^{-1}R'V.$$

The asymptotic variance matrix can be estimated by

$$\hat V^* = \hat V - \hat V\hat R\left(\hat R'\hat V\hat R\right)^{-1}\hat R'\hat V$$

where

$$\hat R = \frac{\partial}{\partial\beta}r(\tilde\beta)'.$$

Standard errors for the elements of $\tilde\beta$ are the square roots of the diagonal elements of $n^{-1}\hat V^*$.
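A small sketch of how the estimators above can be computed when only the function $r(\beta)$ is coded, using a central-difference approximation for $\hat R = \frac{\partial}{\partial\beta}r(\tilde\beta)'$; all numerical inputs below are hypothetical.

```python
import numpy as np

def jacobian_transpose(r, beta, eps=1e-6):
    """Numerical k x q matrix R = d r(beta)'/d beta by central differences."""
    k, q = beta.size, np.atleast_1d(r(beta)).size
    R = np.zeros((k, q))
    for j in range(k):
        e = np.zeros(k)
        e[j] = eps
        R[j, :] = (np.atleast_1d(r(beta + e)) - np.atleast_1d(r(beta - e))) / (2 * eps)
    return R

def constrained_variance(V_hat, R):
    """V_hat - V_hat R (R' V_hat R)^{-1} R' V_hat."""
    A = V_hat @ R
    return V_hat - A @ np.linalg.solve(R.T @ V_hat @ R, A.T)

r = lambda b: np.array([b[0] * b[1] - 1.0])    # r(beta) = beta_1 beta_2 - 1
beta_tilde = np.array([1.25, 0.8])             # hypothetical constrained estimate
V_hat = np.array([[0.5, 0.1], [0.1, 0.4]])
R_hat = jacobian_transpose(r, beta_tilde)
V_star = constrained_variance(V_hat, R_hat)
se = np.sqrt(np.diag(V_star) / 200)            # standard errors with hypothetical n = 200
print(R_hat.ravel(), se)
```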

 

7.11 Technical Proofs*

Proof of Theorem 7.7.1, Equation (7.20). Let $R_\perp$ be a full rank $k \times (k-q)$ matrix satisfying $R_\perp'VR = 0$ and then set $C = [R,\ R_\perp]$, which is full rank and invertible. Then we can calculate that

$$C'V^*C = \begin{pmatrix} R'V^*R & R'V^*R_\perp \\ R_\perp'V^*R & R_\perp'V^*R_\perp \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & R_\perp'VR_\perp \end{pmatrix}$$

and

$$C'V(W)C = \begin{pmatrix} R'V(W)R & R'V(W)R_\perp \\ R_\perp'V(W)R & R_\perp'V(W)R_\perp \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & R_\perp'VR_\perp + R_\perp'WR\left(R'WR\right)^{-1}R'VR\left(R'WR\right)^{-1}R'WR_\perp \end{pmatrix}.$$

Thus

$$C'V(W)C - C'V^*C = \begin{pmatrix} 0 & 0 \\ 0 & R_\perp'WR\left(R'WR\right)^{-1}R'VR\left(R'WR\right)^{-1}R'WR_\perp \end{pmatrix} \geq 0.$$

Since $C$ is invertible it follows that $V(W) - V^* \geq 0$, which is (7.20). $\blacksquare$

Proof of Theorem 7.10.1. For simplicity, we assume that the constrained estimator is consistent, $\tilde\beta \xrightarrow{p} \beta$. This can be shown with more effort, but requires a deeper treatment than appropriate for this textbook.

For each element $r_j(\beta)$ of the $q$-vector $r(\beta)$, by the mean value theorem there exists a $\beta_j^*$ on the line segment joining $\tilde\beta$ and $\beta$ such that

$$r_j(\tilde\beta) = r_j(\beta) + \frac{\partial}{\partial\beta}r_j(\beta_j^*)'\left(\tilde\beta - \beta\right). \qquad (7.31)$$

Let $R_n^*$ be the $k \times q$ matrix

$$R_n^* = \begin{pmatrix} \frac{\partial}{\partial\beta}r_1(\beta_1^*) & \frac{\partial}{\partial\beta}r_2(\beta_2^*) & \cdots & \frac{\partial}{\partial\beta}r_q(\beta_q^*) \end{pmatrix}.$$

Since $\tilde\beta \xrightarrow{p} \beta$ it follows that $\beta_j^* \xrightarrow{p} \beta$, and by the CMT, $R_n^* \xrightarrow{p} R$. Stacking the (7.31), we obtain

$$r(\tilde\beta) = r(\beta) + R_n^{*\prime}\left(\tilde\beta - \beta\right).$$

Since $r(\tilde\beta) = 0$ by construction and $r(\beta) = 0$ by Assumption 7.6.1, this implies

$$0 = R_n^{*\prime}\left(\tilde\beta - \beta\right). \qquad (7.32)$$

The first-order condition for (7.30) is

$$n\,\hat V^{-1}\left(\hat\beta - \tilde\beta\right) = \tilde R\,\tilde\lambda$$

where $\tilde R = \frac{\partial}{\partial\beta}r(\tilde\beta)'$. Premultiplying by $R_n^{*\prime}\hat V$, inverting, and using (7.32), we find

$$\tilde\lambda = n\left(R_n^{*\prime}\hat V\tilde R\right)^{-1}R_n^{*\prime}\left(\hat\beta - \tilde\beta\right) = n\left(R_n^{*\prime}\hat V\tilde R\right)^{-1}R_n^{*\prime}\left(\hat\beta - \beta\right).$$

Thus

$$\tilde\beta - \beta = \left(I - \hat V\tilde R\left(R_n^{*\prime}\hat V\tilde R\right)^{-1}R_n^{*\prime}\right)\left(\hat\beta - \beta\right).$$

From Theorem 6.4.2 and Theorem 6.8.2 we find

$$\sqrt{n}\left(\tilde\beta - \beta\right) = \left(I - \hat V\tilde R\left(R_n^{*\prime}\hat V\tilde R\right)^{-1}R_n^{*\prime}\right)\sqrt{n}\left(\hat\beta - \beta\right) \xrightarrow{d} \left(I - VR\left(R'VR\right)^{-1}R'\right)N(0, V) = N\left(0, V^*\right). \qquad\blacksquare$$


Exercises

Exercise 7.1 In the model $y = X_1\beta_1 + X_2\beta_2 + e$, show directly from definition (7.3) that the CLS estimate of $\beta = (\beta_1, \beta_2)$ subject to the constraint that $\beta_2 = 0$ is the OLS regression of $y$ on $X_1$.

Exercise 7.2 In the model $y = X_1\beta_1 + X_2\beta_2 + e$, show directly from definition (7.3) that the CLS estimate of $\beta = (\beta_1, \beta_2)$, subject to the constraint that $\beta_1 = c$ (where $c$ is some given vector), is the OLS regression of $y - X_1c$ on $X_2$.

Exercise 7.3 In the model $y = X_1\beta_1 + X_2\beta_2 + e$, with $X_1$ and $X_2$ each $n \times k$, find the CLS estimate of $\beta = (\beta_1, \beta_2)$, subject to the constraint that $\beta_1 = -\beta_2$.

Exercise 7.4 Verify that for $\tilde\beta$ defined in (7.8), $R'\tilde\beta = c$.

Exercise 7.5 Verify (7.14).

Exercise 7.6 Verify that the minimum distance estimator with $W_n = \hat Q_{xx}^{-1}$ equals the CLS estimator.

Exercise 7.7

Prove Theorem 7.6.1.

 

 

 

 

Exercise 7.8

Prove Theorem 7.6.2.

 

 

 

 

Exercise 7.9 Prove Theorem 7.6.3. (Hint: Use that CLS is a special case of Theorem 7.6.2.)

Exercise 7.10 Verify that (7.18) is $V(W)$ with $W = V^{-1}$.

Exercise 7.11 Prove (7.19). Hint: Use (7.18).

Exercise 7.12 Verify (7.21), (7.22), and (7.23).

Exercise 7.13 Verify (7.24), (7.25), and (7.26).

Chapter 8

Testing

8.1 t tests

The t-test is routinely used to test hypotheses on $\theta$. A simple null and composite hypothesis takes the form

$$H_0 : \theta = \theta_0$$
$$H_1 : \theta \neq \theta_0$$

where $\theta_0$ is some pre-specified value. A t-test rejects $H_0$ in favor of $H_1$ when $|t_n(\theta_0)|$ is large. By "large" we mean that the observed value of the t-statistic would be unlikely if $H_0$ were true.

Formally, we first pick an asymptotic significance level $\alpha$. We then find $z_{\alpha/2}$, the upper $\alpha/2$ quantile of the standard normal distribution which has the property that if $Z \sim N(0,1)$ then

$$\Pr\left(|Z| > z_{\alpha/2}\right) = \alpha.$$

For example, $z_{.025} = 1.96$ and $z_{.05} = 1.645$. A test of asymptotic significance $\alpha$ rejects $H_0$ if $|t_n| > z_{\alpha/2}$. Otherwise the test does not reject, or "accepts" $H_0$.

The asymptotic significance level is $\alpha$ because Theorem 6.11.1 implies that

$$\Pr\left(\text{reject } H_0 \mid H_0 \text{ true}\right) = \Pr\left(|t_n| > z_{\alpha/2} \mid \theta = \theta_0\right) \to \Pr\left(|Z| > z_{\alpha/2}\right) = \alpha.$$

The rejection/acceptance dichotomy is associated with the Neyman-Pearson approach to hypothesis testing.

While there is no objective scientific basis for choice of significance level $\alpha$, the common practice is to set $\alpha = .05$ or 5%. This implies a critical value of $z_{.025} = 1.96 \approx 2$. When $|t_n| > 2$ it is common to say that the t-statistic is statistically significant, and if $|t_n| < 2$ it is common to say that the t-statistic is statistically insignificant. It is helpful to remember that this is simply a way of saying "Using a t-test, the hypothesis that $\theta = \theta_0$ can [cannot] be rejected at the asymptotic 5% level."

A related statistic is the asymptotic p-value, which can be interpreted as a measure of the evidence against the null hypothesis. The asymptotic p-value of the statistic tn is

$$p_n = p(t_n)$$

where $p(t)$ is the tail probability function

$$p(t) = \Pr\left(|Z| > |t|\right) = 2\left(1 - \Phi(|t|)\right).$$

If the p-value pn is small (close to zero) then the evidence against H0 is strong.
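For concreteness, here is a minimal sketch (with made-up numbers) of the t-statistic, the two-sided asymptotic p-value $p(t) = 2(1 - \Phi(|t|))$, and the 5% decision rule:

```python
from scipy.stats import norm

theta_hat, theta_0, se = 0.48, 0.0, 0.21   # hypothetical estimate, null value, standard error
t = (theta_hat - theta_0) / se             # t-statistic t_n(theta_0)
p_value = 2 * (1 - norm.cdf(abs(t)))       # p(t) = 2(1 - Phi(|t|))
reject = abs(t) > norm.ppf(1 - 0.05 / 2)   # compare with z_{.025} = 1.96
print(t, p_value, reject)
```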


An equivalent statement of a Neyman-Pearson test is to reject at the $\alpha\%$ level if and only if $p_n < \alpha$. Significance tests can be deduced directly from the p-value since for any $\alpha$, $p_n < \alpha$ if and only if $|t_n| > z_{\alpha/2}$. The p-value is more general, however, in that the reader is allowed to pick the level of significance $\alpha$, in contrast to Neyman-Pearson rejection/acceptance reporting where the researcher picks the significance level. (However, the Neyman-Pearson approach requires the reader to select the significance level $\alpha$ before observing the p-value.)

Another helpful observation is that the p-value function is a unit-free transformation of the t statistic. That is, under $H_0$, $p_n \xrightarrow{d} U[0,1]$, so the "unusualness" of the test statistic can be compared to the easy-to-understand uniform distribution, regardless of the complication of the distribution of the original test statistic. To see this fact, note that the asymptotic distribution of $|t_n|$ is $F(x) = 1 - p(x)$. Thus

$$\Pr\left(1 - p_n \le u\right) = \Pr\left(1 - p(t_n) \le u\right) = \Pr\left(F(t_n) \le u\right) = \Pr\left(|t_n| \le F^{-1}(u)\right) \to F\left(F^{-1}(u)\right) = u,$$

establishing that $1 - p_n \xrightarrow{d} U[0,1]$, from which it follows that $p_n \xrightarrow{d} U[0,1]$.

8.2 t-ratios

Some applied papers (especially older ones) report "t-ratios" for each estimated coefficient. For a coefficient $\theta$ these are

$$t_n = t_n(0) = \frac{\hat\theta}{s(\hat\theta)},$$

the ratio of the coefficient estimate to its standard error, and equal the t-statistic for the test of the hypothesis $H_0 : \theta = 0$. Such papers often discuss the "significance" of certain variables or coefficients, or describe "which regressors have a significant effect on $y$" by noting which t-ratios exceed 2 in absolute value.

This is very poor econometric practice, and should be studiously avoided. It is a recipe for banishment of your work to lower-tier economics journals.

Fundamentally, the common t-ratio is a test for the hypothesis that a coefficient equals zero. This should be reported and discussed when this is an interesting economic hypothesis. But if this is not the case, it is distracting.

Instead, when a coefficient $\theta$ is of interest, it is constructive to focus on the point estimate, its standard error, and its confidence interval. The point estimate gives our "best guess" for the value. The standard error is a measure of precision. The confidence interval gives us the range of values consistent with the data. If the standard error is large then the point estimate is not a good summary about $\theta$. The endpoints of the confidence interval describe the bounds on the likely possibilities. If the confidence interval embraces too broad a set of values for $\theta$, then the dataset is not sufficiently informative to render inferences about $\theta$. On the other hand if the confidence interval is tight, then the data have produced an accurate estimate, and the focus should be on the value and interpretation of this estimate. In contrast, the widely-seen statement "the t-ratio is highly significant" has little interpretive value.

The above discussion requires that the researcher knows what the coefficient means (in terms of the economic problem) and can interpret values and magnitudes, not just signs. This is critical for good applied econometric practice.


8.3 Wald Tests

Sometimes $\theta = h(\beta)$ is a $q \times 1$ vector, and it is desired to test the joint restrictions simultaneously. We have the null and alternative

$$H_0 : \theta = \theta_0$$
$$H_1 : \theta \neq \theta_0.$$

A commonly used test of $H_0$ against $H_1$ is the Wald statistic (6.34) evaluated at the null hypothesis

$$W_n = n\left(\hat\theta - \theta_0\right)'\hat V_\theta^{-1}\left(\hat\theta - \theta_0\right). \qquad (8.1)$$

Typically, we have $\hat\theta = h(\hat\beta)$ with asymptotic covariance matrix estimate

$$\hat V_\theta = \hat H'\hat V\hat H$$

where

$$\hat H = \frac{\partial}{\partial\beta}h(\hat\beta)'.$$

Then

$$W_n = n\left(h(\hat\beta) - \theta_0\right)'\left(\hat H'\hat V\hat H\right)^{-1}\left(h(\hat\beta) - \theta_0\right).$$

When $h$ is a linear function of $\beta$, $h(\beta) = R'\beta$, then the Wald statistic simplifies to

$$W_n = n\left(R'\hat\beta - \theta_0\right)'\left(R'\hat V R\right)^{-1}\left(R'\hat\beta - \theta_0\right).$$

As shown in Theorem 6.14.2, when $\theta = \theta_0$ then $W_n \xrightarrow{d} \chi^2_q$, a chi-square random variable with $q$ degrees of freedom.

Theorem 8.3.1 Under Assumption 1.5.1, Assumption 6.4.1, $\mathrm{rank}(H) = q$, and $H_0$, then $W_n \xrightarrow{d} \chi^2_q$.

An asymptotic Wald test rejects $H_0$ in favor of $H_1$ if $W_n$ exceeds $\chi^2_q(\alpha)$, the upper-$\alpha$ quantile of the $\chi^2_q$ distribution. For example, $\chi^2_1(.05) = 3.84 = z_{.025}^2$. The Wald test fails to reject if $W_n$ is less than $\chi^2_q(\alpha)$. As with t-tests, it is conventional to describe a Wald test as "significant" if $W_n$ exceeds the 5% critical value.

Notice that the asymptotic distribution in Theorem 8.3.1 depends solely on $q$, the number of restrictions being tested. It does not depend on $k$, the number of parameters estimated.

The asymptotic p-value for $W_n$ is $p_n = p(W_n)$, where $p(x) = \Pr\left(\chi^2_q \geq x\right)$ is the tail probability function of the $\chi^2_q$ distribution. The Wald test rejects at the $\alpha\%$ level if and only if $p_n < \alpha$, and $p_n$ is asymptotically $U[0,1]$ under $H_0$. In applied work it is good practice to report the p-value of a Wald statistic, as it helps readers interpret the magnitude of the statistic.
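The following sketch (hypothetical estimates and covariance matrix) computes the Wald statistic (8.1) for a linear hypothesis $\theta = R'\beta = \theta_0$ together with its asymptotic p-value and the 5% critical-value comparison:

```python
import numpy as np
from scipy.stats import chi2

n = 500
beta_hat = np.array([0.8, -0.2, 0.05])                  # hypothetical estimates
V_hat = np.array([[0.40, 0.02, 0.01],
                  [0.02, 0.30, 0.00],
                  [0.01, 0.00, 0.20]])                  # estimate of avar(beta_hat)
R = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])      # tests that the 2nd and 3rd coefficients are zero
theta_hat = R.T @ beta_hat
theta_0 = np.zeros(2)
V_theta = R.T @ V_hat @ R

W = n * (theta_hat - theta_0) @ np.linalg.solve(V_theta, theta_hat - theta_0)
q = R.shape[1]
p_value = chi2.sf(W, df=q)                              # Pr(chi2_q >= W)
print(W, p_value, W > chi2.ppf(0.95, df=q))             # compare with chi2_q(.05) = 5.99
```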

8.4 Minimum Distance Tests

The Wald test (8.1) measures the distance between the unrestricted estimate $\hat\theta$ and the null hypothesis $\theta_0$. A minimum distance test measures the distance between $\hat\beta$ and the restricted estimate $\tilde\beta$ of the previous chapter. Recall that under the restriction

$$h(\beta) = \theta_0$$

the efficient minimum distance estimate solves the minimization problem

$$\tilde\beta = \operatorname*{argmin}_{h(\beta)=\theta_0}\,J_n(\beta)$$

where

$$J_n(\beta) = n\left(\hat\beta - \beta\right)'\hat V^{-1}\left(\hat\beta - \beta\right).$$

The minimum distance test statistic of $H_0$ against $H_1$ is

$$J_n = J_n(\tilde\beta) = \min_{h(\beta)=\theta_0} J_n(\beta),$$

or more simply

$$J_n = n\left(\hat\beta - \tilde\beta\right)'\hat V^{-1}\left(\hat\beta - \tilde\beta\right).$$

An asymptotic test rejects $H_0$ in favor of $H_1$ if $J_n$ exceeds $\chi^2_q(\alpha)$, the upper-$\alpha$ quantile of the $\chi^2_q$ distribution. Otherwise the test does not reject $H_0$.

When $h(\beta)$ is linear it turns out that $J_n = W_n$, so the Wald and minimum distance tests are equal. When $h(\beta)$ is nonlinear then the two tests are different.

The chi-square critical value is justified by the following theorem.

Theorem 8.4.1 Under Assumption 1.5.1, Assumption 6.4.1, $\mathrm{rank}(H) = q$, and $H_0$, then $J_n \xrightarrow{d} \chi^2_q$.
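As an illustration, the minimum distance statistic $J_n$ can be computed directly from the unrestricted and restricted estimates; the inputs below are hypothetical placeholders (in practice $\tilde\beta$ comes from the constrained estimation of the previous chapter).

```python
import numpy as np
from scipy.stats import chi2

n = 500
beta_hat = np.array([0.8, -0.2, 0.05])                 # unrestricted estimate (hypothetical)
beta_tilde = np.array([0.82, 0.0, 0.0])                # restricted efficient MD estimate (hypothetical)
V_hat = np.array([[0.40, 0.02, 0.01],
                  [0.02, 0.30, 0.00],
                  [0.01, 0.00, 0.20]])
d = beta_hat - beta_tilde
J = n * d @ np.linalg.solve(V_hat, d)                  # J_n = n (beta_hat - beta_tilde)' V_hat^{-1} (beta_hat - beta_tilde)
q = 2                                                  # number of restrictions tested
print(J, chi2.sf(J, df=q))                             # statistic and asymptotic p-value
```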

8.5 F Tests

Take the linear model

$$y = X_1\beta_1 + X_2\beta_2 + e$$

where $X_1$ is $n \times k_1$, $X_2$ is $n \times k_2$, $k = k_1 + k_2$, and the null hypothesis is

$$H_0 : \beta_2 = 0.$$

In this case, $\theta = \beta_2$, and there are $q = k_2$ restrictions. Also $h(\beta) = R'\beta$ is linear with

$$R = \begin{pmatrix} 0 \\ I \end{pmatrix}$$

a selector matrix. We know that the Wald statistic takes the form

$$W_n = n\,\hat\theta'\,\hat V_\theta^{-1}\,\hat\theta = n\,\hat\beta_2'\left(R'\hat V R\right)^{-1}\hat\beta_2.$$

Now suppose that the covariance matrix is computed under the assumption of homoskedasticity, so that $\hat V$ is replaced with $\hat V^0 = s^2\left(n^{-1}X'X\right)^{-1}$. We define the "homoskedastic" Wald statistic

$$W_n^0 = n\,\hat\theta'\left(\hat V_\theta^0\right)^{-1}\hat\theta = n\,\hat\beta_2'\left(R'\hat V^0 R\right)^{-1}\hat\beta_2.$$

What we show in this section is that this Wald statistic can be written very simply using the formula

$$W_n^0 = (n-k)\,\frac{\tilde e'\tilde e - \hat e'\hat e}{\hat e'\hat e} \qquad (8.2)$$

where

$$\tilde\beta_1 = \left(X_1'X_1\right)^{-1}X_1'y, \qquad \tilde e = y - X_1\tilde\beta_1$$

are from OLS of $y$ on $X_1$, and

$$\hat\beta = \left(X'X\right)^{-1}X'y, \qquad \hat e = y - X\hat\beta$$

are from OLS of $y$ on $X = (X_1, X_2)$.

The elegant feature of (8.2) is that it is directly computable from the standard output from two simple OLS regressions, as the sum of squared errors is a typical output from statistical packages. This statistic is typically reported as an "F-statistic" which is defined as

 

 

 

 

 

 

 

 

 

$$F_n = \frac{W_n^0}{k_2} = \frac{\left(\tilde e'\tilde e - \hat e'\hat e\right)/k_2}{\hat e'\hat e/(n-k)}.$$

While it should be emphasized that equality (8.2) only holds if $\hat V^0 = s^2\left(n^{-1}X'X\right)^{-1}$, still this formula often finds good use in reading applied papers. Because of this connection we call (8.2) the F form of the Wald statistic. (We can also call $W_n^0$ a homoskedastic form of the Wald statistic.)
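A simulated check of (8.2) (illustrative only; the data are randomly generated): run the short and long regressions, form $W_n^0$ from the two residual sums of squares, and confirm that dividing by $k_2$ gives the F-statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 200, 3, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k1 - 1))])
X2 = rng.normal(size=(n, k2))
X = np.column_stack([X1, X2])
y = X1 @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)    # the null beta_2 = 0 holds

def ssr(Xmat, y):
    """Sum of squared OLS residuals."""
    beta = np.linalg.lstsq(Xmat, y, rcond=None)[0]
    e = y - Xmat @ beta
    return e @ e

ssr_short, ssr_long = ssr(X1, y), ssr(X, y)
k = k1 + k2
W0 = (n - k) * (ssr_short - ssr_long) / ssr_long            # equation (8.2)
F = ((ssr_short - ssr_long) / k2) / (ssr_long / (n - k))    # F-statistic
print(W0, F, np.isclose(W0 / k2, F))
```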

We now derive expression (8.2). First, note that by partitioned matrix inversion (A.4)

$$R'\left(X'X\right)^{-1}R = R'\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1}R = \left(X_2'M_1X_2\right)^{-1}$$

where $M_1 = I - X_1\left(X_1'X_1\right)^{-1}X_1'$. Thus

$$R'\hat V^0 R = s^2\,R'\left(n^{-1}X'X\right)^{-1}R = s^2\left(n^{-1}X_2'M_1X_2\right)^{-1}$$

and

$$W_n^0 = n\,\hat\beta_2'\left(R'\hat V^0 R\right)^{-1}\hat\beta_2 = \frac{\hat\beta_2'\left(X_2'M_1X_2\right)\hat\beta_2}{s^2}.$$

To simplify this expression further, note that if we regress $y$ on $X_1$ alone, the residual is $\tilde e = M_1 y$. Now consider the residual regression of $\tilde e$ on $\tilde X_2 = M_1X_2$. By the FWL theorem, $\tilde e = \tilde X_2\hat\beta_2 + \hat e$ and $\tilde X_2'\hat e = 0$. Thus

$$\tilde e'\tilde e = \left(\tilde X_2\hat\beta_2 + \hat e\right)'\left(\tilde X_2\hat\beta_2 + \hat e\right) = \hat\beta_2'\tilde X_2'\tilde X_2\hat\beta_2 + \hat e'\hat e = \hat\beta_2'X_2'M_1X_2\hat\beta_2 + \hat e'\hat e,$$

or alternatively,

$$\hat\beta_2'X_2'M_1X_2\hat\beta_2 = \tilde e'\tilde e - \hat e'\hat e.$$

Also, since

$$s^2 = (n-k)^{-1}\,\hat e'\hat e,$$

we conclude that

$$W_n^0 = (n-k)\,\frac{\tilde e'\tilde e - \hat e'\hat e}{\hat e'\hat e},$$

as claimed.
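The key identity in this derivation, $\hat\beta_2'X_2'M_1X_2\hat\beta_2 = \tilde e'\tilde e - \hat e'\hat e$, is easy to confirm numerically; the following sketch uses simulated data purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2))
X = np.column_stack([X1, X2])
y = rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
beta2_hat = beta_hat[X1.shape[1]:]
e_hat = y - X @ beta_hat                                   # long-regression residuals
e_tilde = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]   # short-regression residuals

M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)     # annihilator matrix for X1
lhs = beta2_hat @ X2.T @ M1 @ X2 @ beta2_hat
rhs = e_tilde @ e_tilde - e_hat @ e_hat
print(np.isclose(lhs, rhs))                                # True
```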

In many statistical packages, when an OLS regression is estimated, an "F-statistic" is reported. This is $F_n$ when $X_1$ is a vector of ones, so $H_0$ is an intercept-only model. This special F statistic is


testing the hypothesis that all slope coefficients (all coefficients other than the intercept) are zero. This was a popular statistic in the early days of econometric reporting, when sample sizes were very small and researchers wanted to know if there was "any explanatory power" to their regression. This is rarely an issue today, as sample sizes are typically sufficiently large that this F statistic is nearly always highly significant. While there are special cases where this F statistic is useful, these cases are atypical. As a general rule, there is no reason to report this F statistic.

8.6 Normal Regression Model

Now let us partition $\beta = (\beta_1, \beta_2)$ and consider tests of the linear restriction

$$H_0 : \beta_2 = 0$$
$$H_1 : \beta_2 \neq 0$$

in the normal regression model. In parametric models, a good test statistic is the likelihood ratio, which is twice the difference in the log-likelihood function evaluated under the null and alternative hypotheses. The estimator under the alternative is the unrestricted estimator $(\hat\beta_1, \hat\beta_2, \hat\sigma^2)$ discussed above. The Gaussian log-likelihood at these estimates is

$$\log L(\hat\beta_1, \hat\beta_2, \hat\sigma^2) = -\frac{n}{2}\log\left(2\pi\hat\sigma^2\right) - \frac{1}{2\hat\sigma^2}\,\hat e'\hat e = -\frac{n}{2}\log\left(\hat\sigma^2\right) - \frac{n}{2}\log(2\pi) - \frac{n}{2}.$$

The MLE under the null hypothesis is the restricted estimates $(\tilde\beta_1, 0, \tilde\sigma^2)$ where $\tilde\beta_1$ is the OLS estimate from a regression of $y_i$ on $x_{1i}$ only, with residual variance $\tilde\sigma^2$. The log-likelihood of this model is

$$\log L(\tilde\beta_1, 0, \tilde\sigma^2) = -\frac{n}{2}\log\left(\tilde\sigma^2\right) - \frac{n}{2}\log(2\pi) - \frac{n}{2}.$$

The LR statistic for $H_0$ against $H_1$ is

$$LR_n = 2\left(\log L(\hat\beta_1, \hat\beta_2, \hat\sigma^2) - \log L(\tilde\beta_1, 0, \tilde\sigma^2)\right) = n\left(\log\tilde\sigma^2 - \log\hat\sigma^2\right) = n\log\frac{\tilde\sigma^2}{\hat\sigma^2}.$$

By a first-order Taylor series approximation

$$LR_n = n\log\left(1 + \left(\frac{\tilde\sigma^2}{\hat\sigma^2} - 1\right)\right) \simeq n\left(\frac{\tilde\sigma^2}{\hat\sigma^2} - 1\right) = W_n^0,$$

the homoskedastic Wald statistic. This shows that the two statistics ($LR_n$ and $W_n^0$) can be numerically close. It also shows that the homoskedastic Wald statistic for linear hypotheses can be interpreted as an appropriate likelihood ratio statistic under normality.
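A quick simulated comparison of $LR_n$ and $W_n^0$ (illustrative only; the data are randomly generated under the null):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k1, k2 = 200, 2, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, k2))
X = np.column_stack([X1, X2])
y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n)        # null beta_2 = 0 is true

def ssr(Xmat):
    """Sum of squared OLS residuals from a regression of y on Xmat."""
    b = np.linalg.lstsq(Xmat, y, rcond=None)[0]
    e = y - Xmat @ b
    return e @ e

sig2_tilde, sig2_hat = ssr(X1) / n, ssr(X) / n            # restricted and unrestricted MLEs of sigma^2
LR = n * np.log(sig2_tilde / sig2_hat)
k = k1 + k2
W0 = (n - k) * (ssr(X1) - ssr(X)) / ssr(X)                # equation (8.2)
print(LR, W0)                                             # typically close in value
```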

8.7 Problems with Tests of Nonlinear Hypotheses

While the t and Wald tests work well when the hypothesis is a linear restriction on $\beta$, they can work quite poorly when the restrictions are nonlinear. This can be seen by a simple example introduced by Lafontaine and White (1986). Take the model

$$y_i = \beta + e_i, \qquad e_i \sim N(0, \sigma^2)$$
