CHAPTER 7. RESTRICTED ESTIMATION
However, in the general case of conditional heteroskedasticity this ranking is not guaranteed; in fact, what is really amazing is that the variance ranking can be reversed. The CLS estimator can have a larger asymptotic variance than the unconstrained least squares estimator.
To see this let's use the simple heteroskedastic example from Section 6.5. In that example, $Q_{11} = Q_{22} = 1$, $Q_{12} = \frac{1}{2}$, $\Omega_{11} = \Omega_{22} = 1$, and $\Omega_{12} = \frac{7}{8}$. We can calculate that $Q_{11 \cdot 2} = \frac{3}{4}$ and
\[
\mathrm{avar}(\hat\beta_1) = \frac{2}{3} \tag{7.24}
\]
\[
\mathrm{avar}(\tilde\beta_{1,\mathrm{cls}}) = 1 \tag{7.25}
\]
\[
\mathrm{avar}(\tilde\beta_{1,\mathrm{md}}) = \frac{5}{8}. \tag{7.26}
\]
Thus the restricted least-squares estimator $\tilde\beta_1$ has a larger variance than the unrestricted least-squares estimator $\hat\beta_1$! The minimum distance estimator has the smallest variance of the three, as expected.

What we have found is that when the estimation method is least-squares, deleting the irrelevant variable $x_{2i}$ can actually decrease the precision of estimation of $\beta_1$; or equivalently, adding the irrelevant variable $x_{2i}$ can actually improve the precision of the estimation.
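As a numerical check on (7.24)-(7.26), the variance formulas can be evaluated directly. The sketch below is ours (variable names are our own, not from the text); it plugs the example's $Q$ and $\Omega$ into the unconstrained, CLS, and efficient minimum distance variance formulas:

```python
import numpy as np

# Moments of the example: Q = E[x x'], Omega = E[x x' e^2]
Q = np.array([[1.0, 0.5],
              [0.5, 1.0]])
Omega = np.array([[1.0, 7/8],
                  [7/8, 1.0]])
Qinv = np.linalg.inv(Q)

# Unconstrained least-squares: V = Q^{-1} Omega Q^{-1}
V = Qinv @ Omega @ Qinv
avar_ols = V[0, 0]

# CLS imposing beta_2 = 0 is OLS of y on x_1 alone
avar_cls = Omega[0, 0] / Q[0, 0] ** 2

# Efficient minimum distance: V - V R (R'V R)^{-1} R'V with R = (0, 1)'
R = np.array([[0.0], [1.0]])
V_md = V - V @ R @ np.linalg.inv(R.T @ V @ R) @ R.T @ V
avar_md = V_md[0, 0]

print(avar_ols, avar_cls, avar_md)   # 2/3, 1, 5/8
```

The three printed values reproduce the variance ranking in the text: the CLS variance (1) exceeds the unconstrained variance (2/3), and the minimum distance variance (5/8) is smallest.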
To repeat this unexpected finding: we have shown in a very simple example that it is possible for least-squares applied to the short regression (7.10) to be less efficient for estimation of $\beta_1$ than least-squares applied to the long regression (7.9), even though the constraint $\beta_2 = 0$ is valid! This result is strongly counter-intuitive. It seems to contradict our initial motivation for pursuing constrained estimation, namely to improve estimation efficiency.
It turns out that a more refined answer is appropriate. Constrained estimation is desirable, but not constrained least-squares estimation. While least-squares is asymptotically efficient for estimation of the unconstrained projection model, it is not an efficient estimator of the constrained projection model.
7.9 Variance and Standard Error Estimation
The asymptotic covariance matrix (7.18) may be estimated by replacing $V_\beta$ with a consistent estimate such as $\hat V_\beta$. This variance estimator is then
\[
\hat V_{\tilde\beta} = \hat V_\beta - \hat V_\beta R \left(R' \hat V_\beta R\right)^{-1} R' \hat V_\beta. \tag{7.27}
\]
We can calculate standard errors for any linear combination $h'\tilde\beta$ so long as $h$ does not lie in the range space of $R$. A standard error for $h'\tilde\beta$ is
\[
s(h'\tilde\beta) = \left(n^{-1} h' \hat V_{\tilde\beta} h\right)^{1/2}.
\]
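The estimator (7.27) and this standard error formula translate directly into code. The following sketch is ours, with hypothetical inputs chosen only for illustration:

```python
import numpy as np

def restricted_variance(V_hat, R):
    """Estimated covariance matrix (7.27): V - V R (R'V R)^{-1} R'V."""
    return V_hat - V_hat @ R @ np.linalg.inv(R.T @ V_hat @ R) @ R.T @ V_hat

def se_linear_combination(h, V_tilde, n):
    """Standard error of h'beta_tilde: (n^{-1} h' V_tilde h)^{1/2}."""
    return np.sqrt(h @ V_tilde @ h / n)

# Hypothetical inputs (not from the text)
V_hat = np.array([[2/3, 1/6],
                  [1/6, 2/3]])
R = np.array([[0.0], [1.0]])     # linear constraint beta_2 = 0
V_tilde = restricted_variance(V_hat, R)

h = np.array([1.0, 0.0])         # h must not lie in the range space of R
print(se_linear_combination(h, V_tilde, n=100))
```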
7.10 Nonlinear Constraints
In some cases it is desirable to impose nonlinear constraints on the parameter vector $\beta$. They can be written as
\[
r(\beta) = 0 \tag{7.28}
\]
where $r : \mathbb{R}^k \to \mathbb{R}^q$. This includes the linear constraints (7.1) as a special case. An example of (7.28) which cannot be written as (7.1) is $\beta_1 \beta_2 = 1$, or $r(\beta) = \beta_1 \beta_2 - 1$.
The minimum distance estimator of $\beta$ subject to (7.28) solves the minimization problem
\[
\tilde\beta = \operatorname*{argmin}_{r(\beta) = 0} J_n(\beta) \tag{7.29}
\]
where
\[
J_n(\beta) = n \left(\hat\beta - \beta\right)' \hat V_\beta^{-1} \left(\hat\beta - \beta\right).
\]
The solution minimizes the Lagrangian
\[
\mathcal{L}(\beta, \lambda) = \frac{1}{2} J_n(\beta) + \lambda' r(\beta) \tag{7.30}
\]
over $(\beta, \lambda)$.
Computationally, there is no explicit expression for the solution, so it must be found numerically. Computational methods are based on the method of quadratic programming and are not reviewed here.
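As an illustration of such a numerical solution, the sketch below is ours: it assumes hypothetical values for $\hat\beta$ and $\hat V_\beta$, and uses SciPy's SLSQP routine (one sequential quadratic programming implementation) to minimize $J_n(\beta)$ subject to the example constraint $\beta_1 \beta_2 = 1$ from (7.28):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical unrestricted estimate and covariance estimate (illustration only)
beta_hat = np.array([1.3, 0.9])
V_hat = np.array([[0.5, 0.1],
                  [0.1, 0.4]])
V_inv = np.linalg.inv(V_hat)
n = 200

def J_n(beta):
    """Minimum distance criterion: n (beta_hat - beta)' V^{-1} (beta_hat - beta)."""
    d = beta_hat - beta
    return n * d @ V_inv @ d

# Nonlinear constraint from the example in (7.28): r(beta) = beta_1 beta_2 - 1 = 0
constraint = {"type": "eq", "fun": lambda b: b[0] * b[1] - 1.0}

res = minimize(J_n, x0=beta_hat, method="SLSQP", constraints=[constraint])
beta_tilde = res.x
print(beta_tilde, beta_tilde[0] * beta_tilde[1])   # the product is ~1 at the solution
```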
Assumption 7.10.1 $r(\beta) = 0$ with $\operatorname{rank}(R) = q$, where $R = \dfrac{\partial}{\partial \beta} r(\beta)'$.
The asymptotic distribution is a simple generalization of the case of a linear constraint, but the proof is more delicate.
Theorem 7.10.1 Under Assumption 1.5.1, Assumption 6.4.1, and Assumption 7.10.1, for $\tilde\beta$ defined in (7.29),
\[
\sqrt{n} \left(\tilde\beta - \beta\right) \xrightarrow{d} N\left(0, V_{\tilde\beta}\right)
\]
as $n \to \infty$, where
\[
V_{\tilde\beta} = V_\beta - V_\beta R \left(R' V_\beta R\right)^{-1} R' V_\beta.
\]
The asymptotic variance matrix can be estimated by
\[
\hat V_{\tilde\beta} = \hat V - \hat V \tilde R \left(\tilde R' \hat V \tilde R\right)^{-1} \tilde R' \hat V
\]
where
\[
\tilde R = \frac{\partial}{\partial \beta} r(\tilde\beta)'.
\]
Standard errors for the elements of $\tilde\beta$ are the square roots of the diagonal elements of $n^{-1} \hat V_{\tilde\beta}$.
7.11 Technical Proofs*
Proof of Theorem 7.7.1, Equation (7.20). Let $R_\perp$ be a full rank $k \times (k - q)$ matrix satisfying $R_\perp' V_\beta R = 0$, and then set $C = [R, R_\perp]$, which is full rank and invertible. Then we can calculate that
\[
C' V_{\tilde\beta} C =
\begin{pmatrix}
R' V_{\tilde\beta} R & R' V_{\tilde\beta} R_\perp \\
R_\perp' V_{\tilde\beta} R & R_\perp' V_{\tilde\beta} R_\perp
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 \\
0 & R_\perp' V_\beta R_\perp
\end{pmatrix}
\]
and
\[
C' V_\beta(W) C =
\begin{pmatrix}
R' V_\beta(W) R & R' V_\beta(W) R_\perp \\
R_\perp' V_\beta(W) R & R_\perp' V_\beta(W) R_\perp
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 \\
0 & R_\perp' V_\beta R_\perp + R_\perp' W R \left(R' W R\right)^{-1} R' V_\beta R \left(R' W R\right)^{-1} R' W R_\perp
\end{pmatrix}.
\]
Thus
\[
C' \left(V_\beta(W) - V_{\tilde\beta}\right) C
= C' V_\beta(W) C - C' V_{\tilde\beta} C
=
\begin{pmatrix}
0 & 0 \\
0 & R_\perp' W R \left(R' W R\right)^{-1} R' V_\beta R \left(R' W R\right)^{-1} R' W R_\perp
\end{pmatrix}
\geq 0.
\]
Since $C$ is invertible it follows that $V_\beta(W) - V_{\tilde\beta} \geq 0$, which is (7.20). $\blacksquare$
Proof of Theorem 7.10.1. For simplicity, we assume that the constrained estimator is consistent, $\tilde\beta \xrightarrow{p} \beta$. This can be shown with more effort, but requires a deeper treatment than appropriate for this textbook.
For each element $r_j(\beta)$ of the $q$-vector $r(\beta)$, by the mean value theorem there exists a $\beta_j^*$ on the line segment joining $\tilde\beta$ and $\beta$ such that
\[
r_j(\tilde\beta) = r_j(\beta) + \frac{\partial}{\partial \beta} r_j(\beta_j^*)' \left(\tilde\beta - \beta\right). \tag{7.31}
\]
Let $R_n^*$ be the $k \times q$ matrix
\[
R_n^* = \begin{pmatrix}
\dfrac{\partial}{\partial \beta} r_1(\beta_1^*) & \dfrac{\partial}{\partial \beta} r_2(\beta_2^*) & \cdots & \dfrac{\partial}{\partial \beta} r_q(\beta_q^*)
\end{pmatrix}.
\]
Since $\tilde\beta \xrightarrow{p} \beta$ it follows that $\beta_j^* \xrightarrow{p} \beta$ and, by the CMT, $R_n^* \xrightarrow{p} R$. Stacking the equations (7.31), we obtain
\[
r(\tilde\beta) = r(\beta) + R_n^{*\prime} \left(\tilde\beta - \beta\right).
\]
Since $r(\tilde\beta) = 0$ by construction and $r(\beta) = 0$ by Assumption 7.10.1, this implies
\[
0 = R_n^{*\prime} \left(\tilde\beta - \beta\right). \tag{7.32}
\]
The first-order condition for (7.30) is
\[
\hat V^{-1} \left(\hat\beta - \tilde\beta\right) = \tilde R \tilde\lambda.
\]
Premultiplying by $R_n^{*\prime} \hat V$, inverting, and using (7.32), we find
\[
\tilde\lambda = \left(R_n^{*\prime} \hat V \tilde R\right)^{-1} R_n^{*\prime} \left(\hat\beta - \tilde\beta\right)
= \left(R_n^{*\prime} \hat V \tilde R\right)^{-1} R_n^{*\prime} \left(\hat\beta - \beta\right).
\]
Thus
\[
\tilde\beta - \beta = \left(I - \hat V \tilde R \left(R_n^{*\prime} \hat V \tilde R\right)^{-1} R_n^{*\prime}\right) \left(\hat\beta - \beta\right).
\]
From Theorem 6.4.2 and Theorem 6.8.2 we find
\[
\sqrt{n} \left(\tilde\beta - \beta\right)
= \left(I - \hat V \tilde R \left(R_n^{*\prime} \hat V \tilde R\right)^{-1} R_n^{*\prime}\right) \sqrt{n} \left(\hat\beta - \beta\right)
\xrightarrow{d} \left(I - V_\beta R \left(R' V_\beta R\right)^{-1} R'\right) N\left(0, V_\beta\right)
= N\left(0, V_{\tilde\beta}\right). \qquad \blacksquare
\]
Exercises
Exercise 7.1 In the model $y = X_1\beta_1 + X_2\beta_2 + e$, show directly from definition (7.3) that the CLS estimate of $\beta = (\beta_1, \beta_2)$ subject to the constraint that $\beta_2 = 0$ is the OLS regression of $y$ on $X_1$.
Exercise 7.2 In the model $y = X_1\beta_1 + X_2\beta_2 + e$, show directly from definition (7.3) that the CLS estimate of $\beta = (\beta_1, \beta_2)$, subject to the constraint that $\beta_1 = c$ (where $c$ is some given vector), is the OLS regression of $y - X_1 c$ on $X_2$.
Exercise 7.3 In the model $y = X_1\beta_1 + X_2\beta_2 + e$, with $X_1$ and $X_2$ each $n \times k$, find the CLS estimate of $\beta = (\beta_1, \beta_2)$ subject to the constraint that $\beta_1 = -\beta_2$.
Exercise 7.4 Verify that for $\tilde\beta$ defined in (7.8), $R'\tilde\beta = c$.

Exercise 7.5 Verify (7.14).
Exercise 7.6 Verify that the minimum distance estimator with $W_n = \hat Q_{xx}$ equals the CLS estimator.

Exercise 7.7 Prove Theorem 7.6.1.

Exercise 7.8 Prove Theorem 7.6.2.
Exercise 7.9 Prove Theorem 7.6.3. (Hint: Use that CLS is a special case of Theorem 7.6.2.)
Exercise 7.10 Verify that (7.18) is $V_\beta(W)$ with $W = V_\beta^{-1}$.
Exercise 7.11 Prove (7.19). Hint: Use (7.18).
Exercise 7.12 Verify (7.21), (7.22), and (7.23).
Exercise 7.13 Verify (7.24), (7.25), and (7.26).
CHAPTER 8. TESTING
An equivalent statement of a Neyman-Pearson test is to reject at the $\alpha\%$ level if and only if $p_n < \alpha$. Significance tests can be deduced directly from the p-value since for any $\alpha$, $p_n < \alpha$ if and only if $|t_n| > z_{\alpha/2}$. The p-value is more general, however, in that the reader is allowed to pick the level of significance $\alpha$, in contrast to Neyman-Pearson rejection/acceptance reporting, where the researcher picks the significance level. (However, the Neyman-Pearson approach requires the reader to select the significance level before observing the p-value.)
Another helpful observation is that the p-value function is a unit-free transformation of the t statistic. That is, under $H_0$, $p_n \xrightarrow{d} U[0,1]$, so the "unusualness" of the test statistic can be compared to the easy-to-understand uniform distribution, regardless of the complication of the distribution of the original test statistic. To see this fact, note that the asymptotic distribution of $|t_n|$ is $F(x) = 1 - p(x)$. Thus
\[
\Pr\left(1 - p_n \leq u\right)
= \Pr\left(1 - p(t_n) \leq u\right)
= \Pr\left(F(t_n) \leq u\right)
= \Pr\left(|t_n| \leq F^{-1}(u)\right)
\to F\left(F^{-1}(u)\right) = u,
\]
establishing that $1 - p_n \xrightarrow{d} U[0,1]$, from which it follows that $p_n \xrightarrow{d} U[0,1]$.
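This uniformity is easy to see in simulation. The sketch below is ours: it computes two-sided asymptotic p-values for a mean-zero test on data where the null is true, and checks that the p-values spread roughly evenly over $[0, 1]$:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

rng = np.random.default_rng(0)
n, reps = 100, 2000
pvals = np.empty(reps)

for r in range(reps):
    y = rng.normal(size=n)                     # H0 (zero mean) is true
    t = y.mean() / (y.std(ddof=1) / sqrt(n))   # t-statistic for H0: mean = 0
    pvals[r] = 2 * (1 - norm_cdf(abs(t)))      # two-sided asymptotic p-value

# Under H0 the p-values are approximately U[0,1]:
# each decile of [0,1] should contain roughly 10% of them.
hist, _ = np.histogram(pvals, bins=10, range=(0.0, 1.0))
print(hist / reps)
```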
8.2 t-ratios
Some applied papers (especially older ones) report "t-ratios" for each estimated coefficient. For a coefficient $\beta$ these are
\[
t_n = t_n(0) = \frac{\hat\beta}{s(\hat\beta)},
\]
the ratio of the coefficient estimate to its standard error, and equal the t-statistic for the test of the hypothesis $H_0 : \beta = 0$. Such papers often discuss the "significance" of certain variables or coefficients, or describe "which regressors have a significant effect on $y$" by noting which t-ratios exceed 2 in absolute value.
This is very poor econometric practice, and should be studiously avoided. It is a recipe for banishment of your work to lower-tier economics journals.
Fundamentally, the common t-ratio is a test of the hypothesis that a coefficient equals zero. This should be reported and discussed when it is an interesting economic hypothesis. But if this is not the case, it is distracting.
Instead, when a coefficient $\beta$ is of interest, it is constructive to focus on the point estimate, its standard error, and its confidence interval. The point estimate gives our "best guess" for the value. The standard error is a measure of precision. The confidence interval gives us the range of values consistent with the data. If the standard error is large then the point estimate is not a good summary about $\beta$. The endpoints of the confidence interval describe the bounds on the likely possibilities. If the confidence interval embraces too broad a set of values for $\beta$, then the dataset is not sufficiently informative to render inferences about $\beta$. On the other hand, if the confidence interval is tight, then the data have produced an accurate estimate, and the focus should be on the value and interpretation of this estimate. In contrast, the widely seen statement "the t-ratio is highly significant" has little interpretive value.
The above discussion requires that the researcher knows what the coefficient $\beta$ means (in terms of the economic problem) and can interpret values and magnitudes, not just signs. This is critical for good applied econometric practice.
8.3 Wald Tests
Sometimes $\theta = h(\beta)$ is a $q \times 1$ vector, and it is desired to test the joint restrictions simultaneously. We have the null and alternative
\[
H_0 : \theta = \theta_0 \qquad H_1 : \theta \neq \theta_0.
\]
A commonly used test of H0 against H1 is the Wald statistic (6.34) evaluated at the null hypothesis
\[
W_n = n \left(\hat\theta - \theta_0\right)' \hat V_\theta^{-1} \left(\hat\theta - \theta_0\right). \tag{8.1}
\]
Typically, we have $\hat\theta = h(\hat\beta)$ with asymptotic covariance matrix estimate
\[
\hat V_\theta = \hat H_\beta' \hat V_\beta \hat H_\beta
\]
where
\[
\hat H_\beta = \frac{\partial}{\partial \beta} h(\hat\beta)'.
\]
Then
\[
W_n = n \left(h(\hat\beta) - \theta_0\right)' \left(\hat H_\beta' \hat V_\beta \hat H_\beta\right)^{-1} \left(h(\hat\beta) - \theta_0\right).
\]
When $h$ is a linear function of $\beta$, $h(\beta) = R'\beta$, then the Wald statistic simplifies to
\[
W_n = n \left(R'\hat\beta - \theta_0\right)' \left(R' \hat V_\beta R\right)^{-1} \left(R'\hat\beta - \theta_0\right).
\]
As shown in Theorem 6.14.2, when $\theta = \theta_0$ then $W_n \xrightarrow{d} \chi^2_q$, a chi-square random variable with $q$ degrees of freedom.
Theorem 8.3.1 Under Assumption 1.5.1, Assumption 6.4.1, $\operatorname{rank}(H_\beta) = q$, and $H_0$, then $W_n \xrightarrow{d} \chi^2_q$.
An asymptotic Wald test rejects $H_0$ in favor of $H_1$ if $W_n$ exceeds $\chi^2_q(\alpha)$, the upper-$\alpha$ quantile of the $\chi^2_q$ distribution. For example, $\chi^2_1(.05) = 3.84 = z_{.025}^2$. The Wald test fails to reject if $W_n$ is less than $\chi^2_q(\alpha)$. As with t-tests, it is conventional to describe a Wald test as "significant" if $W_n$ exceeds the 5% critical value.
Notice that the asymptotic distribution in Theorem 8.3.1 depends solely on $q$, the number of restrictions being tested. It does not depend on $k$, the number of parameters estimated.
The asymptotic p-value for $W_n$ is $p_n = p(W_n)$, where $p(x) = \Pr\left(\chi^2_q \geq x\right)$ is the tail probability function of the $\chi^2_q$ distribution. The Wald test rejects at the $\alpha\%$ level if and only if $p_n < \alpha$, and $p_n$ is asymptotically $U[0,1]$ under $H_0$. In applied work it is good practice to report the p-value of a Wald statistic, as it helps readers interpret the magnitude of the statistic.
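For a linear restriction the statistic is a simple quadratic form. The sketch below is ours, with hypothetical estimates chosen for illustration; it computes $W_n$ for a joint test of two zero restrictions and its asymptotic p-value, using the closed form $\Pr(\chi^2_2 \geq x) = e^{-x/2}$, which holds only for $q = 2$:

```python
import numpy as np

def wald_test(beta_hat, V_hat, R, theta0, n):
    """Wald statistic (8.1) for H0: R'beta = theta0, with V_hat estimating V_beta."""
    diff = R.T @ beta_hat - theta0
    return float(n * diff @ np.linalg.inv(R.T @ V_hat @ R) @ diff)

# Hypothetical estimates: jointly test beta_2 = 0 and beta_3 = 0 (q = 2)
beta_hat = np.array([1.2, 0.15, -0.08])
V_hat = np.diag([0.9, 0.6, 0.5])
R = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
W = wald_test(beta_hat, V_hat, R, np.zeros(2), n=400)

# For q = 2 the chi-square tail probability is Pr(chi2_2 >= x) = exp(-x/2)
p_value = np.exp(-W / 2)
print(W, p_value)   # compare W with the chi2(2) 5% critical value 5.99
```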
8.4 Minimum Distance Tests
The Wald test (8.1) measures the distance between the unrestricted estimate $\hat\theta$ and the null hypothesis $\theta_0$. A minimum distance test measures the distance between $\hat\beta$ and the restricted estimate $\tilde\beta$ of the previous chapter. Recall that under the restriction
\[
h(\beta) = \theta_0
\]
the efficient minimum distance estimate solves the minimization problem
\[
\tilde\beta = \operatorname*{argmin}_{h(\beta) = \theta_0} J_n(\beta)
\]
where
\[
J_n(\beta) = n \left(\hat\beta - \beta\right)' \hat V_\beta^{-1} \left(\hat\beta - \beta\right).
\]
The minimum distance test statistic of $H_0$ against $H_1$ is
\[
J_n = J_n(\tilde\beta) = \min_{h(\beta) = \theta_0} J_n(\beta),
\]
or more simply
\[
J_n = n \left(\hat\beta - \tilde\beta\right)' \hat V_\beta^{-1} \left(\hat\beta - \tilde\beta\right).
\]
An asymptotic test rejects $H_0$ in favor of $H_1$ if $J_n$ exceeds $\chi^2_q(\alpha)$, the upper-$\alpha$ quantile of the $\chi^2_q$ distribution. Otherwise the test does not reject $H_0$.
When $h(\beta)$ is linear it turns out that $J_n = W_n$, so the Wald and minimum distance tests are equal. When $h(\beta)$ is nonlinear then the two tests are different.
The chi-square critical value is justified by the following theorem.

Theorem 8.4.1 Under Assumption 1.5.1, Assumption 6.4.1, $\operatorname{rank}(H_\beta) = q$, and $H_0$, then $J_n \xrightarrow{d} \chi^2_q$.
8.5 F Tests
Take the linear model
\[
y = X_1 \beta_1 + X_2 \beta_2 + e
\]
where $X_1$ is $n \times k_1$, $X_2$ is $n \times k_2$, $k = k_1 + k_2$, and the null hypothesis is
\[
H_0 : \beta_2 = 0.
\]
In this case, $\theta = \beta_2$, and there are $q = k_2$ restrictions. Also $h(\beta) = R'\beta$ is linear with
\[
R = \begin{pmatrix} 0 \\ I \end{pmatrix}
\]
a selector matrix. We know that the Wald statistic takes the form
\[
W_n = n \hat\theta' \hat V_\theta^{-1} \hat\theta
= n \hat\beta_2' \left(R' \hat V_\beta R\right)^{-1} \hat\beta_2.
\]
Now suppose that the covariance matrix is computed under the assumption of homoskedasticity, so that $\hat V_\beta$ is replaced with $\hat V_\beta^0 = s^2 \left(n^{-1} X'X\right)^{-1}$. We define the "homoskedastic" Wald statistic
\[
W_n^0 = n \hat\theta' \left(\hat V_\theta^0\right)^{-1} \hat\theta
= n \hat\beta_2' \left(R' \hat V_\beta^0 R\right)^{-1} \hat\beta_2.
\]
What we show in this section is that this Wald statistic can be written very simply using the formula
\[
W_n^0 = (n - k) \, \frac{\tilde e'\tilde e - \hat e'\hat e}{\hat e'\hat e} \tag{8.2}
\]
where
\[
\tilde e = y - X_1 \tilde\beta_1, \qquad \tilde\beta_1 = \left(X_1'X_1\right)^{-1} X_1' y
\]
are from OLS of $y$ on $X_1$, and
\[
\hat e = y - X \hat\beta, \qquad \hat\beta = \left(X'X\right)^{-1} X' y
\]
are from OLS of $y$ on $X = (X_1, X_2)$.
The elegant feature about (8.2) is that it is directly computable from the standard output from two simple OLS regressions, as the sum of squared errors is a typical output from statistical packages. This statistic is typically reported as an "F-statistic" which is defined as
\[
F_n = \frac{W_n^0}{k_2} = \frac{\left(\tilde e'\tilde e - \hat e'\hat e\right)/k_2}{\hat e'\hat e / (n - k)}.
\]
While it should be emphasized that equality (8.2) only holds if $\hat V_\beta^0 = s^2 \left(n^{-1} X'X\right)^{-1}$, still this formula often finds good use in reading applied papers. Because of this connection we call (8.2) the F form of the Wald statistic. (We can also call $W_n^0$ a homoskedastic form of the Wald statistic.)
We now derive expression (8.2). First, note that by partitioned matrix inversion (A.4)
\[
R' \left(X'X\right)^{-1} R
= R' \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1} R
= \left(X_2' M_1 X_2\right)^{-1}
\]
where $M_1 = I - X_1 \left(X_1'X_1\right)^{-1} X_1'$. Thus
\[
\left(R' \hat V_\beta^0 R\right)^{-1}
= s^{-2} n^{-1} \left(R' \left(X'X\right)^{-1} R\right)^{-1}
= s^{-2} n^{-1} X_2' M_1 X_2
\]
and
\[
W_n^0 = n \hat\beta_2' \left(R' \hat V_\beta^0 R\right)^{-1} \hat\beta_2
= \frac{\hat\beta_2' \left(X_2' M_1 X_2\right) \hat\beta_2}{s^2}.
\]
To simplify this expression further, note that if we regress $y$ on $X_1$ alone, the residual is $\tilde e = M_1 y$. Now consider the residual regression of $\tilde e$ on $\tilde X_2 = M_1 X_2$. By the FWL theorem, $\tilde e = \tilde X_2 \hat\beta_2 + \hat e$ and $\tilde X_2' \hat e = 0$. Thus
\[
\tilde e'\tilde e
= \left(\tilde X_2 \hat\beta_2 + \hat e\right)' \left(\tilde X_2 \hat\beta_2 + \hat e\right)
= \hat\beta_2' \tilde X_2' \tilde X_2 \hat\beta_2 + \hat e'\hat e
= \hat\beta_2' X_2' M_1 X_2 \hat\beta_2 + \hat e'\hat e,
\]
or alternatively,
\[
\hat\beta_2' X_2' M_1 X_2 \hat\beta_2 = \tilde e'\tilde e - \hat e'\hat e.
\]
Also, since
\[
s^2 = (n - k)^{-1} \hat e'\hat e,
\]
we conclude that
\[
W_n^0 = (n - k) \, \frac{\tilde e'\tilde e - \hat e'\hat e}{\hat e'\hat e}
\]
as claimed.
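The equality (8.2) can also be verified numerically. The sketch below is ours, on simulated data: it computes $W_n^0$ both from the two sums of squared residuals and from the direct quadratic form:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1, k2 = 200, 3, 2
k = k1 + k2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k1 - 1))])
X2 = rng.normal(size=(n, k2))
X = np.column_stack([X1, X2])
y = X1 @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n)   # H0: beta_2 = 0 is true

# Long and short regressions
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
beta_tilde1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
e_tilde = y - X1 @ beta_tilde1

# F form (8.2)
W0_form = (n - k) * (e_tilde @ e_tilde - e_hat @ e_hat) / (e_hat @ e_hat)

# Direct quadratic form: n b2'(R'V0 R)^{-1} b2 with V0 = s^2 (n^{-1} X'X)^{-1}
s2 = e_hat @ e_hat / (n - k)
V0 = s2 * np.linalg.inv(X.T @ X / n)
R = np.vstack([np.zeros((k1, k2)), np.eye(k2)])
b2 = beta_hat[k1:]
W0_direct = n * b2 @ np.linalg.inv(R.T @ V0 @ R) @ b2

print(W0_form, W0_direct)   # identical up to rounding error
```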
In many statistical packages, when an OLS regression is estimated, an "F-statistic" is reported. This is $F_n$ when $X_1$ is a vector of ones, so $H_0$ is an intercept-only model. This special F statistic is
testing the hypothesis that all slope coefficients (all coefficients other than the intercept) are zero. This was a popular statistic in the early days of econometric reporting, when sample sizes were very small and researchers wanted to know if there was "any explanatory power" to their regression. This is rarely an issue today, as sample sizes are typically sufficiently large that this F statistic is nearly always highly significant. While there are special cases where this F statistic is useful, these cases are atypical. As a general rule, there is no reason to report this F statistic.
8.6 Normal Regression Model
Now let us partition $\beta = (\beta_1, \beta_2)$ and consider tests of the linear restriction
\[
H_0 : \beta_2 = 0 \qquad H_1 : \beta_2 \neq 0
\]
in the normal regression model. In parametric models, a good test statistic is the likelihood ratio, which is twice the difference in the log-likelihood function evaluated under the null and alternative
hypotheses. The estimator under the alternative is the unrestricted estimator $(\hat\beta_1, \hat\beta_2, \hat\sigma^2)$ discussed above. The Gaussian log-likelihood at these estimates is
\[
\log L(\hat\beta_1, \hat\beta_2, \hat\sigma^2)
= -\frac{n}{2} \log\left(2\pi\hat\sigma^2\right) - \frac{1}{2\hat\sigma^2} \hat e'\hat e
= -\frac{n}{2} \log\hat\sigma^2 - \frac{n}{2} \log\left(2\pi\right) - \frac{n}{2}.
\]
The MLE under the null hypothesis is the restricted estimate $(\tilde\beta_1, 0, \tilde\sigma^2)$, where $\tilde\beta_1$ is the OLS estimate from a regression of $y_i$ on $x_{1i}$ only, with residual variance $\tilde\sigma^2$. The log-likelihood of this model is
\[
\log L(\tilde\beta_1, 0, \tilde\sigma^2)
= -\frac{n}{2} \log\tilde\sigma^2 - \frac{n}{2} \log\left(2\pi\right) - \frac{n}{2}.
\]
The LR statistic for $H_0$ against $H_1$ is
\[
LR_n = 2\left(\log L(\hat\beta_1, \hat\beta_2, \hat\sigma^2) - \log L(\tilde\beta_1, 0, \tilde\sigma^2)\right)
= n \log\tilde\sigma^2 - n \log\hat\sigma^2
= n \log\left(\frac{\tilde\sigma^2}{\hat\sigma^2}\right).
\]
By a first-order Taylor series approximation
\[
LR_n = n \log\left(1 + \left(\frac{\tilde\sigma^2}{\hat\sigma^2} - 1\right)\right)
\simeq n \left(\frac{\tilde\sigma^2}{\hat\sigma^2} - 1\right)
\simeq W_n^0,
\]
the homoskedastic Wald statistic. This shows that the two statistics ($LR_n$ and $W_n^0$) can be numerically close. It also shows that the homoskedastic Wald statistic for linear hypotheses can also be interpreted as an appropriate likelihood ratio statistic under normality.
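A small simulation (ours, with arbitrary design choices) illustrates how close $LR_n$ and $W_n^0$ are in practice:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k1, k2 = 500, 2, 2
k = k1 + k2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k1 - 1))])
X2 = rng.normal(size=(n, k2))
X = np.column_stack([X1, X2])
y = X1 @ np.array([1.0, -0.5]) + rng.normal(size=n)   # H0: beta_2 = 0 is true

e_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
e_tilde = y - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ y)
sig2_hat = e_hat @ e_hat / n        # MLE residual variance, long regression
sig2_tilde = e_tilde @ e_tilde / n  # MLE residual variance, short regression

LR = n * np.log(sig2_tilde / sig2_hat)
W0 = (n - k) * (e_tilde @ e_tilde - e_hat @ e_hat) / (e_hat @ e_hat)
print(LR, W0)   # numerically close under H0 when n is large
```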
8.7 Problems with Tests of Nonlinear Hypotheses
While the t and Wald tests work well when the hypothesis is a linear restriction on $\beta$, they can work quite poorly when the restrictions are nonlinear. This can be seen by a simple example introduced by Lafontaine and White (1986). Take the model
\[
y_i = \beta + e_i, \qquad e_i \sim N(0, \sigma^2)
\]