CHAPTER 6. ASYMPTOTIC THEORY FOR LEAST SQUARES


The product $x_i e_i$ is iid (since the observations are iid) and mean zero (since $E(x_i e_i) = 0$). Define the $k \times k$ covariance matrix

\Omega = E\left( x_i x_i' e_i^2 \right).   (6.8)

We require the elements of $\Omega$ to be finite, or equivalently that $E\|x_i e_i\|^2 < \infty$. Using $\|x_i e_i\|^2 = \|x_i\|^2 e_i^2$ and the Cauchy-Schwarz Inequality (B.20),

E\|x_i e_i\|^2 = E\left( \|x_i\|^2 e_i^2 \right) \le \left( E\|x_i\|^4 \right)^{1/2} \left( E e_i^4 \right)^{1/2}   (6.9)

which is finite if $x_i$ and $e_i$ have finite fourth moments. As $e_i$ is a linear combination of $y_i$ and $x_i$, it is sufficient that the observables have finite fourth moments (Theorem 3.16.1.6).

Assumption 6.4.1 In addition to Assumption 3.16.1, $E y_i^4 < \infty$ and $E\|x_i\|^4 < \infty$.

Under Assumption 6.4.1 the CLT (Theorem 2.8.1) can be applied.

Theorem 6.4.1 Under Assumption 1.5.1 and Assumption 6.4.1, as $n \to \infty$,

\frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i e_i \to_d N(0, \Omega)   (6.10)

where $\Omega = E\left( x_i x_i' e_i^2 \right)$.

Putting together (6.1), (6.7), and (6.10),

\sqrt{n} \left( \hat\beta - \beta \right) \to_d Q_{xx}^{-1} N(0, \Omega) = N\left( 0, Q_{xx}^{-1} \Omega Q_{xx}^{-1} \right)

as $n \to \infty$, where the final equality follows from the property that linear combinations of normal vectors are also normal (Theorem B.9.1).

We have derived the asymptotic normal approximation to the distribution of the least-squares estimator.

Theorem 6.4.2 Asymptotic Normality of Least-Squares Estimator
Under Assumption 1.5.1 and Assumption 6.4.1, as $n \to \infty$,

\sqrt{n} \left( \hat\beta - \beta \right) \to_d N\left( 0, V_\beta \right)

where

V_\beta = Q_{xx}^{-1} \Omega Q_{xx}^{-1},   (6.11)

$Q_{xx} = E(x_i x_i')$, and $\Omega = E\left( x_i x_i' e_i^2 \right)$.


In the stochastic order notation, Theorem 6.4.2 implies that

\hat\beta = \beta + O_p(n^{-1/2})   (6.12)

and

\hat\beta - \beta = O_p(n^{-1/2}),

which is stronger than (6.6).

The matrix $V_\beta = \mathrm{avar}(\hat\beta)$ is the variance of the asymptotic distribution of $\sqrt{n}(\hat\beta - \beta)$. Consequently, $V_\beta$ is often referred to as the asymptotic covariance matrix of $\hat\beta$. The expression $V_\beta = Q_{xx}^{-1} \Omega Q_{xx}^{-1}$ is called a sandwich form. It might be worth noticing that there is a difference between the variance of the asymptotic distribution given in (6.11) and the finite-sample conditional variance in the CEF model as given in (5.11):

V_{\hat\beta} = \left( \tfrac{1}{n} X'X \right)^{-1} \left( \tfrac{1}{n} X'DX \right) \left( \tfrac{1}{n} X'X \right)^{-1}.

While $V_{\hat\beta}$ and $V_\beta$ are different, the two are close if $n$ is large. Indeed, as $n \to \infty$,

V_{\hat\beta} \to_p V_\beta.

There is a special case where $\Omega$ and $V_\beta$ simplify. We say that $e_i$ is a Homoskedastic Projection Error when

\mathrm{cov}(x_i x_i', e_i^2) = 0.   (6.13)

Condition (6.13) holds in the homoskedastic linear regression model, but is somewhat broader. Under (6.13) the asymptotic variance formulae simplify as

\Omega = E\left( x_i x_i' \right) E\left( e_i^2 \right) = Q_{xx} \sigma^2   (6.14)

V_\beta = Q_{xx}^{-1} \Omega Q_{xx}^{-1} = Q_{xx}^{-1} \sigma^2 \equiv V_\beta^0.   (6.15)

In (6.15) we define $V_\beta^0 = Q_{xx}^{-1} \sigma^2$ whether (6.13) is true or false. When (6.13) is true then $V_\beta = V_\beta^0$; otherwise $V_\beta \ne V_\beta^0$. We call $V_\beta^0$ the homoskedastic asymptotic covariance matrix.
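The practical content of the sandwich form is easiest to see in a scalar example. The sketch below is not from the text; it is a minimal simulation assuming a single regressor with conditional variance $E(e_i^2 \mid x_i) = x_i^2$, a design under which the sandwich variance $Q_{xx}^{-1} \Omega Q_{xx}^{-1}$ equals 3 while the homoskedastic formula $Q_{xx}^{-1} \sigma^2$ equals 1.

```python
# Minimal sketch (not from the text): sandwich vs. homoskedastic asymptotic
# variance when E(e^2 | x) = x^2 with x ~ N(0,1), so Qxx = 1, sigma^2 = 1,
# Omega = E(x^2 e^2) = E(x^4) = 3.  Then V = Qxx^{-1} Omega Qxx^{-1} = 3
# while the homoskedastic formula gives V0 = Qxx^{-1} sigma^2 = 1.
import numpy as np

rng = np.random.default_rng(0)
n, reps, beta = 200, 5000, 1.0
b_hat = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)
    e = x * rng.standard_normal(n)      # heteroskedastic: var(e|x) = x^2
    y = x * beta + e
    b_hat[r] = (x @ y) / (x @ x)        # least-squares slope

print("empirical var of sqrt(n)(b_hat - beta):", n * b_hat.var())  # approx 3
print("sandwich V       :", 3.0)
print("homoskedastic V0 :", 1.0)
```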

Theorem 6.4.2 states that the sampling distribution of the least-squares estimator, after rescaling, is approximately normal when the sample size $n$ is sufficiently large. This holds true for all joint distributions of $(y_i, x_i)$ which satisfy the conditions of Assumption 6.4.1, and is therefore broadly applicable. Consequently, asymptotic normality is routinely used to approximate the finite sample distribution of $\sqrt{n}(\hat\beta - \beta)$.

A difficulty is that for any fixed $n$ the sampling distribution of $\hat\beta$ can be arbitrarily far from the normal distribution. In Figure 6.1 we have already seen a simple example where the least-squares estimate is quite asymmetric and non-normal even for reasonably large sample sizes. The normal approximation improves as $n$ increases, but how large should $n$ be in order for the approximation to be useful? Unfortunately, there is no simple answer to this reasonable question. The trouble is that no matter how large is the sample size, the normal approximation is arbitrarily poor for some data distribution satisfying the assumptions. We illustrate this problem using a simulation.

Let $y_i = \beta_1 x_i + \beta_2 + e_i$ where $x_i$ is $N(0,1)$, and $e_i$ is independent of $x_i$ with the Double Pareto density $f(e) = \frac{\alpha}{2} |e|^{-\alpha - 1}$, $|e| \ge 1$. If $\alpha > 2$ the error $e_i$ has zero mean and variance $\alpha/(\alpha - 2)$. As $\alpha$ approaches 2, however, its variance diverges to infinity. In this context the normalized least-squares slope estimator $\sqrt{n \tfrac{\alpha - 2}{\alpha}} \left( \hat\beta_1 - \beta_1 \right)$ has the $N(0,1)$ asymptotic distribution for any $\alpha > 2$. In Figure 6.3 we display the finite sample densities of the normalized estimator $\sqrt{n \tfrac{\alpha - 2}{\alpha}} \left( \hat\beta_1 - \beta_1 \right)$, setting $n = 100$ and varying the parameter $\alpha$. For $\alpha = 3.0$ the density is very close to the $N(0,1)$ density.

Figure 6.3: Density of Normalized OLS estimator with Double Pareto Error

As $\alpha$ diminishes the density changes significantly, concentrating most of the probability mass around zero.
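A minimal simulation sketch of the experiment behind Figure 6.3 (my own illustration, not the author's code) is given below. It draws Double Pareto errors by inverse-transform sampling, $|e| = U^{-1/\alpha}$ with $U$ uniform and a random sign, and summarizes the normalized slope estimates; for $\alpha$ near 2 the distribution is visibly non-normal even at $n = 100$.

```python
# Sketch (not the author's code): finite-sample behavior of the normalized
# OLS slope with Double Pareto errors, as in Figure 6.3.
import numpy as np

def double_pareto(alpha, size, rng):
    # |e| ~ Pareto(alpha) on [1, inf) via inverse transform; random sign
    u = rng.uniform(size=size)
    sign = np.where(rng.uniform(size=size) < 0.5, -1.0, 1.0)
    return sign * u ** (-1.0 / alpha)

def normalized_slope(alpha, n, reps, rng):
    z = np.empty(reps)
    scale = np.sqrt(n * (alpha - 2.0) / alpha)   # so the limit is N(0,1)
    for r in range(reps):
        x = rng.standard_normal(n)
        e = double_pareto(alpha, n, rng)
        y = 1.0 * x + 2.0 + e                    # beta1 = 1, beta2 = 2
        X = np.column_stack([x, np.ones(n)])
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        z[r] = scale * (b[0] - 1.0)
    return z

rng = np.random.default_rng(0)
for alpha in (3.0, 2.5, 2.1):
    z = normalized_slope(alpha, n=100, reps=2000, rng=rng)
    print(alpha, "P(|z| <= 1):", np.mean(np.abs(z) <= 1))  # N(0,1) gives 0.683
```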

Another example is shown in Figure 6.4. Here the model is $y_i = \beta + e_i$ where

e_i = \frac{ u_i^k - E\left( u_i^k \right) }{ \left( E\left( u_i^{2k} \right) - \left( E\left( u_i^k \right) \right)^2 \right)^{1/2} }   (6.16)

and $u_i \sim N(0,1)$. We show the sampling distribution of $\sqrt{n}(\hat\beta - \beta)$ setting $n = 100$, for $k = 1$, 4, 6 and 8. As $k$ increases, the sampling distribution becomes highly skewed and non-normal. The lesson from Figures 6.3 and 6.4 is that the $N(0,1)$ asymptotic approximation is never guaranteed to be accurate.
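The standardized error in (6.16) is simple to generate directly. The snippet below is an illustrative sketch only; for simplicity it approximates the moments $E(u_i^k)$ and $E(u_i^{2k})$ by a large auxiliary simulation rather than computing them analytically.

```python
# Sketch: generate e_i from (6.16) with u_i ~ N(0,1); skewness grows with k.
import numpy as np

def error_616(k, size, rng):
    u_big = rng.standard_normal(1_000_000)          # moment approximation
    m1, m2 = np.mean(u_big ** k), np.mean(u_big ** (2 * k))
    u = rng.standard_normal(size)
    return (u ** k - m1) / np.sqrt(m2 - m1 ** 2)    # mean 0, variance 1

rng = np.random.default_rng(0)
for k in (1, 4, 6, 8):
    e = error_616(k, 100_000, rng)
    print(k, "approximate skewness:", np.mean(e ** 3))
```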

6.5 Joint Distribution

Theorem 6.4.2 gives the joint asymptotic distribution of the coefficient estimates. We can use the result to study the covariance between the coefficient estimates. For example, suppose $k = 2$ and write the estimates as $(\hat\beta_1, \hat\beta_2)$. For simplicity suppose that the regressors are mean zero. Then we can write

Q_{xx} = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}

where $\sigma_1^2$ and $\sigma_2^2$ are the variances of $x_{1i}$ and $x_{2i}$, and $\rho$ is their correlation. If the error is homoskedastic, then the asymptotic variance matrix for $(\hat\beta_1, \hat\beta_2)$ is $V_\beta = Q_{xx}^{-1} \sigma^2$. By the formula for inversion of a $2 \times 2$ matrix,

Q_{xx}^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho \sigma_1 \sigma_2 \\ -\rho \sigma_1 \sigma_2 & \sigma_1^2 \end{pmatrix}.

Thus if $x_{1i}$ and $x_{2i}$ are positively correlated ($\rho > 0$) then $\hat\beta_1$ and $\hat\beta_2$ are negatively correlated (and vice-versa).


Figure 6.4: Density of Normalized OLS estimator with error process (6.16)

For illustration, Figure 6.5 displays the probability contours of the joint asymptotic distribution of $\hat\beta_1 - \beta_1$ and $\hat\beta_2 - \beta_2$ when $\sigma_1^2 = \sigma_2^2 = \sigma^2 = 1$ and $\rho = 0.5$. The coefficient estimates are negatively correlated since the regressors are positively correlated. This means that if $\hat\beta_1$ is unusually negative, it is likely that $\hat\beta_2$ is unusually positive, or conversely. It is also unlikely that we will observe both $\hat\beta_1$ and $\hat\beta_2$ unusually large and of the same sign.

This finding that the correlation of the regressors is of opposite sign of the correlation of the coefficient estimates is sensitive to the assumption of homoskedasticity. If the errors are heteroskedastic then this relationship is not guaranteed.

This can be seen through a simple constructed example. Suppose that $x_{1i}$ and $x_{2i}$ only take the values $\{-1, +1\}$, symmetrically, with $\Pr(x_{1i} = x_{2i} = 1) = \Pr(x_{1i} = x_{2i} = -1) = 3/8$, and $\Pr(x_{1i} = 1, x_{2i} = -1) = \Pr(x_{1i} = -1, x_{2i} = 1) = 1/8$. You can check that the regressors are mean zero, unit variance and correlation 0.5, which is identical with the setting displayed in Figure 6.5 when the error is homoskedastic.

 

 

 

Now suppose that the error is heteroskedastic. Specifically, suppose that $E\left( e_i^2 \mid x_{1i} = x_{2i} \right) = \frac{5}{4}$ and $E\left( e_i^2 \mid x_{1i} = -x_{2i} \right) = \frac{1}{4}$. You can check that $E\left( e_i^2 \right) = 1$, $E\left( x_{1i}^2 e_i^2 \right) = E\left( x_{2i}^2 e_i^2 \right) = 1$ and $E\left( x_{1i} x_{2i} e_i^2 \right) = \frac{7}{8}$. Therefore

V_\beta = Q_{xx}^{-1} \Omega Q_{xx}^{-1}
        = \frac{16}{9} \begin{pmatrix} 1 & -\frac{1}{2} \\ -\frac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} 1 & \frac{7}{8} \\ \frac{7}{8} & 1 \end{pmatrix} \begin{pmatrix} 1 & -\frac{1}{2} \\ -\frac{1}{2} & 1 \end{pmatrix}
        = \frac{1}{6} \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix}.

Thus the coefficient estimates $\hat\beta_1$ and $\hat\beta_2$ are positively correlated (their correlation is $1/4$). The joint probability contours of their asymptotic distribution are displayed in Figure 6.6. We can see how the two estimates are positively associated.
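As a quick arithmetic check of the display above (my own verification, not part of the text), the matrix product can be evaluated numerically:

```python
# Verify V = Qxx^{-1} Omega Qxx^{-1} for the constructed heteroskedastic example.
import numpy as np

Qxx = np.array([[1.0, 0.5], [0.5, 1.0]])
Omega = np.array([[1.0, 7 / 8], [7 / 8, 1.0]])
Qinv = np.linalg.inv(Qxx)
V = Qinv @ Omega @ Qinv
print(V)                                                       # [[2/3, 1/6], [1/6, 2/3]]
print("correlation:", V[0, 1] / np.sqrt(V[0, 0] * V[1, 1]))    # 0.25
```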


Figure 6.5: Contours of Joint Distribution of $(\hat\beta_1, \hat\beta_2)$, homoskedastic case

What we found through this example is that in the presence of heteroskedasticity there is no simple relationship between the correlation of the regressors and the correlation of the parameter estimates.

We can extend the above analysis to study the covariance between coefficient sub-vectors. For example, partitioning $x_i' = (x_{1i}', x_{2i}')$ and $\beta' = (\beta_1', \beta_2')$, we can write the general model as

y_i = x_{1i}' \beta_1 + x_{2i}' \beta_2 + e_i

and the coefficient estimates as $\hat\beta' = (\hat\beta_1', \hat\beta_2')$. Make the partitions

Q_{xx} = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}, \qquad \Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}.   (6.17)

From (3.37)

Q_{xx}^{-1} = \begin{pmatrix} Q_{11 \cdot 2}^{-1} & -Q_{11 \cdot 2}^{-1} Q_{12} Q_{22}^{-1} \\ -Q_{22 \cdot 1}^{-1} Q_{21} Q_{11}^{-1} & Q_{22 \cdot 1}^{-1} \end{pmatrix}

where $Q_{11 \cdot 2} = Q_{11} - Q_{12} Q_{22}^{-1} Q_{21}$ and $Q_{22 \cdot 1} = Q_{22} - Q_{21} Q_{11}^{-1} Q_{12}$. Thus when the error is homoskedastic,

\mathrm{cov}\left( \hat\beta_1, \hat\beta_2 \right) = -\sigma^2 Q_{11 \cdot 2}^{-1} Q_{12} Q_{22}^{-1},

which is a matrix generalization of the two-regressor case. In the general case, you can show that (Exercise 6.5)

V_\beta = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}   (6.18)

where

V_{11} = Q_{11 \cdot 2}^{-1} \left( \Omega_{11} - Q_{12} Q_{22}^{-1} \Omega_{21} - \Omega_{12} Q_{22}^{-1} Q_{21} + Q_{12} Q_{22}^{-1} \Omega_{22} Q_{22}^{-1} Q_{21} \right) Q_{11 \cdot 2}^{-1}   (6.19)

V_{21} = Q_{22 \cdot 1}^{-1} \left( \Omega_{21} - Q_{21} Q_{11}^{-1} \Omega_{11} - \Omega_{22} Q_{22}^{-1} Q_{21} + Q_{21} Q_{11}^{-1} \Omega_{12} Q_{22}^{-1} Q_{21} \right) Q_{11 \cdot 2}^{-1}   (6.20)

V_{22} = Q_{22 \cdot 1}^{-1} \left( \Omega_{22} - Q_{21} Q_{11}^{-1} \Omega_{12} - \Omega_{21} Q_{11}^{-1} Q_{12} + Q_{21} Q_{11}^{-1} \Omega_{11} Q_{11}^{-1} Q_{12} \right) Q_{22 \cdot 1}^{-1}   (6.21)

Unfortunately, these expressions are not easily interpretable.
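They can, however, be checked mechanically. The sketch below is my own numerical check, not from the text: it builds random conformable $Q$ and $\Omega$ matrices and confirms that formula (6.19) reproduces the upper-left block of $Q_{xx}^{-1} \Omega Q_{xx}^{-1}$.

```python
# Numerical check of (6.19): V11 from the block formula equals the upper-left
# block of Q^{-1} Omega Q^{-1}.  Q and Omega are random symmetric PD matrices.
import numpy as np

rng = np.random.default_rng(0)
k1, k2 = 2, 3
k = k1 + k2
A = rng.standard_normal((k, 2 * k))
Q = A @ A.T / (2 * k)                      # symmetric positive definite
B = rng.standard_normal((k, 2 * k))
Om = B @ B.T / (2 * k)

Q11, Q12 = Q[:k1, :k1], Q[:k1, k1:]
Q21, Q22 = Q[k1:, :k1], Q[k1:, k1:]
O11, O12 = Om[:k1, :k1], Om[:k1, k1:]
O21, O22 = Om[k1:, :k1], Om[k1:, k1:]

inv = np.linalg.inv
Q11_2 = Q11 - Q12 @ inv(Q22) @ Q21         # Q_{11.2}
V = inv(Q) @ Om @ inv(Q)

V11 = inv(Q11_2) @ (O11 - Q12 @ inv(Q22) @ O21 - O12 @ inv(Q22) @ Q21
                    + Q12 @ inv(Q22) @ O22 @ inv(Q22) @ Q21) @ inv(Q11_2)
print(np.allclose(V11, V[:k1, :k1]))       # True
```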


Figure 6.6: Contours of Joint Distribution of $\hat\beta_1$ and $\hat\beta_2$, heteroskedastic case

6.6 Uniformly Consistent Residuals*

We have described the least-squares residuals $\hat e_i$ as estimates of the errors $e_i$. Are $\hat e_i$ consistent for $e_i$? Notice that we can write the residual as

\hat e_i = y_i - x_i' \hat\beta = e_i + x_i' \beta - x_i' \hat\beta = e_i - x_i' \left( \hat\beta - \beta \right).   (6.22)

Since $\hat\beta - \beta \to_p 0$ it seems reasonable to guess that $\hat e_i$ will be close to $e_i$ if $n$ is large.

We can bound the difference in (6.22) using the Schwarz inequality (A.7) to find

\left| \hat e_i - e_i \right| = \left| x_i' \left( \hat\beta - \beta \right) \right| \le \|x_i\| \, \| \hat\beta - \beta \|.   (6.23)

To bound (6.23) we can use $\| \hat\beta - \beta \| = O_p(n^{-1/2})$ from Theorem 6.4.2, but we also need to bound the random variable $\|x_i\|$. The key is Theorem 2.12.1, which shows that $E\|x_i\|^4 < \infty$ implies $x_i = o_p\left( n^{1/4} \right)$ uniformly in $i$, or

n^{-1/4} \max_{1 \le i \le n} \|x_i\| \to_p 0.

Applied to (6.23) we obtain

\max_{1 \le i \le n} \left| \hat e_i - e_i \right| \le \max_{1 \le i \le n} \|x_i\| \, \| \hat\beta - \beta \| = o_p\left( n^{1/4} \right) O_p\left( n^{-1/2} \right) = o_p\left( n^{-1/4} \right).

We have shown the following.

 


Theorem 6.6.1 Under Assumptions 1.5.1 and 6.4.1, uniformly in $1 \le i \le n$,

\hat e_i = e_i + o_p\left( n^{-1/4} \right).   (6.24)

 

What about the squared residuals $\hat e_i^2$? Squaring the two sides of (6.24) we obtain

\hat e_i^2 = \left( e_i + o_p\left( n^{-1/4} \right) \right)^2 = e_i^2 + 2 e_i \, o_p\left( n^{-1/4} \right) + o_p\left( n^{-1/2} \right) = e_i^2 + o_p(1)   (6.25)

uniformly in $1 \le i \le n$, since $e_i = o_p\left( n^{1/4} \right)$ when $E|e_i|^4 < \infty$ by Theorem 2.12.1.

 

 

Theorem 6.6.2 Under Assumptions 1.5.1 and 6.4.1, uniformly in $1 \le i \le n$,

\hat e_i^2 = e_i^2 + o_p(1).
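A small Monte Carlo makes the content of Theorems 6.6.1-6.6.2 concrete. The sketch below is illustrative only, with a simple homoskedastic design of my own choosing; it tracks $\max_i |\hat e_i - e_i|$ as $n$ grows.

```python
# Sketch: max_i |e_hat_i - e_i| shrinks as n grows (Theorem 6.6.1).
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    e = rng.standard_normal(n)
    y = X @ np.array([1.0, 2.0, -1.0]) + e
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e_hat = y - X @ beta_hat
    print(n, np.max(np.abs(e_hat - e)))
```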

 

 

 

 

 

 

 

 

 

 

 

 

6.7 Asymptotic Leverage*

Recall the definition of leverage from (4.21)

h_{ii} = x_i' \left( X'X \right)^{-1} x_i.

These are the diagonal elements of the projection matrix $P$ and appear in the formula for leave-one-out prediction errors and several covariance matrix estimators. We can show that under iid sampling the leverage values are uniformly asymptotically small.

Let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the smallest and largest eigenvalues of a symmetric square matrix $A$, and note that $\lambda_{\max}\left( A^{-1} \right) = \left( \lambda_{\min}(A) \right)^{-1}$.

Since $\frac{1}{n} X'X \to_p Q_{xx} > 0$ then by the CMT, $\lambda_{\min}\left( \frac{1}{n} X'X \right) \to_p \lambda_{\min}(Q_{xx}) > 0$. (The latter is positive since $Q_{xx}$ is positive definite and thus all its eigenvalues are positive.) Then by the Trace Inequality (A.10)

h_{ii} = x_i' \left( X'X \right)^{-1} x_i
       = \mathrm{tr}\left( \left( \tfrac{1}{n} X'X \right)^{-1} \tfrac{1}{n} x_i x_i' \right)
       \le \lambda_{\max}\left( \left( \tfrac{1}{n} X'X \right)^{-1} \right) \mathrm{tr}\left( \tfrac{1}{n} x_i x_i' \right)
       = \left( \lambda_{\min}\left( \tfrac{1}{n} X'X \right) \right)^{-1} \tfrac{1}{n} \|x_i\|^2
       \le \left( \lambda_{\min}(Q_{xx}) + o_p(1) \right)^{-1} \tfrac{1}{n} \max_{1 \le i \le n} \|x_i\|^2.   (6.26)

Theorem 2.12.1 shows that $E\|x_i\|^2 < \infty$ implies

n^{-1/2} \max_{1 \le i \le n} \|x_i\| \to_p 0

and thus

n^{-1} \max_{1 \le i \le n} \|x_i\|^2 \to_p 0.

It follows that (6.26) is $o_p(1)$, uniformly in $i$.

Theorem 6.7.1 Under Assumption 1.5.1 and $E\|x_i\|^2 < \infty$, uniformly in $1 \le i \le n$, $h_{ii} = o_p(1)$.

Theorem (6.7.1) implies that under random sampling with finite variances and large samples, no individual observation should have a large leverage value. Consequently individual observations should not be influential, unless one of these conditions is violated.
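The leverage values are easy to inspect directly. The following sketch (a simple illustration under an iid normal design of my own choosing, not from the text) computes $h_{ii}$ as the diagonal of the projection matrix and shows that the largest value shrinks as $n$ grows.

```python
# Sketch: h_ii = x_i'(X'X)^{-1} x_i, the diagonal of P; max_i h_ii -> 0.
import numpy as np

rng = np.random.default_rng(0)
for n in (50, 500, 5000):
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # diag of X (X'X)^{-1} X'
    print(n, "max leverage:", h.max(), "  sum (should equal k = 4):", h.sum())
```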

6.8 Consistent Covariance Matrix Estimation

In Sections 5.7 and 5.8 we introduced estimators of the finite-sample covariance matrix of the least-squares estimator in the regression model. In this section we show that these estimators are consistent for the asymptotic covariance matrix.

First, consider the covariance matrix estimate constructed under the assumption of homoskedasticity:

\hat V_\beta^0 = \left( \tfrac{1}{n} X'X \right)^{-1} s^2 = \hat Q_{xx}^{-1} s^2.

Since $\hat Q_{xx} \to_p Q_{xx}$ (Theorem 6.2.1), $s^2 \to_p \sigma^2$ (Theorem 6.3.1), and $Q_{xx}$ is invertible (Assumption 3.16.1), it follows that

\hat V_\beta^0 = \hat Q_{xx}^{-1} s^2 \to_p Q_{xx}^{-1} \sigma^2 = V_\beta^0,

so that $\hat V_\beta^0$ is consistent for $V_\beta^0$, the homoskedastic covariance matrix.

Theorem 6.8.1 Under Assumption 1.5.1 and Assumption 3.16.1, $\hat V_\beta^0 \to_p V_\beta^0$ as $n \to \infty$.

Now consider the heteroskedasticity-robust covariance matrix estimators $\hat V_\beta$, $\tilde V_\beta$, and $\bar V_\beta$. Writing

\hat\Omega = \frac{1}{n} \sum_{i=1}^{n} x_i x_i' \hat e_i^2,   (6.27)

\tilde\Omega = \frac{1}{n} \sum_{i=1}^{n} (1 - h_{ii})^{-2} x_i x_i' \hat e_i^2,

and

\bar\Omega = \frac{1}{n} \sum_{i=1}^{n} (1 - h_{ii})^{-1} x_i x_i' \hat e_i^2

as moment estimators for $\Omega = E\left( x_i x_i' e_i^2 \right)$, then the covariance matrix estimators are


\hat V_\beta = \hat Q_{xx}^{-1} \hat\Omega \hat Q_{xx}^{-1},

\tilde V_\beta = \hat Q_{xx}^{-1} \tilde\Omega \hat Q_{xx}^{-1},

and

\bar V_\beta = \hat Q_{xx}^{-1} \bar\Omega \hat Q_{xx}^{-1}.

We can show that $\hat\Omega$, $\tilde\Omega$, and $\bar\Omega$ are consistent for $\Omega$. Combined with the consistency of $\hat Q_{xx}$ for $Q_{xx}$ and the invertibility of $Q_{xx}$, we find that $\hat V_\beta$, $\tilde V_\beta$, and $\bar V_\beta$ converge in probability to $Q_{xx}^{-1} \Omega Q_{xx}^{-1} = V_\beta$. The complete proof is given in Section 6.18.

Theorem 6.8.2 Under Assumption 1.5.1 and Assumption 6.4.1, as $n \to \infty$, $\hat\Omega \to_p \Omega$, $\tilde\Omega \to_p \Omega$, $\bar\Omega \to_p \Omega$, $\hat V_\beta \to_p V_\beta$, $\tilde V_\beta \to_p V_\beta$, and $\bar V_\beta \to_p V_\beta$.
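A compact implementation sketch of these estimators (my own code using the formulas above; the variable names and the simulated design are assumptions, not from the text) is:

```python
# Sketch: homoskedastic and heteroskedasticity-robust covariance estimators.
import numpy as np

def cov_estimators(X, y):
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e_hat = y - X @ beta_hat
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)        # leverage values h_ii
    Qxx_inv = np.linalg.inv(X.T @ X / n)

    s2 = e_hat @ e_hat / (n - k)
    V0 = Qxx_inv * s2                                  # homoskedastic formula

    def sandwich(w):
        Om = (X * (w * e_hat ** 2)[:, None]).T @ X / n   # weighted Omega estimate
        return Qxx_inv @ Om @ Qxx_inv

    V_hat = sandwich(np.ones(n))                       # Omega-hat, eq. (6.27)
    V_tilde = sandwich((1 - h) ** -2.0)                # Omega-tilde
    V_bar = sandwich((1 - h) ** -1.0)                  # Omega-bar
    return V0, V_hat, V_tilde, V_bar

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
e = X[:, 1] * rng.standard_normal(n)                   # heteroskedastic errors
y = X @ np.array([1.0, 2.0]) + e
for name, V in zip(["V0", "V_hat", "V_tilde", "V_bar"], cov_estimators(X, y)):
    print(name, np.round(V, 3))
```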

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6.9 Functions of Parameters

Sometimes we are interested in a lower-dimensional function of the parameter vector $\beta = (\beta_1, \ldots, \beta_k)$. For example, we may be interested in a single coefficient $\beta_j$ or a ratio $\beta_j / \beta_l$. In these cases we can write the parameter of interest as a function of $\beta$. Let $h : \mathbb{R}^k \to \mathbb{R}^q$ denote this function and let

\theta = h(\beta)

denote the parameter of interest. The estimate of $\theta$ is

\hat\theta = h(\hat\beta).

By the continuous mapping theorem (Theorem 2.9.1) and the fact $\hat\beta \to_p \beta$ we can deduce that $\hat\theta$ is consistent for $\theta$.

 

 

 

 

Theorem 6.9.1 Under Assumption 1.5.1 and Assumption 3.16.1, if $h(\beta)$ is continuous at the true value of $\beta$, then as $n \to \infty$, $\hat\theta \to_p \theta$.

Furthermore, by the Delta Method (Theorem 2.10.3) we know that $\hat\theta$ is asymptotically normal.

Theorem 6.9.2 Asymptotic Distribution of Functions of Parameters
Under Assumption 1.5.1 and Assumption 6.4.1, if $h(\beta)$ is continuously differentiable at the true value of $\beta$, then as $n \to \infty$,

\sqrt{n} \left( \hat\theta - \theta \right) \to_d N\left( 0, V_\theta \right)   (6.28)

where

V_\theta = H_\beta' V_\beta H_\beta   (6.29)

and

H_\beta = \frac{\partial}{\partial \beta} h(\beta)'.


In many cases, the function $h(\beta)$ is linear:

h(\beta) = R'\beta

for some $k \times q$ matrix $R$. In this case, $H_\beta = R$. In particular, if $R$ is a "selector matrix"

R = \begin{pmatrix} I \\ 0 \end{pmatrix}   (6.30)

so that $\theta = R'\beta = \beta_1$ for $\beta = (\beta_1', \beta_2')'$, then

V_\theta = \begin{pmatrix} I & 0 \end{pmatrix} V_\beta \begin{pmatrix} I \\ 0 \end{pmatrix} = V_{11},

where $V_{11}$ is given in (6.19). Under homoskedasticity the covariance matrix (6.19) simplifies to

V_{11}^0 = Q_{11 \cdot 2}^{-1} \sigma^2.

We have shown that for the case (6.30) of a subset of coefficients, (6.28) is

\sqrt{n} \left( \hat\beta_1 - \beta_1 \right) \to_d N\left( 0, V_{11} \right)

with $V_{11}$ given in (6.19).

6.10 Asymptotic Standard Errors

How do we estimate the covariance matrix $V_\theta$ for $\hat\theta$? From (6.29) we see we need estimates of $H_\beta$ and $V_\beta$. We already have an estimate of the latter, $\hat V_\beta$ (or $\tilde V_\beta$ or $\bar V_\beta$). To estimate $H_\beta$ we use

\hat H_\beta = \frac{\partial}{\partial \beta} h(\hat\beta)'.

Putting the parts together we obtain

\hat V_\theta = \hat H_\beta' \hat V_\beta \hat H_\beta

as the covariance matrix estimator for $\hat\theta$. As the primary justification for $\hat V_\theta$ is the asymptotic approximation (6.28), $\hat V_\theta$ is often called an asymptotic covariance matrix estimator.

In particular, when $h(\beta)$ is linear, $h(\beta) = R'\beta$, then

\hat V_\theta = R' \hat V_\beta R.

When $R$ takes the form of a selector matrix as in (6.30) then

\hat V_\theta = \hat V_{11} = \left[ \hat V_\beta \right]_{11},

the upper-left block of the covariance matrix estimate $\hat V_\beta$.

When $q = 1$ (so $h(\beta)$ is real-valued), the standard error for $\hat\theta$ is the square root of $n^{-1} \hat V_\theta$; that is,

s(\hat\theta) = n^{-1/2} \sqrt{ \hat V_\theta } = n^{-1/2} \sqrt{ \hat H_\beta' \hat V_\beta \hat H_\beta }.

This is known as an asymptotic standard error for $\hat\theta$.

The estimator $\hat V_\theta$ is consistent for $V_\theta$ under the conditions of Theorem 6.9.2 since $\hat V_\beta \to_p V_\beta$ by Theorem 6.8.2, and

\hat H_\beta = \frac{\partial}{\partial \beta} h(\hat\beta)' \to_p \frac{\partial}{\partial \beta} h(\beta)' = H_\beta

since $\hat\beta \to_p \beta$ and the function $\frac{\partial}{\partial \beta} h(\beta)'$ is continuous.
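As a concrete illustration of these formulas (a sketch of my own, not from the text), consider the ratio $\theta = \beta_1 / \beta_2$, for which $H_\beta = (1/\beta_2, \; -\beta_1/\beta_2^2)'$. The data-generating design below is an arbitrary assumption chosen only to exercise the computation.

```python
# Sketch: delta-method asymptotic standard error for theta = beta1/beta2,
# using a heteroskedasticity-robust estimate of V_beta.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([rng.standard_normal(n), 1.0 + rng.standard_normal(n) ** 2])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.standard_normal(n) * (1 + 0.5 * np.abs(X[:, 0]))

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e_hat = y - X @ b
Qxx_inv = np.linalg.inv(X.T @ X / n)
Omega_hat = (X * (e_hat ** 2)[:, None]).T @ X / n
V_beta = Qxx_inv @ Omega_hat @ Qxx_inv                 # sandwich estimate

theta_hat = b[0] / b[1]
H = np.array([1.0 / b[1], -b[0] / b[1] ** 2])          # gradient of h at beta-hat
V_theta = H @ V_beta @ H
se_theta = np.sqrt(V_theta / n)                        # s(theta-hat) = n^{-1/2} sqrt(V_theta)
print("theta_hat:", theta_hat, " standard error:", se_theta)
```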
