
CHAPTER 5. LEAST SQUARES REGRESSION


Equation (5.8) says that the estimator $\hat{\beta}$ is unbiased, meaning that the distribution of $\hat{\beta}$ is centered at $\beta$. Equation (5.7) says that the estimator is conditionally unbiased, which is a stronger result. It says that $\hat{\beta}$ is unbiased for any realization of the regressor matrix $X$.

5.3 Variance of Least Squares Estimator

In this section we calculate the conditional variance of the OLS estimator. For any $r \times 1$ random vector $Z$ define the $r \times r$ covariance matrix
$$\operatorname{var}(Z) = E(Z - EZ)(Z - EZ)' = EZZ' - (EZ)(EZ)'$$
and for any pair $(Z, X)$ define the conditional covariance matrix
$$\operatorname{var}(Z \mid X) = E\left((Z - E(Z \mid X))(Z - E(Z \mid X))' \mid X\right).$$

The conditional covariance matrix of the $n \times 1$ regression error $e$ is the $n \times n$ matrix
$$D = E\left(ee' \mid X\right).$$
The $i$'th diagonal element of $D$ is
$$E\left(e_i^2 \mid X\right) = E\left(e_i^2 \mid x_i\right) = \sigma_i^2$$
while the $ij$'th off-diagonal element of $D$ is
$$E\left(e_i e_j \mid X\right) = E\left(e_i \mid x_i\right)E\left(e_j \mid x_j\right) = 0,$$
where the first equality uses independence of the observations (Assumption 1.5.1) and the second is (5.2). Thus $D$ is a diagonal matrix with $i$'th diagonal element $\sigma_i^2$:
$$D = \operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_n^2\right) = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}. \qquad (5.9)$$

In the special case of the linear homoskedastic regression model (5.3), then
$$E\left(e_i^2 \mid x_i\right) = \sigma_i^2 = \sigma^2$$
and we have the simplification
$$D = I_n \sigma^2.$$
In general, however, $D$ need not necessarily take this simplified form.

For any $n \times r$ matrix $A = A(X)$,
$$\operatorname{var}(A'y \mid X) = \operatorname{var}(A'e \mid X) = A'DA. \qquad (5.10)$$
In particular, we can write $\hat{\beta} = A'y$ where $A = X(X'X)^{-1}$, and thus
$$\operatorname{var}(\hat{\beta} \mid X) = A'DA = \left(X'X\right)^{-1}X'DX\left(X'X\right)^{-1}.$$

It is useful to note that
$$X'DX = \sum_{i=1}^{n} x_i x_i' \sigma_i^2,$$
a weighted version of $X'X$.

Rather than working with the variance of the unscaled estimator $\hat{\beta}$, it will be useful to work with the conditional variance of the scaled estimator $\sqrt{n}\left(\hat{\beta} - \beta\right)$:
$$
\begin{aligned}
V_{\hat{\beta}} &\;\overset{\text{def}}{=}\; \operatorname{var}\left(\sqrt{n}\left(\hat{\beta} - \beta\right) \mid X\right) \\
&= n \operatorname{var}\left(\hat{\beta} \mid X\right) \\
&= n\left(X'X\right)^{-1}\left(X'DX\right)\left(X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'DX\right)\left(\frac{1}{n}X'X\right)^{-1}.
\end{aligned}
$$

This rescaling might seem rather odd, but it will help provide continuity between the finite-sample treatment of this chapter and the asymptotic treatment of later chapters. As we will see in the next chapter, $\operatorname{var}(\hat{\beta} \mid X)$ vanishes as $n$ tends to infinity, yet $V_{\hat{\beta}}$ converges to a constant matrix.

In the special case of the linear homoskedastic regression model, $D = I_n\sigma^2$, so $X'DX = X'X\sigma^2$, and the variance matrix simplifies to
$$V_{\hat{\beta}} = \left(\frac{1}{n}X'X\right)^{-1}\sigma^2.$$

Theorem 5.3.1 Variance of Least-Squares Estimator
In the linear regression model (Assumption 5.1.1)
$$V_{\hat{\beta}} = \operatorname{var}\left(\sqrt{n}\left(\hat{\beta} - \beta\right) \mid X\right) = \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'DX\right)\left(\frac{1}{n}X'X\right)^{-1} \qquad (5.11)$$
where $D$ is defined in (5.9).
In the homoskedastic linear regression model (Assumption 5.1.2)
$$V_{\hat{\beta}} = \left(\frac{1}{n}X'X\right)^{-1}\sigma^2.$$
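To make the sandwich formula of Theorem 5.3.1 concrete, the following sketch (Python with NumPy, using simulated data; the skedastic function and all variable names are illustrative assumptions, not taken from the text) computes $V_{\hat{\beta}}$ from known conditional variances $\sigma_i^2$ and compares it with the homoskedastic simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept plus one regressor

# Hypothetical skedastic function: sigma_i^2 = 0.5 + x_i^2 (an assumption for illustration)
sigma2 = 0.5 + X[:, 1] ** 2
D = np.diag(sigma2)                                      # D = E(ee' | X) is diagonal by independence

Qxx = X.T @ X / n                                        # (1/n) X'X
Omega = X.T @ D @ X / n                                  # (1/n) X'DX = (1/n) sum_i x_i x_i' sigma_i^2
Qxx_inv = np.linalg.inv(Qxx)

V_sandwich = Qxx_inv @ Omega @ Qxx_inv                   # general form (5.11)
V_homosked = Qxx_inv * sigma2.mean()                     # simplification that would hold if sigma_i^2 were constant

print(V_sandwich)
print(V_homosked)
```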

 

 

 

5.4 Gauss-Markov Theorem

Now consider the class of estimators of $\beta$ which are linear functions of the vector $y$, and thus can be written as
$$\tilde{\beta} = A'y$$
where $A$ is an $n \times k$ function of $X$. The least-squares estimator is the special case obtained by setting $A = X(X'X)^{-1}$. What is the best choice of $A$? The Gauss-Markov theorem, which we now present, says that the least-squares estimator is the best choice among linear unbiased estimators when the errors are homoskedastic, in the sense that the least-squares estimator has the smallest variance among all unbiased linear estimators.

To see this, since $E(y \mid X) = X\beta$, then for any linear estimator $\tilde{\beta} = A'y$ we have
$$E\left(\tilde{\beta} \mid X\right) = A'E(y \mid X) = A'X\beta,$$
so $\tilde{\beta}$ is unbiased if (and only if) $A'X = I_k$. Furthermore, we saw in (5.10) that
$$\operatorname{var}\left(\tilde{\beta} \mid X\right) = \operatorname{var}\left(A'y \mid X\right) = A'DA = A'A\sigma^2,$$
the last equality using the homoskedasticity assumption $D = I_n\sigma^2$. The "best" unbiased linear estimator is obtained by finding the matrix $A$ such that $A'A$ is minimized in the positive definite sense.

Theorem 5.4.1 Gauss-Markov

1. In the homoskedastic linear regression model (Assumption 5.1.2), the best (minimum-variance) unbiased linear estimator is the least-squares estimator
$$\hat{\beta} = \left(X'X\right)^{-1}X'y.$$

2. In the linear regression model (Assumption 5.1.1), the best unbiased linear estimator is
$$\tilde{\beta} = \left(X'D^{-1}X\right)^{-1}X'D^{-1}y. \qquad (5.12)$$

The first part of the Gauss-Markov theorem is a limited efficiency justification for the least-squares estimator. The justification is limited because the class of models is restricted to homoskedastic linear regression and the class of potential estimators is restricted to linear unbiased estimators. This latter restriction is particularly unsatisfactory as the theorem leaves open the possibility that a non-linear or biased estimator could have lower mean squared error than the least-squares estimator.

The second part of the theorem shows that in the (heteroskedastic) linear regression model, the least-squares estimator is inefficient. Within the class of linear unbiased estimators the best estimator is (5.12) and is called the Generalized Least Squares (GLS) estimator. This estimator is infeasible as the matrix D is unknown. This result does not suggest a practical alternative to least-squares. We return to the issue of feasible implementation of GLS in Section 9.1.

We give a proof of the first part of the theorem below, and leave the proof of the second part for Exercise 5.3.

Proof of Theorem 5.4.1.1. Let $A$ be any $n \times k$ function of $X$ such that $A'X = I_k$. The variance of the least-squares estimator is $\left(X'X\right)^{-1}\sigma^2$ and that of $A'y$ is $A'A\sigma^2$. It is sufficient to show that the difference $A'A - \left(X'X\right)^{-1}$ is positive semi-definite. Set $C = A - X\left(X'X\right)^{-1}$. Note that $X'C = 0$. Then we calculate that
$$
\begin{aligned}
A'A - \left(X'X\right)^{-1} &= \left(C + X\left(X'X\right)^{-1}\right)'\left(C + X\left(X'X\right)^{-1}\right) - \left(X'X\right)^{-1} \\
&= C'C + C'X\left(X'X\right)^{-1} + \left(X'X\right)^{-1}X'C + \left(X'X\right)^{-1}X'X\left(X'X\right)^{-1} - \left(X'X\right)^{-1} \\
&= C'C.
\end{aligned}
$$
The matrix $C'C$ is positive semi-definite (see Appendix A.7) as required.
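As a sketch of how the infeasible GLS estimator (5.12) relates to least squares, the simulation below (Python with NumPy; the design, the skedastic function, and all names are assumptions made for illustration) computes both estimators when $D$ is known by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

sigma2 = np.exp(X[:, 1])                   # heteroskedastic error variances, known only because we simulate
e = rng.normal(size=n) * np.sqrt(sigma2)
y = X @ beta + e

# Least-squares estimator: (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Infeasible GLS estimator (5.12): (X'D^{-1}X)^{-1} X'D^{-1}y with D = diag(sigma_i^2)
w = 1.0 / sigma2
beta_gls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

print(beta_ols, beta_gls)                  # both should be close to beta; GLS typically has lower variance
```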


5.5 Residuals

What are some properties of the residuals $\hat{e}_i = y_i - x_i'\hat{\beta}$ and prediction errors $\tilde{e}_i = y_i - x_i'\hat{\beta}_{(-i)}$, at least in the context of the linear regression model?

Recall from (4.25) that we can write the residuals in vector notation as
$$\hat{e} = Me$$

where $M = I_n - X\left(X'X\right)^{-1}X'$ is the orthogonal projection matrix. Using the properties of conditional expectation
$$E\left(\hat{e} \mid X\right) = E\left(Me \mid X\right) = M E\left(e \mid X\right) = 0$$
and
$$\operatorname{var}\left(\hat{e} \mid X\right) = \operatorname{var}\left(Me \mid X\right) = M \operatorname{var}\left(e \mid X\right) M = MDM \qquad (5.13)$$
where $D$ is defined in (5.9).

We can simplify this expression under the assumption of conditional homoskedasticity
$$E\left(e_i^2 \mid x_i\right) = \sigma^2.$$
In this case (5.13) simplifies to
$$\operatorname{var}\left(\hat{e} \mid X\right) = M\sigma^2.$$
In particular, for a single observation $i$, we obtain
$$\operatorname{var}\left(\hat{e}_i \mid X\right) = E\left(\hat{e}_i^2 \mid X\right) = \left(1 - h_{ii}\right)\sigma^2 \qquad (5.14)$$
since the diagonal elements of $M$ are $1 - h_{ii}$ as defined in (4.21). Thus the residuals $\hat{e}_i$ are heteroskedastic even if the errors $e_i$ are homoskedastic.

 

Similarly, we can write the prediction errors e~i = (1 hii) 1 e^i in vector notation. Set

M = diagf(1 h11) 1 ; ::; (1 hnn) 1g:

Then we can write the prediction errors as

e~ = M My

= M Me:

We can calculate that

E(e~ j X) = M ME(e j X) = 0

and

var (e~ j X) = M M var (e j X) MM = M MDMM

which simpli…es under homoskedasticity to

var (e~

X) = M MMM 2

j

= M MM 2:

 

The variance of the i’th prediction error is then

var (~ei j X) = E e~2i j X

=(1 hii) 1 (1 hii) (1 hii) 1 2

=(1 hii) 1 2:


A residual with constant conditional variance can be obtained by rescaling. The standardized residuals are
$$\bar{e}_i = \left(1 - h_{ii}\right)^{-1/2}\hat{e}_i, \qquad (5.15)$$
and in vector notation
$$\bar{e} = \left(\bar{e}_1, \ldots, \bar{e}_n\right)' = M^{*1/2} M e.$$
From our above calculations, under homoskedasticity,
$$\operatorname{var}\left(\bar{e} \mid X\right) = M^{*1/2} M M^{*1/2}\sigma^2$$
and
$$\operatorname{var}\left(\bar{e}_i \mid X\right) = E\left(\bar{e}_i^2 \mid X\right) = \sigma^2, \qquad (5.16)$$
and thus these standardized residuals have the same bias and variance as the original errors when the latter are homoskedastic.
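The residuals, prediction errors, and standardized residuals can all be computed from the leverage values $h_{ii}$. Here is a minimal sketch (Python with NumPy, simulated data; the helper computations and names are ours, not the text's).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat                                 # least-squares residuals

# Leverage values h_ii = x_i'(X'X)^{-1}x_i, the diagonal of X(X'X)^{-1}X'
h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)

e_tilde = e_hat / (1.0 - h)                              # prediction errors (leave-one-out residuals)
e_bar = e_hat / np.sqrt(1.0 - h)                         # standardized residuals, equation (5.15)
```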

5.6 Estimation of Error Variance

The error variance $\sigma^2 = Ee_i^2$ can be a parameter of interest, even in a heteroskedastic regression or a projection model. $\sigma^2$ measures the variation in the "unexplained" part of the regression. Its method of moments estimator (MME) is the sample average of the squared residuals:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\hat{e}_i^2$$
and equals the MLE in the normal regression model (4.14).

In the linear regression model we can calculate the mean of $\hat{\sigma}^2$. From (4.25), the properties of projection matrices and the trace operator, observe that
$$\hat{\sigma}^2 = \frac{1}{n}\hat{e}'\hat{e} = \frac{1}{n}e'MMe = \frac{1}{n}e'Me = \frac{1}{n}\operatorname{tr}\left(e'Me\right) = \frac{1}{n}\operatorname{tr}\left(Mee'\right).$$

Then
$$E\left(\hat{\sigma}^2 \mid X\right) = \frac{1}{n}\operatorname{tr}\left(E\left(Mee' \mid X\right)\right) = \frac{1}{n}\operatorname{tr}\left(M E\left(ee' \mid X\right)\right) = \frac{1}{n}\operatorname{tr}\left(MD\right). \qquad (5.17)$$

Adding the assumption of conditional homoskedasticity $E\left(e_i^2 \mid x_i\right) = \sigma^2$, so that $D = I_n\sigma^2$, then (5.17) simplifies to
$$E\left(\hat{\sigma}^2 \mid X\right) = \frac{1}{n}\operatorname{tr}\left(M\right)\sigma^2 = \sigma^2\left(\frac{n-k}{n}\right),$$
the final equality by (4.23). This calculation shows that $\hat{\sigma}^2$ is biased towards zero. The order of the bias depends on $k/n$, the ratio of the number of estimated coefficients to the sample size.


Another way to see this is to use (5.14). Note that
$$E\left(\hat{\sigma}^2 \mid X\right) = \frac{1}{n}\sum_{i=1}^{n}E\left(\hat{e}_i^2 \mid X\right) = \frac{1}{n}\sum_{i=1}^{n}\left(1 - h_{ii}\right)\sigma^2 = \left(\frac{n-k}{n}\right)\sigma^2$$
using (4.22).

Since the bias takes a scale form, a classic method to obtain an unbiased estimator is by rescaling the estimator. Define
$$s^2 = \frac{1}{n-k}\sum_{i=1}^{n}\hat{e}_i^2. \qquad (5.18)$$
By the above calculation,
$$E\left(s^2 \mid X\right) = \sigma^2 \qquad (5.19)$$
so
$$E\left(s^2\right) = \sigma^2$$
and the estimator $s^2$ is unbiased for $\sigma^2$. Consequently, $s^2$ is known as the "bias-corrected estimator" for $\sigma^2$ and in empirical practice $s^2$ is the most widely used estimator for $\sigma^2$.

Interestingly, this is not the only method to construct an unbiased estimator for $\sigma^2$. An estimator constructed with the standardized residuals $\bar{e}_i$ from (5.15) is
$$\bar{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\bar{e}_i^2 = \frac{1}{n}\sum_{i=1}^{n}\left(1 - h_{ii}\right)^{-1}\hat{e}_i^2.$$
You can show (see Exercise 5.6) that
$$E\left(\bar{\sigma}^2 \mid X\right) = \sigma^2 \qquad (5.20)$$
and thus $\bar{\sigma}^2$ is unbiased for $\sigma^2$ (in the homoskedastic linear regression model).

When the sample sizes are large and the number of regressors small, the estimators $\hat{\sigma}^2$, $s^2$ and $\bar{\sigma}^2$ are likely to be close.
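The three estimators of $\sigma^2$ differ only in how the squared residuals are weighted. A short sketch (Python with NumPy, simulated homoskedastic data; the setup is assumed for illustration) computes all three and illustrates the $(n-k)/n$ bias factor of $\hat{\sigma}^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.ones(k) + rng.normal(size=n)                  # true sigma^2 = 1 (homoskedastic)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)

sigma2_hat = np.mean(e_hat ** 2)                         # method of moments estimator, biased toward zero
s2 = np.sum(e_hat ** 2) / (n - k)                        # bias-corrected estimator, equation (5.18)
sigma2_bar = np.mean(e_hat ** 2 / (1.0 - h))             # estimator built from standardized residuals

print(sigma2_hat, s2, sigma2_bar)                        # note sigma2_hat equals (n - k)/n times s2
```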

5.7 Covariance Matrix Estimation Under Homoskedasticity

For inference, we need an estimate of the covariance matrix $V_{\hat{\beta}}$ of the least-squares estimator. In this section we consider the homoskedastic regression model (Assumption 5.1.2).

Under homoskedasticity, the covariance matrix takes the relatively simple form
$$V_{\hat{\beta}} = \left(\frac{1}{n}X'X\right)^{-1}\sigma^2$$
which is known up to the unknown scale $\sigma^2$. In the previous section we discussed three estimators of $\sigma^2$. The most commonly used choice is $s^2$, leading to the classic covariance matrix estimator
$$\hat{V}_{\hat{\beta}}^{0} = \left(\frac{1}{n}X'X\right)^{-1}s^2. \qquad (5.21)$$


Since $s^2$ is conditionally unbiased for $\sigma^2$, it is simple to calculate that $\hat{V}_{\hat{\beta}}^{0}$ is conditionally unbiased for $V_{\hat{\beta}}$ under the assumption of homoskedasticity:
$$
\begin{aligned}
E\left(\hat{V}_{\hat{\beta}}^{0} \mid X\right) &= \left(\frac{1}{n}X'X\right)^{-1}E\left(s^2 \mid X\right) \\
&= \left(\frac{1}{n}X'X\right)^{-1}\sigma^2 \\
&= V_{\hat{\beta}}.
\end{aligned}
$$

This estimator was the dominant covariance matrix estimator in applied econometrics in previous generations, and is still the default in most regression packages.
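As a sketch, the classic estimator (5.21) is a one-line computation once $s^2$ is available (Python with NumPy; the function name and arguments are illustrative, not a standard library routine).

```python
import numpy as np

def classic_vcov(X, y):
    """Classic homoskedastic estimate (1/n X'X)^{-1} s^2 of the variance of sqrt(n)(beta_hat - beta)."""
    n, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e_hat = y - X @ beta_hat
    s2 = e_hat @ e_hat / (n - k)
    return np.linalg.inv(X.T @ X / n) * s2

# Standard errors for the individual coefficients would then be
# np.sqrt(np.diag(classic_vcov(X, y)) / n).
```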

If the estimator (5.21) is used, but the regression error is heteroskedastic, it is possible for $\hat{V}_{\hat{\beta}}^{0}$ to be quite biased for the correct covariance matrix $V_{\hat{\beta}} = \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'DX\right)\left(\frac{1}{n}X'X\right)^{-1}$.

For example, suppose $k = 1$ and $\sigma_i^2 = x_i^2$. The ratio of the true variance of the least-squares estimator to the expectation of the variance estimator is
$$\frac{V_{\hat{\beta}}}{E\left(\hat{V}_{\hat{\beta}}^{0} \mid X\right)} = \frac{\frac{1}{n}\sum_{i=1}^{n}x_i^4}{\sigma^2\,\frac{1}{n}\sum_{i=1}^{n}x_i^2} \simeq \frac{Ex_i^4}{\sigma^2\,Ex_i^2} = \frac{Ex_i^4}{\left(Ex_i^2\right)^2}.$$
(Notice that we use the fact that $\sigma_i^2 = x_i^2$ implies $\sigma^2 = E\sigma_i^2 = Ex_i^2$.) This is the standardized fourth moment (or kurtosis) of the regressor $x_i$. The ratio can be any number greater than one, for example it is 3 if $x_i \sim N\left(0, \sigma^2\right)$. We conclude that the bias of $\hat{V}_{\hat{\beta}}^{0}$ can be arbitrarily large. While this is an extreme and constructed example, the point is that the classic covariance matrix estimator (5.21) may be quite biased when the homoskedasticity assumption fails.
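The kurtosis ratio in this example is easy to check by simulation; the sketch below (assumed setup with a standard normal regressor) reproduces the value of roughly 3.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)              # x_i ~ N(0, 1)

# Ratio of true variance to expected classic variance estimate when sigma_i^2 = x_i^2
ratio = np.mean(x ** 4) / np.mean(x ** 2) ** 2
print(ratio)                                # approximately 3, the kurtosis of a normal regressor
```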

5.8 Covariance Matrix Estimation Under Heteroskedasticity

In the previous section we showed that the classic covariance matrix estimator can be highly biased if homoskedasticity fails. In this section we show how to construct covariance matrix estimators which do not require homoskedasticity.

Recall that the general form for the covariance matrix is
$$V_{\hat{\beta}} = \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'DX\right)\left(\frac{1}{n}X'X\right)^{-1}.$$

This depends on the unknown matrix $D$ which we can write as
$$D = \operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_n^2\right) = E\left(ee' \mid X\right) = E\left(\operatorname{diag}\left(e_1^2, \ldots, e_n^2\right) \mid X\right).$$
Thus $D$ is the conditional mean of $\operatorname{diag}\left(e_1^2, \ldots, e_n^2\right)$, so the latter is an unbiased estimator for $D$. Therefore, if the squared errors $e_i^2$ were observable, we could construct the unbiased estimator
$$\hat{V}_{\hat{\beta}}^{\mathrm{ideal}} = \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'\operatorname{diag}\left(e_1^2, \ldots, e_n^2\right)X\right)\left(\frac{1}{n}X'X\right)^{-1} = \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i' e_i^2\right)\left(\frac{1}{n}X'X\right)^{-1}.$$


Indeed,
$$
\begin{aligned}
E\left(\hat{V}_{\hat{\beta}}^{\mathrm{ideal}} \mid X\right) &= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i' E\left(e_i^2 \mid X\right)\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\sigma_i^2\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'DX\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= V_{\hat{\beta}},
\end{aligned}
$$
verifying that $\hat{V}_{\hat{\beta}}^{\mathrm{ideal}}$ is unbiased for $V_{\hat{\beta}}$.

Since the squared errors $e_i^2$ are unobserved, $\hat{V}_{\hat{\beta}}^{\mathrm{ideal}}$ is not a feasible estimator. To construct a feasible estimator we can replace the errors with the least-squares residuals $\hat{e}_i$, the prediction errors $\tilde{e}_i$ or the standardized residuals $\bar{e}_i$, e.g.
$$\hat{D} = \operatorname{diag}\left(\hat{e}_1^2, \ldots, \hat{e}_n^2\right), \qquad \tilde{D} = \operatorname{diag}\left(\tilde{e}_1^2, \ldots, \tilde{e}_n^2\right), \qquad \bar{D} = \operatorname{diag}\left(\bar{e}_1^2, \ldots, \bar{e}_n^2\right).$$
Substituting these matrices into the formula for $V_{\hat{\beta}}$ we obtain the estimators
$$
\begin{aligned}
\hat{V}_{\hat{\beta}} &= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'\hat{D}X\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\hat{e}_i^2\right)\left(\frac{1}{n}X'X\right)^{-1},
\end{aligned}
$$
$$
\begin{aligned}
\tilde{V}_{\hat{\beta}} &= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'\tilde{D}X\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\tilde{e}_i^2\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}\left(1 - h_{ii}\right)^{-2}x_i x_i'\hat{e}_i^2\right)\left(\frac{1}{n}X'X\right)^{-1},
\end{aligned}
$$
and
$$
\begin{aligned}
\bar{V}_{\hat{\beta}} &= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'\bar{D}X\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\bar{e}_i^2\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}\left(1 - h_{ii}\right)^{-1}x_i x_i'\hat{e}_i^2\right)\left(\frac{1}{n}X'X\right)^{-1}.
\end{aligned}
$$

The estimators $\hat{V}_{\hat{\beta}}$, $\tilde{V}_{\hat{\beta}}$, and $\bar{V}_{\hat{\beta}}$ are often called robust, heteroskedasticity-consistent, or heteroskedasticity-robust covariance matrix estimators. The estimator $\hat{V}_{\hat{\beta}}$ was first developed by Eicker (1963), and introduced to econometrics by White (1980), and is sometimes called the Eicker-White or White covariance matrix estimator¹. The estimator $\tilde{V}_{\hat{\beta}}$ was introduced by Andrews (1991) based on the principle of leave-one-out cross-validation, and the estimator $\bar{V}_{\hat{\beta}}$ was introduced by Horn, Horn and Duncan (1975) as a reduced-bias covariance matrix estimator.

 

Since $(1 - h_{ii})^{-2} > (1 - h_{ii})^{-1} > 1$ it is straightforward to show that
$$\hat{V}_{\hat{\beta}} < \bar{V}_{\hat{\beta}} < \tilde{V}_{\hat{\beta}}. \qquad (5.22)$$

(See Exercise 5.7.) The inequality $A < B$ when applied to matrices means that the matrix $B - A$ is positive definite.

In general, the bias of the estimators $\hat{V}_{\hat{\beta}}$, $\tilde{V}_{\hat{\beta}}$ and $\bar{V}_{\hat{\beta}}$ is quite complicated, but they greatly simplify under the assumption of homoskedasticity (5.3). For example, using (5.14),
$$
\begin{aligned}
E\left(\hat{V}_{\hat{\beta}} \mid X\right) &= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i' E\left(\hat{e}_i^2 \mid X\right)\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\left(1 - h_{ii}\right)\sigma^2\right)\left(\frac{1}{n}X'X\right)^{-1} \\
&= \left(\frac{1}{n}X'X\right)^{-1}\sigma^2 - \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i' h_{ii}\right)\left(\frac{1}{n}X'X\right)^{-1}\sigma^2 \\
&\leq \left(\frac{1}{n}X'X\right)^{-1}\sigma^2 = V_{\hat{\beta}}.
\end{aligned}
$$
This calculation shows that $\hat{V}_{\hat{\beta}}$ is biased towards zero.

 

 

 

 

 

Similarly, (again under homoskedasticity) we can calculate that $\tilde{V}_{\hat{\beta}}$ is biased away from zero, specifically
$$E\left(\tilde{V}_{\hat{\beta}} \mid X\right) \geq \left(\frac{1}{n}X'X\right)^{-1}\sigma^2 \qquad (5.23)$$
while the estimator $\bar{V}_{\hat{\beta}}$ is unbiased
$$E\left(\bar{V}_{\hat{\beta}} \mid X\right) = \left(\frac{1}{n}X'X\right)^{-1}\sigma^2. \qquad (5.24)$$

(See Exercise 5.8.)

It might seem rather odd to compare the bias of heteroskedasticity-robust estimators under the assumption of homoskedasticity, but it does give us a baseline for comparison.

We have introduced four covariance matrix estimators, $\hat{V}_{\hat{\beta}}^{0}$, $\hat{V}_{\hat{\beta}}$, $\tilde{V}_{\hat{\beta}}$, and $\bar{V}_{\hat{\beta}}$. Which should you use? The classic estimator $\hat{V}_{\hat{\beta}}^{0}$ is typically a poor choice, as it is only valid under the unlikely homoskedasticity restriction. For this reason it is not typically used in contemporary econometric research. Of the three robust estimators, $\hat{V}_{\hat{\beta}}$ is the most commonly used, as it is the most straightforward and familiar. However, $\tilde{V}_{\hat{\beta}}$ and (in particular) $\bar{V}_{\hat{\beta}}$ are preferred based on their improved bias. Unfortunately, standard regression packages set the classic estimator $\hat{V}_{\hat{\beta}}^{0}$ as the default. As $\tilde{V}_{\hat{\beta}}$ and $\bar{V}_{\hat{\beta}}$ are simple to implement, this should not be a barrier. For example, in STATA, $\bar{V}_{\hat{\beta}}$ is implemented by selecting "Robust" standard errors and selecting the bias correction option "1/(1-h)", or using the vce(hc2) option.

 

 

¹ Often, this estimator is rescaled by multiplying by the ad hoc bias adjustment $\frac{n}{n-k}$, in analogy to the bias-corrected error variance estimator.
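The three robust estimators differ only in how the squared residuals enter the middle matrix of the sandwich. A compact sketch (Python with NumPy; the function and its keyword are hypothetical illustrations of the formulas above, not a library API):

```python
import numpy as np

def robust_vcov(X, y, kind="hat"):
    """Heteroskedasticity-robust estimates of the variance of sqrt(n)(beta_hat - beta).

    kind = "hat"   : squared residuals              (Eicker-White, V-hat)
    kind = "tilde" : squared prediction errors      (Andrews, V-tilde)
    kind = "bar"   : squared standardized residuals (Horn-Horn-Duncan, V-bar)
    """
    n = X.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e_hat = y - X @ beta_hat
    h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)

    power = {"hat": 0.0, "tilde": 2.0, "bar": 1.0}[kind]
    u2 = e_hat ** 2 / (1.0 - h) ** power        # weighted squared residuals
    Qxx_inv = np.linalg.inv(X.T @ X / n)
    Omega = (X * u2[:, None]).T @ X / n         # (1/n) sum_i x_i x_i' u_i^2
    return Qxx_inv @ Omega @ Qxx_inv
```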


5.9 Standard Errors

A variance estimator such as $\hat{V}_{\hat{\beta}}$ is an estimate of the variance of the distribution of $\hat{\beta}$. A more easily interpretable measure of spread is its square root, the standard deviation. This is so important when discussing the distribution of parameter estimates that we have a special name for estimates of their standard deviation.

Definition 5.9.1 A standard error $s(\hat{\beta})$ for a real-valued estimator $\hat{\beta}$ is an estimate of the standard deviation of the distribution of $\hat{\beta}$.

When $\beta$ is a vector with estimate $\hat{\beta}$ and covariance matrix estimate $n^{-1}\hat{V}_{\hat{\beta}}$, standard errors for individual elements are the square roots of the diagonal elements of $n^{-1}\hat{V}_{\hat{\beta}}$. That is,
$$s(\hat{\beta}_j) = \sqrt{\left[n^{-1}\hat{V}_{\hat{\beta}}\right]_{jj}} = n^{-1/2}\sqrt{\left[\hat{V}_{\hat{\beta}}\right]_{jj}}.$$

As we discussed in the previous section, there are multiple possible covariance matrix estimators, so standard errors are not unique. It is therefore important to understand what formula and method is used by an author when studying their work. It is also important to understand that a particular standard error may be relevant under one set of model assumptions, but not under another set of assumptions.

To illustrate the computation of the covariance matrix estimate and standard errors, we return to the log wage regression (4.9) of Section 4.4. We calculate that $s^2 = 0.215$ and
$$\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\hat{e}_i^2 = \begin{pmatrix} 0.208 & 3.200 \\ 3.200 & 49.961 \end{pmatrix}.$$

Therefore the homoskedastic and White covariance matrix estimates are
$$\hat{V}_{\hat{\beta}}^{0} = \begin{pmatrix} 1 & 15.426 \\ 15.426 & 243 \end{pmatrix}^{-1} \times 0.215 = \begin{pmatrix} 10.387 & -0.659 \\ -0.659 & 0.043 \end{pmatrix}$$
and
$$\hat{V}_{\hat{\beta}} = \begin{pmatrix} 1 & 15.426 \\ 15.426 & 243 \end{pmatrix}^{-1}\begin{pmatrix} 0.208 & 3.200 \\ 3.200 & 49.961 \end{pmatrix}\begin{pmatrix} 1 & 15.426 \\ 15.426 & 243 \end{pmatrix}^{-1} = \begin{pmatrix} 7.092 & -0.445 \\ -0.445 & 0.029 \end{pmatrix}.$$

The standard errors are the square roots of the diagonal elements of these matrices. For example, the White standard error for $\hat{\beta}_0$ is $\sqrt{7.092/61} = 0.341$ and that for $\hat{\beta}_1$ is $\sqrt{0.029/61} = 0.022$. A conventional format to write the estimated equation with standard errors is
$$\widehat{\log(Wage)} = \underset{(0.341)}{0.626} + \underset{(0.022)}{0.156}\; Education.$$

Alternatively our standard errors could be calculated using $\tilde{V}_{\hat{\beta}}$ or $\bar{V}_{\hat{\beta}}$. We report the four possible standard errors in the following table:

              $\sqrt{n^{-1}\hat{V}_{\hat{\beta}}^{0}}$   $\sqrt{n^{-1}\hat{V}_{\hat{\beta}}}$   $\sqrt{n^{-1}\tilde{V}_{\hat{\beta}}}$   $\sqrt{n^{-1}\bar{V}_{\hat{\beta}}}$
Intercept          0.412          0.341          0.361          0.351
Education          0.026          0.022          0.023          0.022

The homoskedastic standard errors are noticeably different than the others, but the three robust standard errors are quite close to one another.
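A sketch of how such a table can be produced: compute each covariance matrix estimate, divide by $n$, and take square roots of the diagonal. The data below are simulated as a stand-in (we do not reproduce the wage data of Section 4.4 here), so the coefficients, sample size, and printed numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 61
X = np.column_stack([np.ones(n), rng.normal(12.0, 3.0, size=n)])    # intercept + an education-like regressor
y = 0.6 + 0.15 * X[:, 1] + rng.normal(scale=0.5, size=n)            # illustrative coefficients only

k = X.shape[1]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)
Qxx_inv = np.linalg.inv(X.T @ X / n)

def sandwich_se(u2):
    """Standard errors from the sandwich with weighted squared residuals u2."""
    Omega = (X * u2[:, None]).T @ X / n
    V = Qxx_inv @ Omega @ Qxx_inv
    return np.sqrt(np.diag(V) / n)

s2 = e_hat @ e_hat / (n - k)
se_classic = np.sqrt(np.diag(Qxx_inv * s2) / n)        # from V-hat^0
se_white = sandwich_se(e_hat ** 2)                     # from V-hat
se_andrews = sandwich_se(e_hat ** 2 / (1.0 - h) ** 2)  # from V-tilde
se_hhd = sandwich_se(e_hat ** 2 / (1.0 - h))           # from V-bar

for name, se in [("classic", se_classic), ("White", se_white),
                 ("Andrews", se_andrews), ("Horn-Horn-Duncan", se_hhd)]:
    print(f"{name:>18}: intercept {se[0]:.3f}, education {se[1]:.3f}")
```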
