

and thus

$$
\begin{aligned}
\widehat{\beta}_{(-i)} &= \left(X'X - x_i x_i'\right)^{-1}\left(X'y - x_i y_i\right) \\
&= \left(\left(X'X\right)^{-1} + (1-h_{ii})^{-1}\left(X'X\right)^{-1} x_i x_i' \left(X'X\right)^{-1}\right)\left(X'y - x_i y_i\right) \\
&= \widehat{\beta} - \left(X'X\right)^{-1} x_i y_i + (1-h_{ii})^{-1}\left(X'X\right)^{-1} x_i \left(x_i'\widehat{\beta} - h_{ii} y_i\right) \\
&= \widehat{\beta} - (1-h_{ii})^{-1}\left(X'X\right)^{-1} x_i \left((1-h_{ii}) y_i - x_i'\widehat{\beta} + h_{ii} y_i\right) \\
&= \widehat{\beta} - (1-h_{ii})^{-1}\left(X'X\right)^{-1} x_i \left(y_i - x_i'\widehat{\beta}\right) \\
&= \widehat{\beta} - (1-h_{ii})^{-1}\left(X'X\right)^{-1} x_i \hat{e}_i,
\end{aligned}
$$

the third equality making the substitutions $\widehat{\beta} = (X'X)^{-1}X'y$ and $h_{ii} = x_i'(X'X)^{-1}x_i$, and the remainder collecting terms.
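To make the algebra concrete, here is a minimal numerical sketch (not from the text; the simulated data and names are illustrative) checking that the leave-one-out formula above matches direct re-estimation with observation $i$ deleted:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                       # full-sample OLS
e_hat = y - X @ beta                           # OLS residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverage values h_ii

i = 7  # any observation
# Leave-one-out estimate via the formula derived above
beta_loo_formula = beta - XtX_inv @ X[i] * e_hat[i] / (1 - h[i])
# Leave-one-out estimate by direct deletion of observation i
Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
beta_loo_direct = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)

print(np.allclose(beta_loo_formula, beta_loo_direct))  # should print True
```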

 

4.12 Influential Observations

Another use of the leave-one-out estimator is to investigate the impact of influential observations, sometimes called outliers. We say that observation i is influential if its omission from the sample induces a substantial change in a parameter of interest. From (4.32)-(4.33) we know that

$$
\widehat{\beta} - \widehat{\beta}_{(-i)} = (1-h_{ii})^{-1}\left(X'X\right)^{-1}x_i\hat{e}_i = \left(X'X\right)^{-1}x_i\widetilde{e}_i. \tag{4.34}
$$

By direct calculation of this quantity for each observation i, we can directly discover if a specific observation i is influential for a coefficient estimate of interest.

For a general assessment, we can focus on the predicted values. The difference between the full-sample and leave-one-out predicted values is

$$
\widehat{y}_i - \widetilde{y}_i = x_i'\widehat{\beta} - x_i'\widehat{\beta}_{(-i)} = x_i'\left(X'X\right)^{-1}x_i\widetilde{e}_i = h_{ii}\widetilde{e}_i
$$

which is a simple function of the leverage values $h_{ii}$ and prediction errors $\widetilde{e}_i$. Observation i is influential for the predicted value if $|h_{ii}\widetilde{e}_i|$ is large, which requires that both $h_{ii}$ and $|\widetilde{e}_i|$ are large.

One way to think about this is that a large leverage value $h_{ii}$ gives the potential for observation i to be influential. A large $h_{ii}$ means that observation i is unusual in the sense that the regressor $x_i$ is far from its sample mean. We call an observation with large $h_{ii}$ a leverage point. A leverage point is not necessarily influential as the latter also requires that the prediction error $\widetilde{e}_i$ is large.

To determine if any individual observations are influential in this sense, a large number of diagnostic statistics have been proposed (some names include DFITS, Cook's Distance, and Welsch Distance) but as they are not based on statistical theory it is unclear if they are useful for practical work. Probably the most relevant measure is the change in the coefficient estimates given in (4.34). The ratio of these changes to the coefficient's standard error is called its DFBETA, and is a post-estimation diagnostic available in STATA. While there is no magic threshold, the concern is whether or not an individual observation meaningfully changes an estimated coefficient of interest.
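As an illustrative sketch of the idea (not a STATA DFBETA computation), the coefficient changes in (4.34) can be evaluated for every observation at once from the full-sample fit; the simulated data below are purely for demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.uniform(1, 10, size=n)])
y = X @ np.array([0.0, 1.0]) + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e_hat = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverage values h_ii
e_tilde = e_hat / (1 - h)                     # prediction errors

# Change in the coefficient vector when observation i is dropped, eq. (4.34):
# beta_hat - beta_hat_(-i) = (X'X)^{-1} x_i e_tilde_i, computed for all i at once.
delta = (XtX_inv @ X.T) * e_tilde             # k x n matrix of coefficient changes

slope_changes = delta[1]                      # changes in the slope coefficient
most_influential = int(np.argmax(np.abs(slope_changes)))
print(most_influential, slope_changes[most_influential])
```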

For illustration, consider Figure 4.2 which shows a scatter plot of random variables $(y_i, x_i)$. The 25 observations shown with the open circles are generated by $x_i \sim U[1, 10]$ and $y_i \sim N(x_i, 4)$. The 26'th observation shown with the filled circle is $x_{26} = 9$, $y_{26} = 0$. (Imagine that $y_{26} = 0$ was incorrectly recorded due to a mistaken key entry.) The Figure shows both the least-squares fitted line from the full sample and that obtained after deletion of the 26'th observation from the sample. In this example we can see how the 26'th observation (the “outlier”) greatly tilts the least-squares fitted line towards the 26'th observation. In fact, the slope coefficient decreases from 0.97 (which is close to the true value of 1.00) to 0.56, which is substantially reduced. Neither $y_{26}$ nor $x_{26}$ are unusual values relative to their marginal distributions, so this outlier would not have been detected from examination of the marginal distributions of the data. The change in the slope coefficient of $-0.41$ is meaningful and should raise concern to an applied economist.

[Scatter plot of $y$ against $x$ for the 26 observations, showing the full-sample fitted line labeled "OLS" and the fitted line after deleting the 26'th observation labeled "leave-one-out OLS".]

Figure 4.2: Impact of an influential observation on the least-squares estimator
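A small sketch of the kind of simulation shown in Figure 4.2 (the seed and draws here are illustrative, so the slopes will not exactly match the 0.97 and 0.56 reported above):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 25
x = rng.uniform(1, 10, size=n)
y = rng.normal(loc=x, scale=2.0)     # N(x_i, 4) has standard deviation 2

# Append the mis-recorded 26th observation (x = 9, y = 0)
x_all = np.append(x, 9.0)
y_all = np.append(y, 0.0)

def ols_slope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

print("slope without outlier:", ols_slope(x, y))
print("slope with outlier:   ", ols_slope(x_all, y_all))
```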

If an observation is determined to be influential, what should be done? As a common cause of influential observations is data entry error, the influential observations should be examined for evidence that the observation was mis-recorded. Perhaps the observation falls outside of permitted ranges, or some observables are inconsistent (for example, a person is listed as having a job but receives earnings of $0). If it is determined that an observation is incorrectly recorded, then the observation is typically deleted from the sample. This process is often called “cleaning the data”. The decisions made in this process involve a fair amount of individual judgement. When this is done it is proper empirical practice to document such choices. (It is useful to keep the source data in its original form, a revised data file after cleaning, and a record describing the revision process. This is especially useful when revising empirical work at a later date.)

It is also possible that an observation is correctly measured, but unusual and influential. In this case it is unclear how to proceed. Some researchers will try to alter the specification to properly model the influential observation. Other researchers will delete the observation from the sample. The motivation for this choice is to prevent the results from being skewed or determined by individual observations, but this practice is viewed skeptically by many researchers who believe it reduces the integrity of reported empirical results.

4.13 Measures of Fit

When a least-squares regression is reported in applied economics, it is common to see a summary measure of fit, measuring how well the regressors explain the observed variation in the dependent variable.

Some common summary measures are based on scaled or transformed estimates of the mean-squared error $\sigma^2$. These include the sum of squared errors $\sum_{i=1}^{n}\hat{e}_i^2$, the sample variance $n^{-1}\sum_{i=1}^{n}\hat{e}_i^2 = \hat{\sigma}^2$, the root mean squared error $\sqrt{n^{-1}\sum_{i=1}^{n}\hat{e}_i^2}$ (sometimes called the standard error of the regression), and the mean prediction error $\widetilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\widetilde{e}_i^2$.

A related and commonly reported statistic is the coefficient of determination or R-squared:

$$
R^2 = \frac{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} = 1 - \frac{\hat{\sigma}^2}{\hat{\sigma}_y^2}
$$

where

$$
\hat{\sigma}_y^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2
$$

is the sample variance of $y_i$. $R^2$ can be viewed as an estimator of the population parameter

$$
\rho^2 = \frac{\operatorname{var}\left(x_i'\beta\right)}{\operatorname{var}(y_i)} = 1 - \frac{\sigma^2}{\sigma_y^2}
$$

where $\sigma_y^2 = \operatorname{var}(y_i)$. A high $\rho^2$ means that forecasts of $y$ using $x'\beta$ will be quite accurate relative to the unconditional mean. In this sense $R^2$ can be a useful summary measure for an out-of-sample forecast or policy experiment.

An alternative estimator of $\rho^2$ proposed by Theil, called R-bar-squared or adjusted $R^2$, is

 

 

 

 

 

 

 

 

 

 

 

$$
\bar{R}^2 = 1 - \frac{(n-1)\sum_{i=1}^{n}\hat{e}_i^2}{(n-k)\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}.
$$

Theil's estimator $\bar{R}^2$ is a better estimator of $\rho^2$ than the unadjusted estimator $R^2$ because it is a ratio of bias-corrected variance estimates.
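For concreteness, a short sketch (simulated data, illustrative only) computing $\hat{\sigma}^2$, $R^2$, and $\bar{R}^2$ with the formulas above:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta

sse = np.sum(e_hat**2)                      # sum of squared errors
sigma2_hat = sse / n                        # sample variance of the residuals
tss = np.sum((y - y.mean())**2)
sigma2_y = tss / n                          # sample variance of y

r2 = 1 - sigma2_hat / sigma2_y              # R-squared
r2_bar = 1 - (n - 1) * sse / ((n - k) * tss)  # adjusted R-squared
print(r2, r2_bar)
```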

Unfortunately, the frequent reporting of $R^2$ and $\bar{R}^2$ seems to have led to exaggerated beliefs regarding their usefulness. One mistaken belief is that $R^2$ is a measure of "fit". This belief is incorrect, as an incorrectly specified model can still have a reasonably high $R^2$. For example, suppose the truth is that $x_i \sim N(0,1)$ and $y_i = \theta x_i + x_i^2$. If we regress $y_i$ on $x_i$ (incorrectly omitting $x_i^2$), the best linear predictor is $y_i = 1 + \theta x_i + e_i$ where $e_i = x_i^2 - 1$. This is a misspecified regression, as the true relationship is deterministic! You can also calculate that the population $\rho^2 = \theta^2/(2+\theta^2)$, which can be arbitrarily close to 1 if $\theta^2$ is large. For example, if $\theta^2 = 8$ then $R^2 \simeq \rho^2 = .8$, or if $\theta^2 = 18$ then $R^2 \simeq \rho^2 = .9$. This example shows that a regression with a high $R^2$ can actually have poor fit.
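A quick simulation sketch of the example above, taking $\theta = \sqrt{8}$ so that the population $\rho^2 = .8$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
theta = np.sqrt(8.0)          # population rho^2 = theta^2 / (2 + theta^2) = 0.8

x = rng.normal(size=n)
y = theta * x + x**2          # deterministic relationship, no error term

# Regress y on (1, x), incorrectly omitting x^2
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta

r2 = 1 - np.sum(e_hat**2) / np.sum((y - y.mean())**2)
print(r2)   # close to 0.8 despite the misspecification
```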

Another mistaken belief is that a high $R^2$ is important in order to justify interpretation of the regression coefficients. This is mistaken as there is no direct association between the level of $R^2$ and the "correctness" of a regression, the accuracy of the coefficient estimates, or the validity of statistical inferences based on the estimated regression. In contrast, even if the $R^2$ is quite small, accurate estimation of regression coefficients is quite possible when sample sizes are large.

The bottom line is that while $R^2$ and $\bar{R}^2$ have appropriate uses, their usefulness should not be exaggerated.

Henri Theil

Henri Theil (1924-2000) of Holland invented $\bar{R}^2$ and two-stage least squares, both of which are routinely seen in applied econometrics. He also wrote an early and influential advanced textbook on econometrics (Theil, 1971).


4.14 Normal Regression Model

The normal regression model is the linear regression model under the restriction that the error $e_i$ is independent of $x_i$ and has the distribution $N\left(0, \sigma^2\right)$. We can write this as

$$
e_i \mid x_i \sim N\left(0, \sigma^2\right).
$$

This assumption implies

$$
y_i \mid x_i \sim N\left(x_i'\beta, \sigma^2\right).
$$

Normal regression is a parametric model, where likelihood methods can be used for estimation, testing, and distribution theory.

The log-likelihood function for the normal regression model is

 

$$
\begin{aligned}
\log L(\beta, \sigma^2) &= \sum_{i=1}^{n}\log\left(\frac{1}{\left(2\pi\sigma^2\right)^{1/2}}\exp\left(-\frac{1}{2\sigma^2}\left(y_i - x_i'\beta\right)^2\right)\right) \\
&= -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}SSE_n(\beta).
\end{aligned}
$$

 

 

The maximum likelihood estimator (MLE) $(\widehat{\beta}_{mle}, \hat{\sigma}^2_{mle})$ maximizes $\log L(\beta, \sigma^2)$. Since the latter is a function of $\beta$ only through the sum of squared errors $SSE_n(\beta)$, maximizing the likelihood is identical to minimizing $SSE_n(\beta)$. Hence

$$
\widehat{\beta}_{mle} = \widehat{\beta}_{ols},
$$

the MLE for $\beta$ equals the OLS estimator. Due to this equivalence, the least squares estimator $\widehat{\beta}_{ols}$ is often called the MLE.

We can also find the MLE for $\sigma^2$. Plugging $\widehat{\beta}$ into the log-likelihood we obtain

$$
\log L\left(\widehat{\beta}, \sigma^2\right) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\hat{e}_i^2.
$$

Maximization with respect to $\sigma^2$ yields the first-order condition

$$
\frac{\partial}{\partial\sigma^2}\log L\left(\widehat{\beta}, \hat{\sigma}^2\right) = -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\left(\hat{\sigma}^2\right)^2}\sum_{i=1}^{n}\hat{e}_i^2 = 0.
$$

Solving for $\hat{\sigma}^2$ yields the MLE for $\sigma^2$:

$$
\hat{\sigma}^2_{mle} = \frac{1}{n}\sum_{i=1}^{n}\hat{e}_i^2
$$

which is the same as the moment estimator (4.14).
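As a numerical cross-check (a sketch on simulated data, not from the text), maximizing the normal log-likelihood with a generic optimizer recovers the OLS coefficients and the residual-based variance estimate:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, k = 300, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def neg_loglik(params):
    beta, log_sigma2 = params[:k], params[k]
    sigma2 = np.exp(log_sigma2)        # parameterize sigma^2 > 0
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * resid @ resid / sigma2

res = minimize(neg_loglik, x0=np.zeros(k + 1), method="BFGS")
beta_mle, sigma2_mle = res.x[:k], np.exp(res.x[k])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_ols
print(np.allclose(beta_mle, beta_ols, atol=1e-4))              # should print True
print(np.isclose(sigma2_mle, np.mean(e_hat**2), atol=1e-4))    # should print True
```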

It may seem surprising that the MLE $\widehat{\beta}_{mle}$ is numerically equal to the OLS estimator, despite emerging from quite different motivations. It is not completely accidental. The least-squares estimator minimizes a particular sample loss function – the sum of squared error criterion – and most loss functions are equivalent to the likelihood of a specific parametric distribution, in this case the normal regression model. In this sense it is not surprising that the least-squares estimator can be motivated as either the minimizer of a sample loss function or as the maximizer of a likelihood function.


Carl Friedrich Gauss

The mathematician Carl Friedrich Gauss (1777-1855) proposed the normal regression model, and derived the least squares estimator as the maximum likelihood estimator for this model. He claimed to have discovered the method in 1795 at the age of eighteen, but did not publish the result until 1809. Interest in Gauss' approach was reinforced by Laplace's simultaneous discovery of the central limit theorem, which provided a justification for viewing random disturbances as approximately normal.


Exercises

Exercise 4.1 Let $y$ be a random variable with $\mu = Ey$ and $\sigma^2 = \operatorname{var}(y)$. Define

$$
g\left(y, \mu, \sigma^2\right) = \begin{pmatrix} y - \mu \\ (y-\mu)^2 - \sigma^2 \end{pmatrix}.
$$

Let $(\hat{\mu}, \hat{\sigma}^2)$ be the values such that $\bar{g}_n(\hat{\mu}, \hat{\sigma}^2) = 0$ where $\bar{g}_n(m, s) = n^{-1}\sum_{i=1}^{n} g\left(y_i, m, s\right)$. Show that $\hat{\mu}$ and $\hat{\sigma}^2$ are the sample mean and variance.

Exercise 4.2 Consider the OLS regression of the $n \times 1$ vector $y$ on the $n \times k$ matrix $X$. Consider an alternative set of regressors $Z = XC$, where $C$ is a $k \times k$ non-singular matrix. Thus, each column of $Z$ is a mixture of some of the columns of $X$. Compare the OLS estimates and residuals from the regression of $y$ on $X$ to the OLS estimates from the regression of $y$ on $Z$.

Exercise 4.3 Using matrix algebra, show $X'\hat{e} = 0$.

Exercise 4.4 Let $\hat{e}$ be the OLS residual from a regression of $y$ on $X = [X_1 \; X_2]$. Find $X_2'\hat{e}$.

Exercise 4.5 Let $\hat{e}$ be the OLS residual from a regression of $y$ on $X$. Find the OLS coefficient from a regression of $\hat{e}$ on $X$.

Exercise 4.6 Let $\hat{y} = X(X'X)^{-1}X'y$. Find the OLS coefficient from a regression of $\hat{y}$ on $X$.

Exercise 4.7 Show that if $X = [X_1 \; X_2]$ then $PX_1 = X_1$.

Exercise 4.8 Show (4.22), that the $h_{ii}$ in (4.21) sum to $k$. (Hint: Use (4.20).)

Exercise 4.9 Show that $M$ is idempotent: $MM = M$.

Exercise 4.10 Show that $\operatorname{tr} M = n - k$.

Exercise 4.11 Show that if $X = [X_1 \; X_2]$ and $X_1'X_2 = 0$ then $P = P_1 + P_2$.

Exercise 4.12 A dummy variable takes on only the values 0 and 1. It is used for categorical data, such as an individual's gender. Let $d_1$ and $d_2$ be vectors of 1's and 0's, with the $i$'th element of $d_1$ equaling 1 and that of $d_2$ equaling 0 if the person is a man, and the reverse if the person is a woman. Suppose that there are $n_1$ men and $n_2$ women in the sample. Consider fitting the following three equations by OLS

$$
\begin{aligned}
y &= \mu + d_1\alpha_1 + d_2\alpha_2 + e & (4.35) \\
y &= d_1\alpha_1 + d_2\alpha_2 + e & (4.36) \\
y &= \mu + d_1\phi + e & (4.37)
\end{aligned}
$$

Can all three equations (4.35), (4.36), and (4.37) be estimated by OLS? Explain if not.

(a) Compare regressions (4.36) and (4.37). Is one more general than the other? Explain the relationship between the parameters in (4.36) and (4.37).

(b) Compute $\iota'd_1$ and $\iota'd_2$, where $\iota$ is an $n \times 1$ vector of ones.

(c) Letting $\alpha = (\alpha_1 \; \alpha_2)'$, write equation (4.36) as $y = X\alpha + e$. Consider the assumption $E(x_i e_i) = 0$. Is there any content to this assumption in this setting?


Exercise 4.13 Let $d_1$ and $d_2$ be defined as in the previous exercise.

(a) In the OLS regression

$$
y = d_1\hat{\mu}_1 + d_2\hat{\mu}_2 + \hat{u},
$$

show that $\hat{\mu}_1$ is the sample mean of the dependent variable among the men of the sample ($\bar{y}_1$), and that $\hat{\mu}_2$ is the sample mean among the women ($\bar{y}_2$).

(b) Let $X$ ($n \times k$) be an additional matrix of regressors. Describe in words the transformations

$$
\begin{aligned}
\widetilde{y} &= y - d_1\bar{y}_1 - d_2\bar{y}_2 \\
\widetilde{X} &= X - d_1\bar{X}_1' - d_2\bar{X}_2'.
\end{aligned}
$$

(c) Compare $\widetilde{\beta}$ from the OLS regression

$$
\widetilde{y} = \widetilde{X}\widetilde{\beta} + \widetilde{e}
$$

with $\widehat{\beta}$ from the OLS regression

$$
y = d_1\hat{\alpha}_1 + d_2\hat{\alpha}_2 + X\widehat{\beta} + \hat{e}.
$$

Exercise 4.14 Let $\widehat{\beta}_n = (X_n'X_n)^{-1}X_n'y_n$ denote the OLS estimate when $y_n$ is $n \times 1$ and $X_n$ is $n \times k$. A new observation $(y_{n+1}, x_{n+1})$ becomes available. Prove that the OLS estimate computed using this additional observation is

$$
\widehat{\beta}_{n+1} = \widehat{\beta}_n + \frac{1}{1 + x_{n+1}'\left(X_n'X_n\right)^{-1}x_{n+1}}\left(X_n'X_n\right)^{-1}x_{n+1}\left(y_{n+1} - x_{n+1}'\widehat{\beta}_n\right).
$$

Exercise 4.15 Prove that $R^2$ is the square of the sample correlation between $y$ and $\hat{y}$.

Exercise 4.16 Show that $\widetilde{\sigma}^2 \geq \hat{\sigma}^2$. Is equality possible?

Exercise 4.17 For which observations will $\widehat{\beta}_{(-i)} = \widehat{\beta}$?

Exercise 4.18 The data file cps85.dat contains a random sample of 528 individuals from the 1985 Current Population Survey by the U.S. Census Bureau. The file contains observations on nine variables, listed in the file cps85.pdf.

V1 = education (in years)

V2 = region of residence (coded 1 if South, 0 otherwise)

V3 = (coded 1 if nonwhite and non-Hispanic, 0 otherwise)

V4 = (coded 1 if Hispanic, 0 otherwise)

V5 = gender (coded 1 if female, 0 otherwise)

V6 = marital status (coded 1 if married, 0 otherwise)

V7 = potential labor market experience (in years)

V8 = union status (coded 1 if in union job, 0 otherwise)

V9 = hourly wage (in dollars)

Estimate a regression of wage $y_i$ on education $x_{1i}$, experience $x_{2i}$, and experience-squared $x_{3i} = x_{2i}^2$ (and a constant). Report the OLS estimates.

Let $\hat{e}_i$ be the OLS residual and $\hat{y}_i$ the predicted value from the regression. Numerically calculate the following:

(a) $\sum_{i=1}^{n}\hat{e}_i$

(b) $\sum_{i=1}^{n}x_{1i}\hat{e}_i$

(c) $\sum_{i=1}^{n}x_{2i}\hat{e}_i$

(d) $\sum_{i=1}^{n}x_{1i}^2\hat{e}_i$

(e) $\sum_{i=1}^{n}x_{2i}^2\hat{e}_i$

(f) $\sum_{i=1}^{n}\hat{y}_i\hat{e}_i$

(g) $\sum_{i=1}^{n}\hat{e}_i^2$

(h) $R^2$

Are these calculations consistent with the theoretical properties of OLS? Explain.

Exercise 4.19 Using the data from the previous problem, re-estimate the slope on education using the residual regression approach. Regress $y_i$ on $(1, x_{2i}, x_{2i}^2)$, regress $x_{1i}$ on $(1, x_{2i}, x_{2i}^2)$, and regress the residuals on the residuals. Report the estimate from this regression. Does it equal the value from the first OLS regression? Explain.

In the second-stage residual regression (the regression of the residuals on the residuals), calculate the equation $R^2$ and sum of squared errors. Do they equal the values from the initial OLS regression? Explain.

Chapter 5

Least Squares Regression

5.1 Introduction

In this chapter we investigate some finite-sample properties of least-squares applied to a random sample in the linear regression model. Throughout this chapter we maintain the following.

Assumption 5.1.1 Linear Regression Model

 

The observations $(y_i, x_i)$ come from a random sample and satisfy the linear regression equation

$$
\begin{aligned}
y_i &= x_i'\beta + e_i & (5.1) \\
E\left(e_i \mid x_i\right) &= 0. & (5.2)
\end{aligned}
$$

The variables have finite second moments

$$
Ey_i^2 < \infty,
$$

$$
E\left\|x_i\right\|^2 < \infty,
$$

and an invertible design matrix

$$
Q_{xx} = E\left(x_i x_i'\right) > 0.
$$

 

We will consider both the general case of heteroskedastic regression, where the conditional variance

$$
E\left(e_i^2 \mid x_i\right) = \sigma^2(x_i) = \sigma_i^2
$$

is unrestricted, and the specialized case of homoskedastic regression, where the conditional variance is constant. In the latter case we add the following assumption.

Assumption 5.1.2 Homoskedastic Linear Regression Model

In addition to Assumption 5.1.1,

$$
E\left(e_i^2 \mid x_i\right) = \sigma^2(x_i) = \sigma^2 \tag{5.3}
$$

is independent of $x_i$.



5.2 Mean of Least-Squares Estimator

In this section we show that the OLS estimator is unbiased in the linear regression model. Under (5.1)-(5.2) note that

$$
E(y \mid X) = \begin{pmatrix} \vdots \\ E(y_i \mid X) \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots \\ E(y_i \mid x_i) \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots \\ x_i'\beta \\ \vdots \end{pmatrix} = X\beta. \tag{5.4}
$$

Similarly

$$
E(e \mid X) = \begin{pmatrix} \vdots \\ E(e_i \mid X) \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots \\ E(e_i \mid x_i) \\ \vdots \end{pmatrix} = 0. \tag{5.5}
$$

By (4.16), conditioning on X, the linearity of expectations, (5.4), and the properties of the matrix inverse,

 

$$
\begin{aligned}
E\left(\widehat{\beta} \mid X\right) &= E\left(\left(X'X\right)^{-1}X'y \mid X\right) \\
&= \left(X'X\right)^{-1}X'E(y \mid X) \\
&= \left(X'X\right)^{-1}X'X\beta \\
&= \beta.
\end{aligned}
$$

Applying the law of iterated expectations to $E\left(\widehat{\beta} \mid X\right) = \beta$, we find that

$$
E\left(\widehat{\beta}\right) = E\left(E\left(\widehat{\beta} \mid X\right)\right) = \beta.
$$

Another way to calculate the same result is as follows. Insert $y = X\beta + e$ into the formula (4.16) for $\widehat{\beta}$ to obtain

$$
\begin{aligned}
\widehat{\beta} &= \left(X'X\right)^{-1}X'\left(X\beta + e\right) \\
&= \left(X'X\right)^{-1}X'X\beta + \left(X'X\right)^{-1}X'e \\
&= \beta + \left(X'X\right)^{-1}X'e. & (5.6)
\end{aligned}
$$

This is a useful linear decomposition of the estimator $\widehat{\beta}$ into the true parameter $\beta$ and the stochastic component $\left(X'X\right)^{-1}X'e$. Using (5.6), conditioning on $X$, and (5.5),

$$
\begin{aligned}
E\left(\widehat{\beta} - \beta \mid X\right) &= E\left(\left(X'X\right)^{-1}X'e \mid X\right) \\
&= \left(X'X\right)^{-1}X'E(e \mid X) \\
&= 0.
\end{aligned}
$$

Using either derivation, we have shown the following theorem.

Theorem 5.2.1 Mean of Least-Squares Estimator

In the linear regression model (Assumption 5.1.1)

 

 

 

 

$$
E\left(\widehat{\beta} \mid X\right) = \beta \tag{5.7}
$$

and

$$
E\left(\widehat{\beta}\right) = \beta. \tag{5.8}
$$
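A short Monte Carlo sketch (simulated data, illustrative only) of Theorem 5.2.1: averaging the OLS estimates across many samples drawn from a linear regression model reproduces the true coefficient vector:

```python
import numpy as np

rng = np.random.default_rng(11)
beta_true = np.array([1.0, 2.0, -0.5])
n, reps = 100, 5000

estimates = np.empty((reps, beta_true.size))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    # Conditional mean zero; heteroskedastic errors are allowed by Assumption 5.1.1
    e = rng.normal(size=n) * (1 + 0.5 * np.abs(X[:, 1]))
    y = X @ beta_true + e
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))   # approximately equal to beta_true
```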
