
CHAPTER 6. ASYMPTOTIC THEORY FOR LEAST SQUARES


Theorem 6.10.1 Under Assumption 1.5.1 and Assumption 6.4.1, if $h(\beta)$ is continuously differentiable at the true value of $\beta$, then as $n \to \infty$,
$$\hat{V}_\theta \xrightarrow{p} V_\theta.$$

6.11 t statistic

Let $\theta = h(\beta) : \mathbb{R}^k \to \mathbb{R}$ be any parameter of interest (for example, $\theta$ could be a single element of $\beta$), $\hat\theta$ its estimate and $s(\hat\theta)$ its asymptotic standard error. Consider the statistic
$$t_n(\theta) = \frac{\hat\theta - \theta}{s(\hat\theta)}. \qquad (6.31)$$

Different writers have called (6.31) a t-statistic, a t-ratio, a z-statistic or a studentized statistic. We won't be making such distinctions and will refer to $t_n(\theta)$ as a t-statistic or a t-ratio. We also often suppress the parameter dependence, writing it as $t_n$. The t-statistic is a simple function of the estimate, its standard error, and the parameter.

Theorem 6.11.1 $t_n(\theta) \xrightarrow{d} N(0, 1)$.

Thus the asymptotic distribution of the t-ratio $t_n(\theta)$ is the standard normal. Since this distribution does not depend on the parameters, we say that $t_n(\theta)$ is asymptotically pivotal. In special cases (such as the normal regression model, see Section 4.14), the statistic $t_n$ has an exact t distribution, and is therefore exactly free of unknowns. In this case, we say that $t_n$ is exactly pivotal. In general, however, pivotal statistics are unavailable and we must rely on asymptotically pivotal statistics.
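As a computational illustration of how $\hat{V}_\theta$, $s(\hat\theta)$ and the t-ratio fit together, here is a minimal sketch that forms the plug-in (delta-method) variance estimator $\hat{V}_\theta = \hat{H}'\hat{V}_\beta\hat{H}$ and then the statistic (6.31). All numerical inputs (the estimates, the covariance matrix, the function $h$, and the hypothesized value) are made-up placeholders, not values from the text.

```python
import numpy as np

# Hypothetical inputs, for illustration only.
n = 500
beta_hat = np.array([1.2, 0.4])                 # coefficient estimates
V_beta_hat = np.array([[2.0, 0.3],
                       [0.3, 0.5]])             # estimate of the asymptotic variance of beta_hat

def h(b):
    return b[0] / b[1]                          # parameter of interest theta = h(beta)

# Delta method: V_theta = H' V_beta H, with H the gradient of h at beta_hat.
H = np.array([1.0 / beta_hat[1], -beta_hat[0] / beta_hat[1] ** 2])
theta_hat = h(beta_hat)
V_theta_hat = H @ V_beta_hat @ H
s_theta = np.sqrt(V_theta_hat / n)              # asymptotic standard error s(theta_hat)

theta0 = 2.5                                    # hypothesized value of theta
t_stat = (theta_hat - theta0) / s_theta         # the t-ratio (6.31); approximately N(0,1) when theta = theta0
print(theta_hat, s_theta, t_stat)
```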

6.12 Confidence Intervals

A confidence interval $C_n$ is an interval estimate of $\theta \in \mathbb{R}$. It is a function of the data and hence is random. It is designed to cover $\theta$ with high probability. Either $\theta \in C_n$ or $\theta \notin C_n$. Its coverage probability is $\Pr(\theta \in C_n)$. The convention is to design confidence intervals to have coverage probability approximately equal to a pre-specified target, typically 90% or 95%, or more generally written as $(1 - \alpha)\%$ for some $\alpha \in (0, 1)$. By reporting a $(1 - \alpha)\%$ confidence interval $C_n$, we are stating that the true $\theta$ lies in $C_n$ with $(1 - \alpha)\%$ probability across repeated samples.

There is not a unique method to construct confidence intervals. For example, a simple (yet silly) interval is
$$C_n = \begin{cases} \mathbb{R} & \text{with probability } 1 - \alpha \\ \{\hat\theta\} & \text{with probability } \alpha. \end{cases}$$
By construction, if $\hat\theta$ has a continuous distribution, $\Pr(\theta \in C_n) = 1 - \alpha$, so this confidence interval has perfect coverage, but $C_n$ is uninformative about $\theta$. This is not a useful confidence interval.

When we have an asymptotically normal parameter estimate $\hat\theta$ with standard error $s(\hat\theta)$, it turns out that a generally reasonable confidence interval for $\theta$ takes the form
$$C_n = \left[\hat\theta - c \cdot s(\hat\theta),\ \hat\theta + c \cdot s(\hat\theta)\right] \qquad (6.32)$$


where $c > 0$ is a pre-specified constant. This confidence interval is symmetric about the point estimate $\hat\theta$, and its length is proportional to the standard error $s(\hat\theta)$.

Equivalently, $C_n$ is the set of parameter values for $\theta$ such that the t-statistic $t_n(\theta)$ is smaller (in absolute value) than $c$, that is
$$C_n = \left\{\theta : |t_n(\theta)| \le c\right\} = \left\{\theta : -c \le \frac{\hat\theta - \theta}{s(\hat\theta)} \le c\right\}.$$

The coverage probability of this confidence interval is
$$\Pr(\theta \in C_n) = \Pr\left(|t_n(\theta)| \le c\right)$$

which is generally unknown, but we can approximate the coverage probability by taking the asymptotic limit as $n \to \infty$. Since $t_n(\theta)$ is asymptotically standard normal (Theorem 6.11.1), it follows that as $n \to \infty$,

$$\Pr(\theta \in C_n) \to \Pr(|Z| \le c) = \Phi(c) - \Phi(-c)$$
where $Z \sim N(0, 1)$ and $\Phi(u) = \Pr(Z \le u)$ is the standard normal distribution function. We call this the asymptotic coverage probability, and it is a function only of $c$.

As we mentioned before, the convention is to design the confidence interval to have a pre-specified asymptotic coverage probability $1 - \alpha$, typically 90% or 95%. This means selecting the constant $c$ so that
$$\Phi(c) - \Phi(-c) = 1 - \alpha.$$

Effectively, this makes $c$ a function of $\alpha$, and can be backed out of a normal distribution table. For example, $\alpha = 0.05$ (a 95% interval) implies $c = 1.96$ and $\alpha = 0.1$ (a 90% interval) implies $c = 1.645$. Rounding 1.96 to 2, we obtain the most commonly used confidence interval in applied econometric practice
$$C_n = \left[\hat\theta - 2 s(\hat\theta),\ \hat\theta + 2 s(\hat\theta)\right].$$

This is a useful rule of thumb. This asymptotic 95% confidence interval $C_n$ is simple to compute and can be roughly calculated from tables of coefficient estimates and standard errors. (Technically, it is an asymptotic 95.4% interval, due to the substitution of 2.0 for 1.96, but this distinction is meaningless.)
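The choice of $c$ and the interval (6.32) are easily computed; the sketch below uses scipy's normal quantile function to invert $\Phi$, with a placeholder estimate and standard error.

```python
import numpy as np
from scipy.stats import norm

theta_hat, se = 0.35, 0.12          # hypothetical estimate and standard error
alpha = 0.05                        # target 95% asymptotic coverage

c = norm.ppf(1 - alpha / 2)         # solves Phi(c) - Phi(-c) = 1 - alpha, giving c = 1.96
ci = (theta_hat - c * se, theta_hat + c * se)
ci_rule_of_thumb = (theta_hat - 2 * se, theta_hat + 2 * se)   # the +/- 2 s.e. rule

print(c, ci, ci_rule_of_thumb)
```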

Confidence intervals are a simple yet effective tool to assess estimation uncertainty. When reading a set of empirical results, look at the coefficient estimates and the standard errors. For a parameter of interest, compute the confidence interval $C_n$ and consider the meaning of the spread of the suggested values. If the range of values in the confidence interval is too wide to learn about $\theta$, then do not jump to a conclusion about $\theta$ based on the point estimate alone.

6.13 Regression Intervals

In the linear regression model the conditional mean of $y_i$ given $x_i = x$ is
$$m(x) = E(y_i \mid x_i = x) = x'\beta.$$

In some cases, we want to estimate $m(x)$ at a particular point $x$. Notice that this is a (linear) function of $\beta$. Letting $h(\beta) = x'\beta$ and $\theta = h(\beta)$, we see that $\hat{m}(x) = \hat\theta = x'\hat\beta$ and $H_\beta = x$, so $s(\hat\theta) = \sqrt{n^{-1} x'\hat{V}_\beta x}$. Thus an asymptotic 95% confidence interval for $m(x)$ is
$$\left[\, x'\hat\beta \pm 2\sqrt{n^{-1} x'\hat{V}_\beta x}\, \right].$$


Figure 6.7: Wage on Education Regression Intervals

It is interesting to observe that if this is viewed as a function of $x$, the width of the confidence set is dependent on $x$.

To illustrate, we return to the log wage regression (4.9) of Section 4.4. The estimated regression equation is

$$\widehat{\log(Wage)} = x'\hat\beta = 0.626 + 0.156\, x.$$

where x = Education. The White covariance matrix estimate is

$$\hat{V}_\beta = \begin{pmatrix} 7.092 & -0.445 \\ -0.445 & 0.029 \end{pmatrix}$$

and the sample size is $n = 61$. Thus the 95% confidence interval for the regression takes the form
$$0.626 + 0.156\, x \;\pm\; 2\sqrt{\tfrac{1}{61}\left(7.092 - 0.89\, x + 0.029\, x^2\right)}.$$

The estimated regression and 95% intervals are shown in Figure 6.7. Notice that the confidence bands take a hyperbolic shape. This means that the regression line is less precisely estimated for very large and very small values of education.
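The interval above is straightforward to evaluate on a grid of education levels; the following sketch uses the coefficient and covariance estimates reported in this example (the grid of education values is an arbitrary choice), and plotting as in Figure 6.7 is omitted.

```python
import numpy as np

beta_hat = np.array([0.626, 0.156])
V_beta_hat = np.array([[7.092, -0.445],
                       [-0.445, 0.029]])
n = 61

educ = np.arange(8, 21)                                        # grid of education levels (illustrative)
X = np.column_stack([np.ones(len(educ)), educ.astype(float)])

fit = X @ beta_hat                                             # x'beta_hat
se = np.sqrt(np.einsum("ij,jk,ik->i", X, V_beta_hat, X) / n)   # sqrt(x' V x / n)
lower, upper = fit - 2 * se, fit + 2 * se                      # asymptotic 95% regression interval

for e, f, lo, hi in zip(educ, fit, lower, upper):
    print(f"education={e:2d}: {f:6.3f}  [{lo:6.3f}, {hi:6.3f}]")
```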

Plots of the estimated regression line and confidence intervals are especially useful when the regression includes nonlinear terms. To illustrate, consider the log wage regression (4.10) which includes experience and its square.

$$\widehat{\log(Wage)} = 1.06 + 0.116\, education + 0.010\, experience - 0.014\, experience^2/100 \qquad (6.33)$$

and has $n = 2454$ observations. We are interested in plotting the regression estimate and regression intervals as a function of experience. Since the regression also includes education, in order to plot the estimates in a simple graph we need to fix education at a specific value. We select education $= 12$. This only affects the level of the estimated regression, since education enters without an interaction.

Define the points of evaluation
$$x = \begin{pmatrix} 1 \\ 12 \\ x \\ x^2/100 \end{pmatrix}$$


Figure 6.8: Wage on Experience Regression Intervals

where $x =$ experience. The covariance matrix estimate is
$$\hat{V}_\beta = \begin{pmatrix} 22.92 & -1.0601 & -0.56687 & 0.86626 \\ -1.0601 & 0.06454 & 0.0080737 & -0.0066749 \\ -0.56687 & 0.0080737 & 0.040736 & -0.075583 \\ 0.86626 & -0.0066749 & -0.075583 & 0.14994 \end{pmatrix}.$$

Thus the regression interval for education $= 12$, as a function of $x =$ experience, is
$$1.06 + 0.116 \cdot 12 + 0.010\, x - 0.014\, x^2/100 \;\pm\; 2\sqrt{\frac{1}{2454}\begin{pmatrix} 1 & 12 & x & x^2/100 \end{pmatrix}\hat{V}_\beta\begin{pmatrix} 1 \\ 12 \\ x \\ x^2/100 \end{pmatrix}}$$
$$= 2.452 + 0.010\, x - 0.00014\, x^2 \;\pm\; \frac{2}{100}\sqrt{27.592 - 3.8304\, x + 0.23007\, x^2 - 0.00616\, x^3 + 0.0000611\, x^4}.$$

The estimated regression and 95% intervals are shown in Figure 6.8. The regression interval widens greatly for small and large values of experience, indicating considerable uncertainty about the effect of experience on mean wages for this population. The confidence bands take a more complicated shape than in Figure 6.7 due to the nonlinear specification.
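The experience profile and its bands can be computed the same way; this sketch reuses the reported coefficients and covariance matrix with education fixed at 12, and again leaves out the plotting step behind Figure 6.8.

```python
import numpy as np

beta_hat = np.array([1.06, 0.116, 0.010, -0.014])   # intercept, education, experience, experience^2/100
V_beta_hat = np.array([
    [ 22.92,    -1.0601,    -0.56687,    0.86626  ],
    [ -1.0601,   0.06454,    0.0080737, -0.0066749],
    [ -0.56687,  0.0080737,  0.040736,  -0.075583 ],
    [  0.86626, -0.0066749, -0.075583,   0.14994  ],
])
n = 2454

exper = np.arange(0, 51, dtype=float)               # grid of experience levels (illustrative)
X = np.column_stack([np.ones_like(exper), np.full_like(exper, 12.0), exper, exper**2 / 100])

fit = X @ beta_hat
se = np.sqrt(np.einsum("ij,jk,ik->i", X, V_beta_hat, X) / n)
lower, upper = fit - 2 * se, fit + 2 * se           # the bands shown in Figure 6.8

print(np.round(np.column_stack([exper, fit, lower, upper])[:5], 3))
```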

6.14 Quadratic Forms

Let $\theta = h(\beta) : \mathbb{R}^k \to \mathbb{R}^q$ be any parameter vector of interest, $\hat\theta$ its estimate and $\hat{V}_\theta$ its covariance matrix estimator. Consider the quadratic form
$$W_n(\theta) = n\left(\hat\theta - \theta\right)'\hat{V}_\theta^{-1}\left(\hat\theta - \theta\right). \qquad (6.34)$$

When $q = 1$ then $W_n(\theta) = t_n(\theta)^2$ is the square of the t-ratio. When $q > 1$, $W_n(\theta)$ is typically called a Wald statistic. We are interested in its sampling distribution.
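Given an estimate and its covariance matrix estimator, (6.34) is a one-line calculation; the numbers in this sketch are placeholders.

```python
import numpy as np

n = 1000
theta_hat = np.array([0.8, -0.3])                  # hypothetical estimate (q = 2)
V_theta_hat = np.array([[1.5, 0.2],
                        [0.2, 0.9]])               # hypothetical covariance matrix estimator
theta0 = np.array([1.0, 0.0])                      # hypothesized value of theta

diff = theta_hat - theta0
W = n * diff @ np.linalg.solve(V_theta_hat, diff)  # W_n = n (theta_hat - theta0)' V^{-1} (theta_hat - theta0)
print(W)
```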


The asymptotic distribution of $W_n(\theta)$ is simple to derive given Theorem 6.9.2 and Theorem 6.10.1, which show that
$$\sqrt{n}\left(\hat\theta - \theta\right) \xrightarrow{d} Z \sim N(0, V_\theta)$$
and
$$\hat{V}_\theta \xrightarrow{p} V_\theta.$$

It follows that
$$W_n(\theta) = \sqrt{n}\left(\hat\theta - \theta\right)'\hat{V}_\theta^{-1}\sqrt{n}\left(\hat\theta - \theta\right) \xrightarrow{d} Z'V_\theta^{-1}Z, \qquad (6.35)$$

a quadratic in the normal random vector $Z$. Here we can appeal to a useful result from probability theory. (See Theorem B.9.3 in the Appendix.)

Theorem 6.14.1 If $Z \sim N(0, A)$ with $A > 0$, $q \times q$, then $Z'A^{-1}Z \sim \chi^2_q$, a chi-square random variable with $q$ degrees of freedom.
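Theorem 6.14.1 is easy to check by simulation: draw $Z \sim N(0, A)$ and compare the quadratic form $Z'A^{-1}Z$ with the $\chi^2_q$ distribution. The matrix $A$ below is an arbitrary positive definite choice made only for the demonstration.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
A = np.array([[2.0, 0.7],
              [0.7, 1.0]])                                 # any q x q positive definite matrix (q = 2)
q = A.shape[0]

Z = rng.multivariate_normal(np.zeros(q), A, size=100_000)
quad = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(A), Z)    # Z' A^{-1} Z for each draw

probs = [0.5, 0.9, 0.95]
print(np.quantile(quad, probs))                            # empirical quantiles
print(chi2.ppf(probs, df=q))                               # chi-square(q) quantiles, nearly identical
```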

The asymptotic distribution in (6.35) takes exactly this form. It follows that $W_n(\theta)$ converges in distribution to a chi-square random variable.

Theorem 6.14.2 Under Assumption 1.5.1 and Assumption 6.4.1, if $h(\beta)$ is continuously differentiable at the true value of $\beta$, then as $n \to \infty$,
$$W_n(\theta) \xrightarrow{d} \chi^2_q.$$

6.15 Confidence Regions

A confidence region $C_n$ is a generalization of a confidence interval to the case $\theta \in \mathbb{R}^q$ with $q > 1$. A confidence region $C_n$ is a set in $\mathbb{R}^q$ intended to cover the true parameter value $\theta$ with a pre-selected probability $1 - \alpha$. Thus an ideal confidence region has the coverage probability $\Pr(\theta \in C_n) = 1 - \alpha$. In practice it is typically not possible to construct a region with exact coverage, but we can calculate its asymptotic coverage.

When the parameter estimate satisfies the conditions of Theorem 6.14.2, a good choice for a confidence region is the ellipse
$$C_n = \left\{\theta : W_n(\theta) \le c_{1-\alpha}\right\}$$
with $c_{1-\alpha}$ the $1 - \alpha$'th quantile of the $\chi^2_q$ distribution. (Thus $F_q(c_{1-\alpha}) = 1 - \alpha$.) These quantiles can be found from a critical value table for the $\chi^2_q$ distribution.

Theorem 6.14.2 implies
$$\Pr(\theta \in C_n) \to \Pr\left(\chi^2_q \le c_{1-\alpha}\right) = 1 - \alpha$$
which shows that $C_n$ has asymptotic coverage $(1 - \alpha)\%$.

To illustrate the construction of a confidence region, consider the estimated regression (6.33) of the model
$$\widehat{\log(Wage)} = \beta_0 + \beta_1\, education + \beta_2\, experience + \beta_3\, experience^2/100.$$


Suppose that the two parameters of interest are the percentage return to education $\theta_1 = 100\beta_1$ and the percentage return to experience for individuals with 10 years experience $\theta_2 = 100\beta_2 + 20\beta_3$. (We need to condition on the level of experience since the regression is quadratic in experience.) These two parameters are a linear transformation of the regression parameters with point estimates
$$\hat\theta = \begin{pmatrix} 0 & 100 & 0 & 0 \\ 0 & 0 & 100 & 20 \end{pmatrix}\hat\beta = \begin{pmatrix} 11.6 \\ 0.72 \end{pmatrix},$$

and have the covariance matrix estimate
$$\hat{V}_\theta = \begin{pmatrix} 0 & 100 & 0 & 0 \\ 0 & 0 & 100 & 20 \end{pmatrix}\hat{V}_\beta\begin{pmatrix} 0 & 0 \\ 100 & 0 \\ 0 & 100 \\ 0 & 20 \end{pmatrix} = \begin{pmatrix} 645.4 & 67.387 \\ 67.387 & 165 \end{pmatrix}$$

with inverse
$$\hat{V}_\theta^{-1} = \begin{pmatrix} 0.0016184 & -0.00066098 \\ -0.00066098 & 0.0063306 \end{pmatrix}.$$

Thus the Wald statistic is

$$W_n(\theta) = n\left(\hat\theta - \theta\right)'\hat{V}_\theta^{-1}\left(\hat\theta - \theta\right)$$
$$= 2454\begin{pmatrix} 11.6 - \theta_1 \\ 0.72 - \theta_2 \end{pmatrix}'\begin{pmatrix} 0.0016184 & -0.00066098 \\ -0.00066098 & 0.0063306 \end{pmatrix}\begin{pmatrix} 11.6 - \theta_1 \\ 0.72 - \theta_2 \end{pmatrix}$$
$$= 3.97\left(11.6 - \theta_1\right)^2 - 3.2441\left(11.6 - \theta_1\right)\left(0.72 - \theta_2\right) + 15.535\left(0.72 - \theta_2\right)^2.$$

The 90% quantile of the $\chi^2_2$ distribution is 4.605 (we use the $\chi^2_2$ distribution as the dimension of $\theta$ is two), so an asymptotic 90% confidence region for the two parameters is the interior of the ellipse

$$3.97\left(11.6 - \theta_1\right)^2 - 3.2441\left(11.6 - \theta_1\right)\left(0.72 - \theta_2\right) + 15.535\left(0.72 - \theta_2\right)^2 = 4.605$$

which is displayed in Figure 6.9. Since the estimated correlation of the two coefficient estimates is small (about 0.2) the ellipse is close to circular.
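The ellipse in Figure 6.9 can be reproduced numerically by evaluating $W_n(\theta)$ on a grid and keeping the points with $W_n(\theta) \le 4.605$. The sketch below prints the bounding box of that region instead of plotting it; the grid limits are arbitrary choices.

```python
import numpy as np

n = 2454
theta_hat = np.array([11.6, 0.72])
V_inv = np.array([[ 0.0016184, -0.00066098],
                  [-0.00066098, 0.0063306]])       # inverse covariance matrix from above
crit = 4.605                                       # 90% quantile of chi-square(2)

t1 = np.linspace(9.0, 14.5, 400)                   # grid for theta_1 (return to education)
t2 = np.linspace(-0.5, 2.0, 400)                   # grid for theta_2 (return to experience)
G1, G2 = np.meshgrid(t1, t2)
D1, D2 = theta_hat[0] - G1, theta_hat[1] - G2
W = n * (V_inv[0, 0] * D1**2 + 2 * V_inv[0, 1] * D1 * D2 + V_inv[1, 1] * D2**2)

inside = W <= crit                                 # points inside the 90% confidence ellipse
print("theta_1 range:", G1[inside].min(), G1[inside].max())
print("theta_2 range:", G2[inside].min(), G2[inside].max())
```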

6.16 Semiparametric Efficiency in the Projection Model

In Section 5.4 we presented the Gauss-Markov theorem, which stated that in the homoskedastic CEF model, in the class of linear unbiased estimators the one with the smallest variance is least-squares. As we noted in that section, the restriction to linear unbiased estimators is unsatisfactory as it leaves open the possibility that an alternative (non-linear) estimator could have a smaller asymptotic variance. In addition, the restriction to the homoskedastic CEF model is also unsatisfactory as the projection model is more relevant for empirical application. The question remains: what is the most efficient estimator of the projection coefficient $\beta$ (or functions $\theta = h(\beta)$) in the projection model?

It turns out that it is straightforward to show that the projection model falls in the estimator class considered in Proposition 2.13.2. It follows that the least-squares estimator is semiparametrically efficient in the sense that it has the smallest asymptotic variance in the class of semiparametric estimators of $\beta$. This is a more powerful and interesting result than the Gauss-Markov theorem.


Figure 6.9: Confidence Region for Return to Experience and Return to Education

To see this, it is worth rephrasing Proposition 2.13.2 with amended notation. Suppose that a parameter of interest is $\theta = g(\mu)$ where $\mu = Ez_i$, for which the moment estimators are $\hat\mu = \frac{1}{n}\sum_{i=1}^n z_i$ and $\hat\theta = g(\hat\mu)$. Let
$$\mathcal{L}_2(g) = \left\{F : E\|z\|^2 < \infty,\ g(u) \text{ is continuously differentiable at } u = Ez\right\}$$
be the set of distributions for which $\hat\theta$ satisfies the central limit theorem.

Proposition 6.16.1 In the class of distributions $F \in \mathcal{L}_2(g)$, $\hat\theta$ is semiparametrically efficient for $\theta$ in the sense that its asymptotic variance equals the semiparametric efficiency bound.

Proposition 6.16.1 says that under the minimal conditions in which $\hat\theta$ is asymptotically normal, no semiparametric estimator can have a smaller asymptotic variance than $\hat\theta$.

To show that an estimator is semiparametrically efficient it is sufficient to show that it falls in the class covered by this Proposition. To show that the projection model falls in this class, we write $\beta = Q_{xx}^{-1}Q_{xy} = g(\mu)$ where $\mu = Ez_i$ and $z_i = (x_ix_i',\ x_iy_i)$. The class $\mathcal{L}_2(g)$ equals the class of distributions
$$\mathcal{L}_4(\beta) = \left\{F : Ey^4 < \infty,\ E\|x\|^4 < \infty,\ Ex_ix_i' > 0\right\}.$$

Proposition 6.16.2 In the class of distributions $F \in \mathcal{L}_4(\beta)$, the least-squares estimator $\hat\beta$ is semiparametrically efficient for $\beta$.

The least-squares estimator is an asymptotically efficient estimator of the projection coefficient because it is a smooth function of sample moments and the model implies no further


restrictions. However, if the class of permissible distributions is restricted to a strict subset of $\mathcal{L}_4(\beta)$ then least-squares can be inefficient. For example, the linear CEF model with heteroskedastic errors is a strict subset of $\mathcal{L}_4(\beta)$, and the GLS estimator has a smaller asymptotic variance than OLS. In this case, the knowledge that the true conditional mean is linear allows for more efficient estimation of the unknown parameter.

From Proposition 6.16.1 we can also deduce that plug-in estimators $\hat\theta = h(\hat\beta)$ are semiparametrically efficient estimators of $\theta = h(\beta)$ when $h$ is continuously differentiable. We can also deduce that estimators of other parameters are semiparametrically efficient, such as $\hat\sigma^2$ for $\sigma^2$. To see this, note that we can write

$$\begin{aligned}
\sigma^2 &= E\left(y_i - x_i'\beta\right)^2 \\
&= Ey_i^2 - 2E\left(y_ix_i'\right)\beta + \beta'E\left(x_ix_i'\right)\beta \\
&= Q_{yy} - Q_{yx}Q_{xx}^{-1}Q_{xy}
\end{aligned}$$
which is a smooth function of the moments $Q_{yy}$, $Q_{yx}$ and $Q_{xx}$. Similarly the estimator $\hat\sigma^2$ equals

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat{e}_i^2 = \hat{Q}_{yy} - \hat{Q}_{yx}\hat{Q}_{xx}^{-1}\hat{Q}_{xy}.$$

 

Since the variables $y_i^2$, $y_ix_i'$ and $x_ix_i'$ all have finite variances when $F \in \mathcal{L}_4(\beta)$, the conditions of Proposition 6.16.1 are satisfied. We conclude:

Proposition 6.16.3 In the class of distributions $F \in \mathcal{L}_4(\beta)$, $\hat\sigma^2$ is semiparametrically efficient for $\sigma^2$.
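The moment representation above is an exact algebraic identity in the sample as well, which the following sketch verifies on simulated data; the data-generating design is an arbitrary choice made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 2000, 3
x = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5, 0.25])
y = x @ beta + rng.normal(size=n)

Qxx = x.T @ x / n
Qxy = x.T @ y / n
Qyy = y @ y / n

beta_hat = np.linalg.solve(Qxx, Qxy)                     # least-squares estimator
e_hat = y - x @ beta_hat
sigma2_resid = e_hat @ e_hat / n                         # (1/n) sum of squared residuals
sigma2_moments = Qyy - Qxy @ np.linalg.solve(Qxx, Qxy)   # Qyy - Qyx Qxx^{-1} Qxy

print(sigma2_resid, sigma2_moments)                      # equal up to rounding error
```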

6.17 Semiparametric Efficiency in the Homoskedastic Regression Model*

In Section 6.16 we showed that the OLS estimator is semiparametrically efficient in the projection model. What if we restrict attention to the classical homoskedastic regression model? Is OLS still efficient in this class? In this section we derive the asymptotic semiparametric efficiency bound for this model, and show that it is the same as that obtained by the OLS estimator. Therefore it turns out that least-squares is efficient in this class as well.

Recall that in the homoskedastic regression model the asymptotic variance of the OLS estimator $\hat\beta$ for $\beta$ is $V_\beta^0 = \sigma^2 Q_{xx}^{-1}$. Therefore, as described in Section 2.13, it is sufficient to find a parametric submodel whose Cramer-Rao bound for estimation of $\beta$ is $V_\beta^0$. This would establish that $V_\beta^0$ is the semiparametric variance bound and the OLS estimator is semiparametrically efficient for $\beta$.

Let the joint density of $y$ and $x$ be written as $f(y, x) = f_1(y \mid x)\, f_2(x)$, the product of the conditional density of $y$ given $x$ and the marginal density of $x$. Now consider the parametric submodel
$$f(y, x \mid \theta) = f_1(y \mid x)\left(1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\right) f_2(x). \qquad (6.36)$$

You can check that in this submodel the marginal density of $x$ is $f_2(x)$ and the conditional density of $y$ given $x$ is $f_1(y \mid x)\left(1 + (y - x'\beta)(x'\theta)/\sigma^2\right)$. To see that the latter is a valid conditional density, observe that the regression assumption implies that $\int y f_1(y \mid x)\, dy = x'\beta$ and therefore
$$\int f_1(y \mid x)\left(1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\right) dy = \int f_1(y \mid x)\, dy + \int f_1(y \mid x)\left(y - x'\beta\right) dy\,\left(x'\theta\right)/\sigma^2 = 1.$$


In this parametric submodel the conditional mean of $y$ given $x$ is
$$\begin{aligned}
E(y \mid x) &= \int y f_1(y \mid x)\left(1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\right) dy \\
&= \int y f_1(y \mid x)\, dy + \int y f_1(y \mid x)\left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\, dy \\
&= \int y f_1(y \mid x)\, dy + \int \left(y - x'\beta\right)^2 f_1(y \mid x)\left(x'\theta\right)/\sigma^2\, dy + \int \left(y - x'\beta\right) f_1(y \mid x)\, dy\; x'\beta\left(x'\theta\right)/\sigma^2 \\
&= x'\left(\beta + \theta\right),
\end{aligned}$$
using the homoskedasticity assumption $\int\left(y - x'\beta\right)^2 f_1(y \mid x)\, dy = \sigma^2$. This means that in this parametric submodel, the conditional mean is linear in $x$ and the regression coefficient is $\beta(\theta) = \beta + \theta$.

We now calculate the score for estimation of $\theta$. Since
$$\frac{\partial}{\partial\theta}\log f(y, x \mid \theta) = \frac{\partial}{\partial\theta}\log\left(1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\right) = \frac{x\left(y - x'\beta\right)/\sigma^2}{1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2},$$
the score is
$$s = \frac{\partial}{\partial\theta}\log f(y, x \mid \theta_0) = xe/\sigma^2.$$

The Cramer-Rao bound for estimation of $\theta$ (and therefore $\beta(\theta)$ as well) is
$$\left(E\left(ss'\right)\right)^{-1} = \sigma^4\left(E\left[(xe)(xe)'\right]\right)^{-1} = \sigma^2 Q_{xx}^{-1} = V_\beta^0.$$

We have shown that there is a parametric submodel (6.36) whose Cramer-Rao bound for estimation of $\beta$ is identical to the asymptotic variance of the least-squares estimator, which therefore is the semiparametric variance bound.

Theorem 6.17.1 In the homoskedastic regression model, the semiparametric variance bound for estimation of $\beta$ is $V_\beta^0 = \sigma^2 Q_{xx}^{-1}$ and the OLS estimator is semiparametrically efficient.

This result is similar to the Gauss-Markov theorem, in that it asserts the efficiency of the least-squares estimator in the context of the homoskedastic regression model. The difference is that the Gauss-Markov theorem states that OLS has the smallest variance among the set of unbiased linear estimators, while Theorem 6.17.1 states that OLS has the smallest asymptotic variance among all regular estimators. This is a much more powerful statement.
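Theorem 6.17.1 concerns asymptotic variances, but it can be illustrated by simulation: under a homoskedastic design the sampling variance of $\sqrt{n}(\hat\beta - \beta)$ should be close to $\sigma^2 Q_{xx}^{-1}$. The design below is an arbitrary assumption made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, sigma = 200, 2000, 1.5
beta = np.array([1.0, 2.0])
Qxx = np.array([[1.0, 0.5],
                [0.5, 1.0]])                        # population design matrix E[x x']

draws = np.empty((reps, 2))
for r in range(reps):
    x = rng.multivariate_normal(np.zeros(2), Qxx, size=n)
    y = x @ beta + sigma * rng.normal(size=n)       # homoskedastic errors
    beta_hat = np.linalg.lstsq(x, y, rcond=None)[0]
    draws[r] = np.sqrt(n) * (beta_hat - beta)

print(np.cov(draws.T))                              # simulated variance of sqrt(n)(beta_hat - beta)
print(sigma**2 * np.linalg.inv(Qxx))                # the bound sigma^2 Qxx^{-1}
```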

6.18 Technical Proofs*

Proof of Theorem 6.3.1. Note that

$$\hat{e}_i = y_i - x_i'\hat\beta = e_i + x_i'\beta - x_i'\hat\beta = e_i - x_i'\left(\hat\beta - \beta\right).$$

Thus
$$\hat{e}_i^2 = e_i^2 - 2e_ix_i'\left(\hat\beta - \beta\right) + \left(\hat\beta - \beta\right)'x_ix_i'\left(\hat\beta - \beta\right) \qquad (6.37)$$


and
$$\begin{aligned}
\hat\sigma^2 &= \frac{1}{n}\sum_{i=1}^n \hat{e}_i^2 \\
&= \frac{1}{n}\sum_{i=1}^n e_i^2 - 2\left(\frac{1}{n}\sum_{i=1}^n e_ix_i'\right)\left(\hat\beta - \beta\right) + \left(\hat\beta - \beta\right)'\left(\frac{1}{n}\sum_{i=1}^n x_ix_i'\right)\left(\hat\beta - \beta\right) \\
&\xrightarrow{p} \sigma^2
\end{aligned}$$

as $n \to \infty$, the last line using the WLLN and Theorem 6.2.1. Thus $\hat\sigma^2$ is consistent for $\sigma^2$. Finally, since $n/(n - k) \to 1$ as $n \to \infty$, it follows that as $n \to \infty$,
$$s^2 = \frac{n}{n - k}\,\hat\sigma^2 \xrightarrow{p} \sigma^2.$$

Proof of Theorem 6.8.2. We first show $\hat\Sigma \xrightarrow{p} \Sigma$. Note that
$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^n x_ix_i'\hat{e}_i^2 = \frac{1}{n}\sum_{i=1}^n x_ix_i'e_i^2 + \frac{1}{n}\sum_{i=1}^n x_ix_i'\left(\hat{e}_i^2 - e_i^2\right). \qquad (6.38)$$

We now examine each $k \times k$ sum on the right-hand-side of (6.38) in turn.

Take the first term on the right-hand-side of (6.38). Since $\left\|x_ix_i'e_i^2\right\| = \|x_i\|^2 e_i^2$, then by the Cauchy-Schwarz Inequality (B.20) and Assumption 6.4.1,
$$E\left\|x_ix_i'e_i^2\right\| = E\left(\|x_i\|^2 e_i^2\right) \le \left(E\|x_i\|^4\, E e_i^4\right)^{1/2} < \infty.$$

Since this expectation is finite, we can apply the WLLN (Theorem 2.7.2) to find that
$$\frac{1}{n}\sum_{i=1}^n x_ix_i'e_i^2 \xrightarrow{p} E\left(x_ix_i'e_i^2\right) = \Sigma.$$

Now take the second term on the right-hand-side of (6.38). By the Triangle Inequality (A.9), the fact that $E\|x_i\|^2 < \infty$, and Theorem 6.6.2,
$$\begin{aligned}
\left\|\frac{1}{n}\sum_{i=1}^n x_ix_i'\left(\hat{e}_i^2 - e_i^2\right)\right\| &\le \frac{1}{n}\sum_{i=1}^n \left\|x_ix_i'\right\|\left|\hat{e}_i^2 - e_i^2\right| \\
&= \frac{1}{n}\sum_{i=1}^n \|x_i\|^2\left|\hat{e}_i^2 - e_i^2\right| \\
&\le \left(\frac{1}{n}\sum_{i=1}^n \|x_i\|^2\right)\max_{1 \le i \le n}\left|\hat{e}_i^2 - e_i^2\right| \\
&= O_p(1)\,o_p(1) \\
&= o_p(1).
\end{aligned}$$

Together, we have established that $\hat\Sigma \xrightarrow{p} \Sigma$, as claimed. Combined with (6.1) and the invertibility of $\hat{Q}_{xx}$,
$$\hat{V}_\beta = \hat{Q}_{xx}^{-1}\,\hat\Sigma\,\hat{Q}_{xx}^{-1} \xrightarrow{p} Q_{xx}^{-1}\,\Sigma\,Q_{xx}^{-1} = V_\beta.$$

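As a numerical complement to the proof of Theorem 6.8.2, the sketch below simulates a heteroskedastic regression and checks that $\hat\Sigma = n^{-1}\sum_i x_ix_i'\hat{e}_i^2$ is close to the same moment formed with the true errors; the data-generating design is an arbitrary assumption made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 0.5])
e = rng.normal(size=n) * (0.5 + np.abs(x[:, 1]))       # heteroskedastic errors
y = x @ beta + e

beta_hat = np.linalg.lstsq(x, y, rcond=None)[0]
e_hat = y - x @ beta_hat

Sigma_hat = (x * e_hat[:, None] ** 2).T @ x / n        # (1/n) sum x_i x_i' e_hat_i^2
Sigma_true_errors = (x * e[:, None] ** 2).T @ x / n    # same moment using the true errors

print(Sigma_hat)
print(Sigma_true_errors)                               # close to Sigma_hat, as the proof shows
```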