Econometrics (2011)

CHAPTER 6. ASYMPTOTIC THEORY FOR LEAST SQUARES
Theorem 6.10.1 Under Assumption 1.5.1 and Assumption 6.4.1, if $h(\beta)$ is continuously differentiable at the true value of $\beta$, then as $n \to \infty$,

$$\hat{V}_\theta \xrightarrow{p} V_\theta.$$
6.11 t-statistic
Let $\theta = h(\beta) : \mathbb{R}^k \to \mathbb{R}$ be any parameter of interest (for example, $\theta$ could be a single element of $\beta$), $\hat\theta$ its estimate and $s(\hat\theta)$ its asymptotic standard error. Consider the statistic

$$t_n(\theta) = \frac{\hat\theta - \theta}{s(\hat\theta)}. \tag{6.31}$$
Different writers have called (6.31) a t-statistic, a t-ratio, a z-statistic or a studentized statistic. We won't be making such distinctions and will refer to $t_n(\theta)$ as a t-statistic or a t-ratio. We also often suppress the parameter dependence, writing it as $t_n$. The t-statistic is a simple function of the estimate, its standard error, and the parameter.
Theorem 6.11.1 $t_n(\theta) \xrightarrow{d} N(0, 1)$
Thus the asymptotic distribution of the t-ratio $t_n(\theta)$ is the standard normal. Since this distribution does not depend on the parameters, we say that $t_n(\theta)$ is asymptotically pivotal. In special cases (such as the normal regression model, see Section 4.14), the statistic $t_n$ has an exact t distribution, and is therefore exactly free of unknowns. In this case, we say that $t_n$ is exactly pivotal. In general, however, pivotal statistics are unavailable and we must rely on asymptotically pivotal statistics.
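As a quick illustration (not from the text), the pivotal property can be checked by simulation: for the sample mean of skewed data, the t-ratio rejects at the normal critical value 1.96 with frequency close to 5% once $n$ is large. The sample size, distribution, and seed below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def t_stat(sample, theta):
    # t_n(theta) = (theta_hat - theta) / s(theta_hat) for the sample mean
    n = len(sample)
    return (sample.mean() - theta) / (sample.std(ddof=1) / np.sqrt(n))

# Skewed (exponential) data with true mean 1: t_n is not exactly Student t,
# but Theorem 6.11.1 says it is approximately N(0,1) for large n.
n, reps = 500, 2000
tstats = np.array([t_stat(rng.exponential(1.0, n), 1.0) for _ in range(reps)])
rejection_rate = np.mean(np.abs(tstats) > 1.96)  # should be near 0.05
print(rejection_rate)
```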
6.12 Confidence Intervals
A confidence interval $C_n$ is an interval estimate of $\theta \in \mathbb{R}$. It is a function of the data and hence is random. It is designed to cover $\theta$ with high probability. Either $\theta \in C_n$ or $\theta \notin C_n$. Its coverage probability is $\Pr(\theta \in C_n)$. The convention is to design confidence intervals to have coverage probability approximately equal to a pre-specified target, typically 90% or 95%, or more generally written as $(1-\alpha)\%$ for some $\alpha \in (0, 1)$. By reporting a $(1-\alpha)\%$ confidence interval $C_n$, we are stating that the true $\theta$ lies in $C_n$ with $(1-\alpha)\%$ probability across repeated samples.
There is not a unique method to construct confidence intervals. For example, a simple (yet silly) interval is

$$C_n = \begin{cases} \mathbb{R} & \text{with probability } 1 - \alpha \\ \{\hat\theta\} & \text{with probability } \alpha. \end{cases}$$

By construction, if $\hat\theta$ has a continuous distribution, $\Pr(\theta \in C_n) = 1 - \alpha$, so this confidence interval has perfect coverage, but $C_n$ is uninformative about $\theta$. This is not a useful confidence interval.
When we have an asymptotically normal parameter estimate $\hat\theta$ with standard error $s(\hat\theta)$, it turns out that a generally reasonable confidence interval for $\theta$ takes the form

$$C_n = \left[\hat\theta - c \cdot s(\hat\theta),\ \hat\theta + c \cdot s(\hat\theta)\right] \tag{6.32}$$
where $c > 0$ is a pre-specified constant. This confidence interval is symmetric about the point estimate $\hat\theta$, and its length is proportional to the standard error $s(\hat\theta)$.
Equivalently, $C_n$ is the set of parameter values $\theta$ such that the t-statistic $t_n(\theta)$ is smaller (in absolute value) than $c$, that is

$$C_n = \left\{\theta : |t_n(\theta)| \le c\right\} = \left\{\theta : -c \le \frac{\hat\theta - \theta}{s(\hat\theta)} \le c\right\}.$$
The coverage probability of this confidence interval is

$$\Pr(\theta \in C_n) = \Pr\left(|t_n(\theta)| \le c\right)$$
which is generally unknown, but we can approximate the coverage probability by taking the asymptotic limit as $n \to \infty$. Since $t_n(\theta)$ is asymptotically standard normal (Theorem 6.11.1), it follows that as $n \to \infty$,
$$\Pr(\theta \in C_n) \to \Pr\left(|Z| \le c\right) = \Phi(c) - \Phi(-c)$$

where $Z \sim N(0, 1)$ and $\Phi(u) = \Pr(Z \le u)$ is the standard normal distribution function. We call this the asymptotic coverage probability, and it is a function only of $c$.
As we mentioned before, the convention is to design the confidence interval to have a pre-specified asymptotic coverage probability $1 - \alpha$, typically 90% or 95%. This means selecting the constant $c$ so that

$$\Phi(c) - \Phi(-c) = 1 - \alpha.$$
Effectively, this makes $c$ a function of $\alpha$, and can be backed out of a normal distribution table. For example, $\alpha = 0.05$ (a 95% interval) implies $c = 1.96$ and $\alpha = 0.1$ (a 90% interval) implies $c = 1.645$. Rounding 1.96 to 2, we obtain the most commonly used confidence interval in applied econometric practice

$$C_n = \left[\hat\theta - 2s(\hat\theta),\ \hat\theta + 2s(\hat\theta)\right].$$
This is a useful rule of thumb. This asymptotic 95% confidence interval $C_n$ is simple to compute and can be roughly calculated from tables of coefficient estimates and standard errors. (Technically, it is an asymptotic 95.4% interval, due to the substitution of 2.0 for 1.96, but this distinction is meaningless.)
Confidence intervals are a simple yet effective tool to assess estimation uncertainty. When reading a set of empirical results, look at the coefficient estimates and the standard errors. For a parameter of interest, compute the confidence interval $C_n$ and consider the meaning of the spread of the suggested values. If the range of values in the confidence interval is too wide to learn about $\theta$, then do not jump to a conclusion about $\theta$ based on the point estimate alone.
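The interval (6.32) and the choice of $c$ can be sketched in a few lines; here `NormalDist` plays the role of the normal distribution table, and the estimate 0.156 with standard error 0.014 is a hypothetical input chosen for illustration.

```python
from statistics import NormalDist

nd = NormalDist()

def confidence_interval(theta_hat, se, alpha=0.05):
    # C_n = [theta_hat - c*s, theta_hat + c*s] with Phi(c) - Phi(-c) = 1 - alpha
    c = nd.inv_cdf(1.0 - alpha / 2.0)
    return theta_hat - c * se, theta_hat + c * se

# c backed out of the normal distribution: about 1.96 for 95%, 1.645 for 90%
c95 = nd.inv_cdf(0.975)
c90 = nd.inv_cdf(0.95)

# Rule-of-thumb coverage: substituting c = 2 gives Phi(2) - Phi(-2), about 0.954
coverage_c2 = nd.cdf(2.0) - nd.cdf(-2.0)

# Hypothetical reported numbers: estimate 0.156 with standard error 0.014
lo, hi = confidence_interval(0.156, 0.014)
```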
6.13 Regression Intervals
In the linear regression model the conditional mean of yi given xi = x is
$$m(x) = E(y_i \mid x_i = x) = x'\beta.$$
In some cases, we want to estimate $m(x)$ at a particular point $x$. Notice that this is a (linear) function of $\beta$. Letting $h(\beta) = x'\beta$ and $\theta = h(\beta)$, we see that $\hat m(x) = \hat\theta = x'\hat\beta$ and $H_\beta = x$, so $s(\hat\theta) = \sqrt{n^{-1} x' \hat V_\beta x}$. Thus an asymptotic 95% confidence interval for $m(x)$ is

$$\left[x'\hat\beta \pm 2\sqrt{n^{-1} x' \hat V_\beta x}\right].$$
Figure 6.7: Wage on Education Regression Intervals
It is interesting to observe that when viewed as a function of $x$, the width of the confidence set depends on $x$.
To illustrate, we return to the log wage regression (4.9) of Section 4.4. The estimated regression equation is
$$\widehat{\log(Wage)} = x'\hat\beta = 0.626 + 0.156x.$$
where x = Education. The White covariance matrix estimate is
$$\hat V_\beta = \begin{pmatrix} 7.092 & -0.445 \\ -0.445 & 0.029 \end{pmatrix}$$
and the sample size is $n = 61$. Thus the 95% confidence interval for the regression takes the form

$$0.626 + 0.156x \pm 2\sqrt{\frac{1}{61}\left(7.092 - 0.89x + 0.029x^2\right)}.$$
The estimated regression and 95% intervals are shown in Figure 6.7. Notice that the confidence bands take a hyperbolic shape. This means that the regression line is less precisely estimated for very large and very small values of education.
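A minimal sketch using the numbers reported above (the fitted line, the White covariance estimate, and $n = 61$): it evaluates the displayed interval at a few education levels, showing the band is narrowest near the center of the data and widens at the extremes.

```python
import numpy as np

V = np.array([[7.092, -0.445],
              [-0.445, 0.029]])  # White covariance matrix estimate from the text
n = 61

def regression_interval(educ):
    x = np.array([1.0, educ])
    m_hat = x @ np.array([0.626, 0.156])  # fitted line 0.626 + 0.156*educ
    se = np.sqrt(x @ V @ x / n)           # s(theta_hat) = sqrt(x'V x / n)
    return m_hat - 2.0 * se, m_hat + 2.0 * se

# Interval widths at several education levels (hyperbolic band shape)
widths = {e: regression_interval(e)[1] - regression_interval(e)[0]
          for e in (8, 12, 16, 20)}
```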
Plots of the estimated regression line and confidence intervals are especially useful when the regression includes nonlinear terms. To illustrate, consider the log wage regression (4.10) which includes experience and its square.
$$\widehat{\log(Wage)} = 1.06 + 0.116\,education + 0.010\,experience - 0.014\,experience^2/100 \tag{6.33}$$
and has $n = 2454$ observations. We are interested in plotting the regression estimate and regression intervals as a function of experience. Since the regression also includes education, in order to plot the estimates in a simple graph we need to fix education at a specific value. We select education = 12. This only affects the level of the estimated regression, since education enters without an interaction.
Define the points of evaluation

$$x = \begin{pmatrix} 1 \\ 12 \\ x \\ x^2/100 \end{pmatrix}$$

where $x =$ experience.
The asymptotic distribution of $W_n(\theta)$ is simple to derive given Theorem 6.9.2 and Theorem 6.10.1, which show that
$$\sqrt{n}\left(\hat\theta - \theta\right) \xrightarrow{d} Z \sim N(0, V_\theta)$$

and

$$\hat V_\theta \xrightarrow{p} V_\theta.$$
It follows that

$$W_n(\theta) = \sqrt{n}\left(\hat\theta - \theta\right)' \hat V_\theta^{-1} \sqrt{n}\left(\hat\theta - \theta\right) \xrightarrow{d} Z' V_\theta^{-1} Z, \tag{6.35}$$
a quadratic in the normal random vector Z: Here we can appeal to a useful result from probability theory. (See Theorem B.9.3 in the Appendix.)
Theorem 6.14.1 If $Z \sim N(0, A)$ with $A > 0$, $q \times q$, then $Z'A^{-1}Z \sim \chi^2_q$, a chi-square random variable with $q$ degrees of freedom.
The asymptotic distribution in (6.35) takes exactly this form. It follows that $W_n(\theta)$ converges in distribution to a chi-square random variable.
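Theorem 6.14.1 can be checked by simulation. The variance matrix `A` below is an arbitrary positive-definite choice (not from the text), and 4.605 is the 90% quantile of the $\chi^2_2$ distribution used later in the chapter.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])       # arbitrary positive definite variance, q = 2
L = np.linalg.cholesky(A)
Z = rng.standard_normal((20000, 2)) @ L.T  # rows are draws of Z ~ N(0, A)
Ainv = np.linalg.inv(A)
quad = np.sum((Z @ Ainv) * Z, axis=1)      # Z'A^{-1}Z for each draw
coverage = np.mean(quad <= 4.605)          # chi2_2: P(quad <= 4.605) = 0.90
```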
Theorem 6.14.2 Under Assumption 1.5.1 and Assumption 6.4.1, if $h(\beta)$ is continuously differentiable at the true value of $\beta$, then as $n \to \infty$,

$$W_n(\theta) \xrightarrow{d} \chi^2_q.$$
6.15 Confidence Regions
A confidence region $C_n$ is a generalization of a confidence interval to the case $\theta \in \mathbb{R}^q$ with $q > 1$. A confidence region $C_n$ is a set in $\mathbb{R}^q$ intended to cover the true parameter value with a pre-selected probability $1 - \alpha$. Thus an ideal confidence region has the coverage probability $\Pr(\theta \in C_n) = 1 - \alpha$. In practice it is typically not possible to construct a region with exact coverage, but we can calculate its asymptotic coverage.
When the parameter estimate satisfies the conditions of Theorem 6.14.2, a good choice for a confidence region is the ellipse

$$C_n = \left\{\theta : W_n(\theta) \le c_{1-\alpha}\right\}$$

with $c_{1-\alpha}$ the $1-\alpha$'th quantile of the $\chi^2_q$ distribution. (Thus $F_q(c_{1-\alpha}) = 1 - \alpha$.) These quantiles can be found from a critical value table for the $\chi^2_q$ distribution.
Theorem 6.14.2 implies

$$\Pr(\theta \in C_n) \to \Pr\left(\chi^2_q \le c_{1-\alpha}\right) = 1 - \alpha$$

which shows that $C_n$ has asymptotic coverage $(1-\alpha)\%$.
To illustrate the construction of a confidence region, consider the estimated regression (6.33) of the model

$$\log(Wage) = \alpha + \beta_1\,education + \beta_2\,experience + \beta_3\,experience^2/100.$$
Suppose that the two parameters of interest are the percentage return to education $\theta_1 = 100\beta_1$ and the percentage return to experience for individuals with 10 years experience $\theta_2 = 100\beta_2 + 20\beta_3$. (We need to condition on the level of experience since the regression is quadratic in experience.) These two parameters are a linear transformation of the regression parameters $\beta$ with point estimates
$$\hat\theta = \begin{pmatrix} 0 & 100 & 0 & 0 \\ 0 & 0 & 100 & 20 \end{pmatrix} \hat\beta = \begin{pmatrix} 11.6 \\ 0.72 \end{pmatrix}$$
and have the covariance matrix estimate

$$\hat V_\theta = \begin{pmatrix} 0 & 100 & 0 & 0 \\ 0 & 0 & 100 & 20 \end{pmatrix} \hat V_\beta \begin{pmatrix} 0 & 0 \\ 100 & 0 \\ 0 & 100 \\ 0 & 20 \end{pmatrix} = \begin{pmatrix} 645.4 & 67.387 \\ 67.387 & 165 \end{pmatrix}$$
with inverse

$$\hat V_\theta^{-1} = \begin{pmatrix} 0.0016184 & -0.00066098 \\ -0.00066098 & 0.0063306 \end{pmatrix}.$$
Thus the Wald statistic is

$$\begin{aligned}
W_n(\theta) &= n\left(\hat\theta - \theta\right)' \hat V_\theta^{-1} \left(\hat\theta - \theta\right) \\
&= 2454 \begin{pmatrix} 11.6 - \theta_1 \\ 0.72 - \theta_2 \end{pmatrix}' \begin{pmatrix} 0.0016184 & -0.00066098 \\ -0.00066098 & 0.0063306 \end{pmatrix} \begin{pmatrix} 11.6 - \theta_1 \\ 0.72 - \theta_2 \end{pmatrix} \\
&= 3.97\left(11.6 - \theta_1\right)^2 - 3.2441\left(11.6 - \theta_1\right)\left(0.72 - \theta_2\right) + 15.535\left(0.72 - \theta_2\right)^2.
\end{aligned}$$
The 90% quantile of the $\chi^2_2$ distribution is 4.605 (we use the $\chi^2_2$ distribution as the dimension of $\theta$ is two), so an asymptotic 90% confidence region for the two parameters is the interior of the ellipse
$$3.97\left(11.6 - \theta_1\right)^2 - 3.2441\left(11.6 - \theta_1\right)\left(0.72 - \theta_2\right) + 15.535\left(0.72 - \theta_2\right)^2 = 4.605$$
which is displayed in Figure 6.9. Since the estimated correlation of the two coefficient estimates is small (about 0.2) the ellipse is close to circular.
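A sketch reproducing the Wald quadratic from the reported estimates $\hat\theta = (11.6, 0.72)$, the covariance estimate $\hat V_\theta$, and $n = 2454$; the parameter values it classifies as inside or outside the 90% ellipse are hypothetical test points.

```python
import numpy as np

n = 2454
theta_hat = np.array([11.6, 0.72])
V_theta = np.array([[645.4, 67.387],
                    [67.387, 165.0]])
Vinv = np.linalg.inv(V_theta)

def wald(theta):
    # W_n(theta) = n (theta_hat - theta)' Vinv (theta_hat - theta)
    d = theta_hat - np.asarray(theta, dtype=float)
    return n * d @ Vinv @ d

inside = wald([11.0, 0.8]) <= 4.605   # hypothetical point inside the ellipse
outside = wald([0.0, 0.0]) > 4.605    # hypothetical point far outside
```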
6.16 Semiparametric Efficiency in the Projection Model
In Section 5.4 we presented the Gauss-Markov theorem, which stated that in the homoskedastic CEF model, in the class of linear unbiased estimators the one with the smallest variance is least-squares. As we noted in that section, the restriction to linear unbiased estimators is unsatisfactory as it leaves open the possibility that an alternative (non-linear) estimator could have a smaller asymptotic variance. In addition, the restriction to the homoskedastic CEF model is also unsatisfactory as the projection model is more relevant for empirical application. The question remains: what is the most efficient estimator of the projection coefficient $\beta$ (or functions $\theta = h(\beta)$) in the projection model?
It turns out that it is straightforward to show that the projection model falls in the estimator class considered in Proposition 2.13.2. It follows that the least-squares estimator is semiparametrically efficient in the sense that it has the smallest asymptotic variance in the class of semiparametric estimators of $\beta$. This is a more powerful and interesting result than the Gauss-Markov theorem.
Figure 6.9: Confidence Region for Return to Experience and Return to Education
To see this, it is worth rephrasing Proposition 2.13.2 with amended notation. Suppose that a parameter of interest is $\theta = g(\mu)$ where $\mu = Ez_i$, for which the moment estimators are $\hat\mu = \frac{1}{n}\sum_{i=1}^n z_i$ and $\hat\theta = g(\hat\mu)$. Let

$$\mathcal{L}_2(g) = \left\{F : E\|z\|^2 < \infty,\ g(u) \text{ is continuously differentiable at } u = Ez\right\}$$

be the set of distributions for which $\hat\theta$ satisfies the central limit theorem.
Proposition 6.16.1 In the class of distributions $F \in \mathcal{L}_2(g)$, $\hat\theta$ is semiparametrically efficient for $\theta$ in the sense that its asymptotic variance equals the semiparametric efficiency bound.

Proposition 6.16.1 says that under the minimal conditions in which $\hat\theta$ is asymptotically normal, no semiparametric estimator can have a smaller asymptotic variance than $\hat\theta$.
To show that an estimator is semiparametrically efficient it is sufficient to show that it falls in the class covered by this Proposition. To show that the projection model falls in this class, we write $\beta = Q_{xx}^{-1} Q_{xy} = g(\mu)$ where $\mu = Ez_i$ and $z_i = (x_i x_i', x_i y_i)$. The class $\mathcal{L}_2(g)$ equals the class of distributions

$$\mathcal{L}_4(\beta) = \left\{F : Ey^4 < \infty,\ E\|x\|^4 < \infty,\ Ex_i x_i' > 0\right\}.$$
Proposition 6.16.2 In the class of distributions $F \in \mathcal{L}_4(\beta)$, the least-squares estimator $\hat\beta$ is semiparametrically efficient for $\beta$.
The least-squares estimator is an asymptotically efficient estimator of the projection coefficient because the latter is a smooth function of sample moments and the model implies no further
restrictions. However, if the class of permissible distributions is restricted to a strict subset of $\mathcal{L}_4(\beta)$ then least-squares can be inefficient. For example, the linear CEF model with heteroskedastic errors is a strict subset of $\mathcal{L}_4(\beta)$, and the GLS estimator has a smaller asymptotic variance than OLS. In this case, the knowledge that the true conditional mean is linear allows for more efficient estimation of the unknown parameter.
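The efficiency loss of OLS under a heteroskedastic linear CEF can be illustrated by a small simulation; the design below (a uniform regressor, error standard deviation equal to the regressor, and infeasible GLS weights using the true variance) is an arbitrary choice, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, n, reps = 2.0, 200, 1000
ols_draws, gls_draws = [], []
for _ in range(reps):
    x = rng.uniform(0.5, 2.0, n)
    sigma = x                                   # conditional s.d. grows with x
    y = beta * x + sigma * rng.standard_normal(n)
    ols_draws.append((x * y).sum() / (x * x).sum())        # least squares
    w = 1.0 / sigma**2                                     # (infeasible) GLS weights
    gls_draws.append((w * x * y).sum() / (w * x * x).sum())
var_ols, var_gls = np.var(ols_draws), np.var(gls_draws)
```

With the true skedastic function known, the weighted estimator concentrates more tightly around the true coefficient than OLS across replications.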
From Proposition 6.16.1 we can also deduce that plug-in estimators $\hat\theta = h(\hat\beta)$ are semiparametrically efficient estimators of $\theta = h(\beta)$ when $h$ is continuously differentiable. We can also deduce that other parameter estimators are semiparametrically efficient, such as $\hat\sigma^2$ for $\sigma^2$. To see this, note that we can write
$$\begin{aligned}
\sigma^2 &= E\left(y_i - x_i'\beta\right)^2 \\
&= Ey_i^2 - 2E\left(y_i x_i'\right)\beta + \beta' E\left(x_i x_i'\right)\beta \\
&= Q_{yy} - Q_{yx} Q_{xx}^{-1} Q_{xy}
\end{aligned}$$
which is a smooth function of the moments $Q_{yy}$, $Q_{yx}$ and $Q_{xx}$. Similarly the estimator $\hat\sigma^2$ equals

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat e_i^2 = \hat Q_{yy} - \hat Q_{yx} \hat Q_{xx}^{-1} \hat Q_{xy}.$$
Since the variables $y_i^2$, $y_i x_i'$ and $x_i x_i'$ all have finite variances when $F \in \mathcal{L}_4(\beta)$, the conditions of Proposition 6.16.1 are satisfied. We conclude:
Proposition 6.16.3 In the class of distributions $F \in \mathcal{L}_4(\beta)$, $\hat\sigma^2$ is semiparametrically efficient for $\sigma^2$.
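The algebraic identity above, that the average squared OLS residual equals $\hat Q_{yy} - \hat Q_{yx}\hat Q_{xx}^{-1}\hat Q_{xy}$, can be verified numerically; the simulated design below is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 500, 3
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.standard_normal(n)

Qxx = X.T @ X / n
Qxy = X.T @ y / n
Qyy = y @ y / n
beta_hat = np.linalg.solve(Qxx, Qxy)   # OLS coefficients
e_hat = y - X @ beta_hat
sigma2_resid = e_hat @ e_hat / n       # (1/n) * sum of squared residuals
sigma2_moment = Qyy - Qxy @ np.linalg.solve(Qxx, Qxy)
```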
6.17 Semiparametric Efficiency in the Homoskedastic Regression Model*
In Section 6.16 we showed that the OLS estimator is semiparametrically efficient in the projection model. What if we restrict attention to the classical homoskedastic regression model? Is OLS still efficient in this class? In this section we derive the asymptotic semiparametric efficiency bound for this model, and show that it is the same as that obtained by the OLS estimator. Therefore it turns out that least-squares is efficient in this class as well.
Recall that in the homoskedastic regression model the asymptotic variance of the OLS estimator $\hat\beta$ for $\beta$ is $V^0_\beta = Q_{xx}^{-1}\sigma^2$. Therefore, as described in Section 2.13, it is sufficient to find a parametric submodel whose Cramer-Rao bound for estimation of $\beta$ is $V^0_\beta$. This would establish that $V^0_\beta$ is the semiparametric variance bound and the OLS estimator is semiparametrically efficient for $\beta$.
Let the joint density of $y$ and $x$ be written as $f(y, x) = f_1(y \mid x)\, f_2(x)$, the product of the conditional density of $y$ given $x$ and the marginal density of $x$. Now consider the parametric submodel

$$f(y, x \mid \theta) = f_1(y \mid x)\left(1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\right) f_2(x). \tag{6.36}$$
You can check that in this submodel the marginal density of $x$ is $f_2(x)$ and the conditional density of $y$ given $x$ is $f_1(y \mid x)\left(1 + (y - x'\beta)(x'\theta)/\sigma^2\right)$. To see that the latter is a valid conditional density, observe that the regression assumption implies that $\int y f_1(y \mid x)\, dy = x'\beta$ and therefore

$$\int f_1(y \mid x)\left(1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\right) dy = \int f_1(y \mid x)\, dy + \int f_1(y \mid x)\left(y - x'\beta\right) dy\ \left(x'\theta\right)/\sigma^2 = 1.$$
In this parametric submodel the conditional mean of $y$ given $x$ is

$$\begin{aligned}
E_\theta(y \mid x) &= \int y f_1(y \mid x)\left(1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\right) dy \\
&= \int y f_1(y \mid x)\, dy + \int y f_1(y \mid x)\left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\, dy \\
&= \int y f_1(y \mid x)\, dy + \int \left(y - x'\beta\right)^2 f_1(y \mid x)\left(x'\theta\right)/\sigma^2\, dy + \int \left(y - x'\beta\right) f_1(y \mid x)\, dy\ \left(x'\beta\right)\left(x'\theta\right)/\sigma^2 \\
&= x'\left(\beta + \theta\right),
\end{aligned}$$

using the homoskedasticity assumption $\int \left(y - x'\beta\right)^2 f_1(y \mid x)\, dy = \sigma^2$. This means that in this parametric submodel, the conditional mean is linear in $x$ and the regression coefficient is $\beta(\theta) = \beta + \theta$.
We now calculate the score for estimation of $\theta$. Since

$$\frac{\partial}{\partial\theta} \log f(y, x \mid \theta) = \frac{\partial}{\partial\theta} \log\left(1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2\right) = \frac{x\left(y - x'\beta\right)/\sigma^2}{1 + \left(y - x'\beta\right)\left(x'\theta\right)/\sigma^2},$$

the score is

$$s = \frac{\partial}{\partial\theta} \log f(y, x \mid \theta_0) = xe/\sigma^2.$$
The Cramer-Rao bound for estimation of $\theta$ (and therefore $\beta(\theta)$ as well) is

$$\left(E\left(ss'\right)\right)^{-1} = \sigma^4 \left(E\left((xe)(xe)'\right)\right)^{-1} = \sigma^2 Q_{xx}^{-1} = V^0_\beta.$$
We have shown that there is a parametric submodel (6.36) whose Cramer-Rao bound for estimation of $\beta$ is identical to the asymptotic variance of the least-squares estimator, which therefore is the semiparametric variance bound.
Theorem 6.17.1 In the homoskedastic regression model, the semiparametric variance bound for estimation of $\beta$ is $V^0_\beta = \sigma^2 Q_{xx}^{-1}$ and the OLS estimator is semiparametrically efficient.
This result is similar to the Gauss-Markov theorem, in that it asserts the efficiency of the least-squares estimator in the context of the homoskedastic regression model. The difference is that the Gauss-Markov theorem states that OLS has the smallest variance among the set of unbiased linear estimators, while Theorem 6.17.1 states that OLS has the smallest asymptotic variance among all regular estimators. This is a much more powerful statement.
6.18 Technical Proofs*
Proof of Theorem 6.3.1. Note that
$$\begin{aligned}
\hat e_i &= y_i - x_i'\hat\beta \\
&= e_i + x_i'\beta - x_i'\hat\beta \\
&= e_i - x_i'\left(\hat\beta - \beta\right).
\end{aligned}$$

Thus

$$\hat e_i^2 = e_i^2 - 2 e_i x_i'\left(\hat\beta - \beta\right) + \left(\hat\beta - \beta\right)' x_i x_i' \left(\hat\beta - \beta\right) \tag{6.37}$$
|
and

$$\begin{aligned}
\hat\sigma^2 &= \frac{1}{n}\sum_{i=1}^n \hat e_i^2 \\
&= \frac{1}{n}\sum_{i=1}^n e_i^2 - 2\left(\frac{1}{n}\sum_{i=1}^n e_i x_i'\right)\left(\hat\beta - \beta\right) + \left(\hat\beta - \beta\right)'\left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)\left(\hat\beta - \beta\right) \\
&\xrightarrow{p} \sigma^2
\end{aligned}$$
as $n \to \infty$, the last line using the WLLN and Theorem 6.2.1. Thus $\hat\sigma^2$ is consistent for $\sigma^2$. Finally, since $n/(n-k) \to 1$ as $n \to \infty$, it follows that as $n \to \infty$,
$$s^2 = \frac{n}{n-k}\,\hat\sigma^2 \xrightarrow{p} \sigma^2.$$
Proof of Theorem 6.8.2. We first show $\hat\Omega \xrightarrow{p} \Omega$. Note that

$$\begin{aligned}
\hat\Omega &= \frac{1}{n}\sum_{i=1}^n x_i x_i' \hat e_i^2 \\
&= \frac{1}{n}\sum_{i=1}^n x_i x_i' e_i^2 + \frac{1}{n}\sum_{i=1}^n x_i x_i' \left(\hat e_i^2 - e_i^2\right). 
\end{aligned} \tag{6.38}$$
We now examine each $k \times k$ sum on the right-hand side of (6.38) in turn.
Take the first term on the right-hand side of (6.38). Since $\left\|x_i x_i' e_i^2\right\| = \|x_i\|^2 e_i^2$, then by the Cauchy-Schwarz Inequality (B.20) and Assumption 6.4.1,

$$E\left\|x_i x_i' e_i^2\right\| = E\left(\|x_i\|^2 e_i^2\right) \le \left(E\|x_i\|^4\, E e_i^4\right)^{1/2} < \infty.$$
Since this expectation is finite, we can apply the WLLN (Theorem 2.7.2) to find that

$$\frac{1}{n}\sum_{i=1}^n x_i x_i' e_i^2 \xrightarrow{p} E\left(x_i x_i' e_i^2\right) = \Omega.$$
Now take the second term on the right-hand side of (6.38). By the Triangle Inequality (A.9), the fact that $E\|x_i\|^2 < \infty$ and Theorem 6.6.2,

$$\begin{aligned}
\left\|\frac{1}{n}\sum_{i=1}^n x_i x_i' \left(\hat e_i^2 - e_i^2\right)\right\| &\le \frac{1}{n}\sum_{i=1}^n \left\|x_i x_i'\right\| \left|\hat e_i^2 - e_i^2\right| \\
&= \frac{1}{n}\sum_{i=1}^n \|x_i\|^2 \left|\hat e_i^2 - e_i^2\right| \\
&\le \left(\frac{1}{n}\sum_{i=1}^n \|x_i\|^2\right) \max_{1 \le i \le n} \left|\hat e_i^2 - e_i^2\right| \\
&= O_p(1)\, o_p(1) \\
&= o_p(1).
\end{aligned}$$
Together, we have established that $\hat\Omega \xrightarrow{p} \Omega$ as claimed.

Combined with (6.1) and the invertibility of $\hat Q_{xx}$,

$$\hat V_\beta = \hat Q_{xx}^{-1}\, \hat\Omega\, \hat Q_{xx}^{-1} \xrightarrow{p} Q_{xx}^{-1}\, \Omega\, Q_{xx}^{-1} = V_\beta,$$

as stated.