
CHAPTER 2. MOMENT ESTIMATION

elements of y are
$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}.
$$

The population mean of y is just the vector of marginal means
$$
\mu = E(y) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_m) \end{pmatrix}.
$$

When working with random vectors y it is convenient to measure their magnitude with the Euclidean norm
$$
\|y\| = \left( y_1^2 + \cdots + y_m^2 \right)^{1/2}.
$$
This is the classic Euclidean length of the vector y. Notice that $\|y\|^2 = y'y$.

It turns out that it is equivalent to describe finiteness of moments in terms of the Euclidean norm of a vector or in terms of all of its individual components.

Theorem 2.7.1 For $y \in \mathbb{R}^m$, $E\|y\| < \infty$ if and only if $E|y_j| < \infty$ for $j = 1, \ldots, m$.

Theorem 2.7.1 implies that the components of $\mu$ are finite if and only if $E\|y\| < \infty$. The $m \times m$ variance matrix of y is
$$
V = \operatorname{var}(y) = E\left[ (y - \mu)(y - \mu)' \right].
$$
V is often called a variance-covariance matrix. You can show that the elements of V are finite if $E\|y\|^2 < \infty$.

A random sample $\{y_1, \ldots, y_n\}$ consists of n observations of independent and identically distributed draws from the distribution of y. (Each draw is an m-vector.) The vector sample mean
$$
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_m \end{pmatrix}
$$
is the vector of means of the individual variables.

Convergence in probability of a vector is defined as convergence in probability of all elements in the vector. Thus $\bar{y} \xrightarrow{p} \mu$ if and only if $\bar{y}_j \xrightarrow{p} \mu_j$ for $j = 1, \ldots, m$. Since the latter holds if $E|y_j| < \infty$ for $j = 1, \ldots, m$, or equivalently $E\|y\| < \infty$, we can state this formally as follows.

Theorem 2.7.2 Weak Law of Large Numbers (WLLN) for random vectors
If $E\|y\| < \infty$ then as $n \to \infty$,
$$
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \xrightarrow{p} E(y_i).
$$
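To build intuition for the vector WLLN, here is a short simulation sketch (an addition, not part of the original text; it assumes NumPy is available, and the bivariate distribution is an arbitrary choice for illustration). It draws iid 2-vectors and reports how far the vector sample mean is from the population mean as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])          # population mean vector (chosen for illustration)

for n in (100, 10_000, 1_000_000):
    # each y_i is a 2-vector: an Exponential(1) component and a Normal(-2, 9) component
    y = np.column_stack([rng.exponential(1.0, n),
                         rng.normal(mu[1], 3.0, n)])
    ybar = y.mean(axis=0)           # vector sample mean
    print(n, ybar, np.linalg.norm(ybar - mu))
```

The Euclidean distance $\|\bar{y} - \mu\|$ shrinks toward zero as $n$ increases, which is the convergence asserted by Theorem 2.7.2.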

 


2.8 Convergence in Distribution

The WLLN is a useful first step, but does not give an approximation to the distribution of an estimator. A large-sample or asymptotic approximation can be obtained using the concept of convergence in distribution.

Definition 2.8.1 Let $z_n$ be a random vector with distribution $F_n(u) = \Pr(z_n \le u)$. We say that $z_n$ converges in distribution to $z$ as $n \to \infty$, denoted $z_n \xrightarrow{d} z$, if for all $u$ at which $F(u) = \Pr(z \le u)$ is continuous, $F_n(u) \to F(u)$ as $n \to \infty$.

When $z_n \xrightarrow{d} z$, it is common to refer to $z$ as the asymptotic distribution or limit distribution of $z_n$.

When the limit distribution $z$ is degenerate (that is, $\Pr(z = c) = 1$ for some $c$) we can write the convergence as $z_n \xrightarrow{d} c$, which is equivalent to convergence in probability, $z_n \xrightarrow{p} c$.

The typical path to establishing convergence in distribution is through the central limit theorem (CLT), which states that a standardized sample average converges in distribution to a normal random vector.

Theorem 2.8.1 Central Limit Theorem (CLT). If $E\|y\|^2 < \infty$ then as $n \to \infty$,
$$
\sqrt{n}\,(\bar{y}_n - \mu) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (y_i - \mu) \xrightarrow{d} \mathrm{N}(0, V)
$$
where $\mu = E y$ and $V = E\left[ (y - \mu)(y - \mu)' \right]$.

The standardized sum $z_n = \sqrt{n}\,(\bar{y}_n - \mu)$ has mean zero and variance $V$. What the CLT adds is that the variable $z_n$ is also approximately normally distributed, and that the normal approximation improves as $n$ increases.
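As an informal check of the CLT (an added simulation sketch, not from the text; it assumes NumPy is available), one can repeatedly draw samples from a decidedly non-normal distribution and examine the standardized sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20_000
mu = 1.0                               # mean of the Exponential(1) draws; V = var = 1

# z_r = sqrt(n) * (ybar - mu) for each replication r
z = np.array([np.sqrt(n) * (rng.exponential(1.0, n).mean() - mu) for _ in range(reps)])

print("simulated mean, variance of z:", z.mean(), z.var())   # approximately 0 and V = 1
print("P(z <= 1.96) approx:", (z <= 1.96).mean())             # close to 0.975 for N(0, 1)
```

Even though each observation is strongly skewed, the standardized mean behaves approximately like a $\mathrm{N}(0, V)$ draw, and the approximation sharpens as $n$ grows.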

The CLT is one of the most powerful and mysterious results in statistical theory. It shows that the simple process of averaging induces normality. The first version of the CLT (for the number of heads resulting from many tosses of a fair coin) was established by the French mathematician Abraham de Moivre in 1733. This was extended to cover an approximation to the binomial distribution in 1812 by Pierre-Simon Laplace, and the general statement is credited to the Russian mathematician Aleksandr Lyapunov in 1901.

2.9 Functions of Moments

We now expand our investigation and consider estimation of parameters which can be written as a continuous function of $\mu$. That is, the parameter of interest is the vector of functions
$$
\beta = g(\mu) \tag{2.5}
$$
where $g : \mathbb{R}^m \to \mathbb{R}^k$. As one example, the geometric mean of wages $w$ is
$$
\gamma = \exp\left( E(\log(w)) \right) \tag{2.6}
$$


which is (2.5) with
$$
g(u) = \exp(u)
$$
and $\mu = E(\log(w))$. As another example, the skewness of the wage distribution is

$$
sk = \frac{E\left[ (w - E w)^3 \right]}{\left( E\left[ (w - E w)^2 \right] \right)^{3/2}}
$$
where $w = \text{wage}$ and
$$
sk = g\left( E w,\; E w^2,\; E w^3 \right),
\qquad
g(\mu_1, \mu_2, \mu_3) = \frac{\mu_3 - 3\mu_2\mu_1 + 2\mu_1^3}{\left( \mu_2 - \mu_1^2 \right)^{3/2}}. \tag{2.7}
$$
In this case we can set
$$
y = \begin{pmatrix} w \\ w^2 \\ w^3 \end{pmatrix}
$$
so that
$$
\mu = \begin{pmatrix} E w \\ E w^2 \\ E w^3 \end{pmatrix}. \tag{2.8}
$$

The parameter $\beta = g(\mu)$ is not a population moment, so it does not have a direct moment estimator. Instead, it is common to use a plug-in estimate formed by replacing the unknown $\mu$ with its point estimate $\hat{\mu}$, so that
$$
\hat{\beta} = g(\hat{\mu}).
$$
Again, the hat "^" indicates that $\hat{\beta}$ is a sample estimate of $\beta$. For example, the plug-in estimate of the geometric mean $\gamma$ of the wage distribution from (2.6) is
$$
\hat{\gamma} = \exp(\hat{\mu}) \tag{2.9}
$$
with
$$
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \log(\text{wage}_i).
$$

The plug-in estimate of the skewness of the wage distribution is
$$
\widehat{sk} = \frac{\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} (w_i - \bar{w})^3}{\left( \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} (w_i - \bar{w})^2 \right)^{3/2}}
= \frac{\hat{\mu}_3 - 3\hat{\mu}_2\hat{\mu}_1 + 2\hat{\mu}_1^3}{\left( \hat{\mu}_2 - \hat{\mu}_1^2 \right)^{3/2}}
$$
where
$$
\hat{\mu}_j = \frac{1}{n} \sum_{i=1}^{n} w_i^j.
$$
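The plug-in recipe translates directly into computation. The following sketch (an addition, not part of the original text; it assumes NumPy, and the simulated wage data are purely for illustration) computes the plug-in geometric mean (2.9) and the plug-in skewness from the sample moments $\hat{\mu}_j$.

```python
import numpy as np

rng = np.random.default_rng(2)
wage = rng.lognormal(mean=3.0, sigma=0.6, size=5_000)   # simulated wages, illustration only

# Plug-in geometric mean: gamma_hat = exp(mu_hat), with mu_hat the sample mean of log(wage)
mu_hat = np.log(wage).mean()
gamma_hat = np.exp(mu_hat)

# Plug-in skewness via the sample moments mu_hat_j = (1/n) sum of wage**j
m1, m2, m3 = (np.mean(wage**j) for j in (1, 2, 3))
sk_hat = (m3 - 3*m2*m1 + 2*m1**3) / (m2 - m1**2)**1.5

print(gamma_hat, sk_hat)
```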

A useful property is that continuous functions are limit-preserving.

Theorem 2.9.1 Continuous Mapping Theorem (CMT). If $z_n \xrightarrow{p} c$ as $n \to \infty$ and $g(\cdot)$ is continuous at $c$, then $g(z_n) \xrightarrow{p} g(c)$ as $n \to \infty$.


The proof of Theorem 2.9.1 is given in Section 2.15.

For example, if $z_n \xrightarrow{p} c$ as $n \to \infty$ then
$$
z_n + a \xrightarrow{p} c + a
$$
$$
a z_n \xrightarrow{p} a c
$$
$$
z_n^2 \xrightarrow{p} c^2
$$
as the functions $g(u) = u + a$, $g(u) = au$, and $g(u) = u^2$ are continuous. Also
$$
\frac{a}{z_n} \xrightarrow{p} \frac{a}{c}
$$
if $c \neq 0$. The condition $c \neq 0$ is important as the function $g(u) = a/u$ is not continuous at $u = 0$.

We need the following assumption in order for $\hat{\beta}$ to be consistent for $\beta$.

Theorem 2.9.2 If $E\|y\| < \infty$ and $g(u)$ is continuous at $u = \mu$ then
$$
\hat{\beta} = g(\hat{\mu}) \xrightarrow{p} g(\mu) = \beta
$$
as $n \to \infty$, and thus $\hat{\beta}$ is consistent for $\beta$.

To apply Theorem 2.9.2 it is necessary to check if the function $g$ is continuous at $\mu$. In our first example $g(u) = \exp(u)$ is continuous everywhere. It therefore follows from Theorem 2.7.2 and Theorem 2.9.2 that if $E|\log(\text{wage})| < \infty$ then as $n \to \infty$,
$$
\hat{\gamma} \xrightarrow{p} \gamma.
$$

In our second example $g$ defined in (2.7) is continuous for all $\mu$ such that $\operatorname{var}(w) = \mu_2 - \mu_1^2 > 0$, which holds unless $w$ has a degenerate distribution. Thus if $E|w|^3 < \infty$ and $\operatorname{var}(w) > 0$ then as $n \to \infty$,
$$
\widehat{sk} \xrightarrow{p} sk.
$$

2.10 Delta Method

In this section we introduce two tools – an extended version of the CMT and the Delta Method – which allow us to calculate the asymptotic distribution of the parameter estimate $\hat{\beta}$.

We first present an extended version of the continuous mapping theorem which allows convergence in distribution.

Theorem 2.10.1 Continuous Mapping Theorem
If $z_n \xrightarrow{d} z$ as $n \to \infty$ and $g : \mathbb{R}^m \to \mathbb{R}^k$ has the set of discontinuity points $D_g$ such that $\Pr(z \in D_g) = 0$, then $g(z_n) \xrightarrow{d} g(z)$ as $n \to \infty$.

For a proof of Theorem 2.10.1 see Theorem 2.3 of van der Vaart (1998). It was first proved by Mann and Wald (1943) and is therefore sometimes referred to as the Mann-Wald Theorem.

Theorem 2.10.1 allows the function $g$ to be discontinuous only if the probability of being at a discontinuity point is zero. For example, the function $g(u) = u^{-1}$ is discontinuous at $u = 0$, but if $z_n \xrightarrow{d} z \sim \mathrm{N}(0, 1)$ then $\Pr(z = 0) = 0$ so $z_n^{-1} \xrightarrow{d} z^{-1}$.

A special case of the Continuous Mapping Theorem is known as Slutsky's Theorem.


Theorem 2.10.2 Slutsky's Theorem
If $z_n \xrightarrow{d} z$ and $c_n \xrightarrow{p} c$ as $n \to \infty$ then
1. $z_n + c_n \xrightarrow{d} z + c$
2. $z_n c_n \xrightarrow{d} z c$
3. $\dfrac{z_n}{c_n} \xrightarrow{d} \dfrac{z}{c}$ if $c \neq 0$

Even though Slutsky's Theorem is a special case of the CMT, it is a useful statement as it focuses on the most common applications: addition, multiplication and division.

Despite the fact that the plug-in estimator $\hat{\beta}$ is a function of $\hat{\mu}$ for which we have an asymptotic distribution, Theorem 2.10.1 does not directly give us an asymptotic distribution for $\hat{\beta}$. This is because $\hat{\beta} = g(\hat{\mu})$ is written as a function of $\hat{\mu}$, not of the standardized sequence $\sqrt{n}\,(\hat{\mu} - \mu)$. We need an intermediate step, a first order Taylor series expansion. This step is so critical to statistical theory that it has its own name: the Delta Method.

Theorem 2.10.3 Delta Method:
If $\sqrt{n}\,(\theta_n - \theta_0) \xrightarrow{d} \xi$, where $\theta$ is $m \times 1$, and $g(\cdot) : \mathbb{R}^m \to \mathbb{R}^k$, $k \le m$, is continuously differentiable in a neighborhood of $\theta$, then as $n \to \infty$
$$
\sqrt{n}\,\left( g(\theta_n) - g(\theta_0) \right) \xrightarrow{d} G' \xi \tag{2.10}
$$
where $G(\theta) = \frac{\partial}{\partial \theta} g(\theta)'$ and $G = G(\theta_0)$. In particular, if
$$
\sqrt{n}\,(\theta_n - \theta_0) \xrightarrow{d} \mathrm{N}(0, V)
$$
where $V$ is $m \times m$, then as $n \to \infty$
$$
\sqrt{n}\,\left( g(\theta_n) - g(\theta_0) \right) \xrightarrow{d} \mathrm{N}\left( 0,\; G' V G \right). \tag{2.11}
$$

The Delta Method allows us to complete our derivation of the asymptotic distribution of the estimator $\hat{\beta}$ of $\beta$. Relative to consistency, it requires the stronger smoothness condition that $g(\cdot)$ is continuously differentiable.

Now by combining Theorems 2.8.1 and 2.10.3 we can find the asymptotic distribution of the plug-in estimator $\hat{\beta}$.

Theorem 2.10.4 If $E\|y\|^2 < \infty$ and $G(u) = \frac{\partial}{\partial u} g(u)'$ is continuous in a neighborhood of $u = \mu$ then as $n \to \infty$
$$
\sqrt{n}\,\left( \hat{\beta} - \beta \right) \xrightarrow{d} \mathrm{N}\left( 0,\; G' V G \right)
$$
where $G = G(\mu)$.
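To make Theorem 2.10.4 concrete, here is a small sketch (an addition, not from the text; it assumes NumPy and uses simulated data) that computes a delta-method standard error for the plug-in geometric mean $\hat{\gamma} = \exp(\hat{\mu})$, for which $G(u) = \exp(u)$.

```python
import numpy as np

rng = np.random.default_rng(3)
wage = rng.lognormal(mean=3.0, sigma=0.6, size=5_000)   # simulated wages, illustration only

# Plug-in estimate of the geometric mean gamma = exp(mu), mu = E log(wage)
logw = np.log(wage)
n = logw.size
mu_hat = logw.mean()
gamma_hat = np.exp(mu_hat)

# Delta method: g(u) = exp(u), so G = exp(mu) and avar(gamma_hat) = G * V * G with V = var(log w)
V_hat = logw.var(ddof=1)
se_gamma = np.sqrt(np.exp(mu_hat)**2 * V_hat / n)

print(gamma_hat, se_gamma)
```

For a vector-valued $\hat{\mu}$ the same recipe applies with $\hat{G}$ the Jacobian of $g$ evaluated at $\hat{\mu}$, giving the estimated standard errors from $\hat{G}'\hat{V}\hat{G}/n$.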


2.11 Stochastic Order Symbols

It is convenient to have simple symbols for random variables and vectors which converge in probability to zero or are stochastically bounded. The notation $z_n = o_p(1)$ (pronounced "small oh-P-one") means that $z_n \xrightarrow{p} 0$ as $n \to \infty$. We also say that $z_n = o_p(a_n)$ if $a_n$ is a sequence such that $a_n^{-1} z_n = o_p(1)$. For example, for any consistent estimator $\hat{\beta}$ for $\beta$ we can then write
$$
\hat{\beta} = \beta + o_p(1).
$$

Similarly, the notation $z_n = O_p(1)$ (pronounced "big oh-P-one") means that $z_n$ is bounded in probability. Precisely, for any $\varepsilon > 0$ there is a constant $M_\varepsilon < \infty$ such that
$$
\lim_{n \to \infty} \Pr\left( |z_n| > M_\varepsilon \right) \le \varepsilon.
$$

We say that $z_n = O_p(a_n)$ if $a_n$ is a sequence such that $a_n^{-1} z_n = O_p(1)$.

$O_p(1)$ is weaker than $o_p(1)$ in the sense that $z_n = o_p(1)$ implies $z_n = O_p(1)$ but not the reverse. However, if $z_n = O_p(a_n)$ then $z_n = o_p(b_n)$ for any $b_n$ such that $a_n / b_n \to 0$.

If a random vector converges in distribution, $z_n \xrightarrow{d} z$ (for example, if $z \sim \mathrm{N}(0, V)$), then $z_n = O_p(1)$. It follows that for estimators $\hat{\beta}$ which satisfy the convergence of Theorem 2.10.4 we can write
$$
\hat{\beta} = \beta + O_p(n^{-1/2}).
$$

There are many simple rules for manipulating $o_p(1)$ and $O_p(1)$ sequences which can be deduced from the continuous mapping theorem or Slutsky's Theorem. For example,
$$
\begin{aligned}
o_p(1) + o_p(1) &= o_p(1) \\
o_p(1) + O_p(1) &= O_p(1) \\
O_p(1) + O_p(1) &= O_p(1) \\
o_p(1)\, o_p(1) &= o_p(1) \\
o_p(1)\, O_p(1) &= o_p(1) \\
O_p(1)\, O_p(1) &= O_p(1)
\end{aligned}
$$
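As a small worked illustration of how these rules combine (an added example, not part of the original text), suppose two hypothetical scalar estimators each satisfy the rate of Theorem 2.10.4, so that $\hat{\beta} - \beta = O_p(n^{-1/2})$ and $\hat{\mu} - \mu = O_p(n^{-1/2})$. Then
$$
\left( \hat{\beta} - \beta \right)\left( \hat{\mu} - \mu \right)
= O_p(n^{-1/2})\, O_p(n^{-1/2})
= O_p(n^{-1})
= o_p(n^{-1/2}),
$$
using the product rule and then the fact that $O_p(a_n) = o_p(b_n)$ whenever $a_n / b_n \to 0$. Products of estimation errors are therefore of smaller order than the errors themselves, which is the step that lets remainder terms be discarded in expansions such as the Delta Method.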

2.12 Uniform Stochastic Bounds*

For some applications it can be useful to obtain the stochastic order of the random variable
$$
\max_{1 \le i \le n} |y_i|.
$$

This is the magnitude of the largest observation in the sample $\{y_1, \ldots, y_n\}$. If the support of the distribution of $y_i$ is unbounded, then as the sample size $n$ increases, the largest observation will also tend to increase. It turns out that there is a simple characterization.

Theorem 2.12.1 If $E|y|^r < \infty$ then as $n \to \infty$,
$$
n^{-1/r} \max_{1 \le i \le n} |y_i| \xrightarrow{p} 0.
$$


Equivalently,
$$
\max_{1 \le i \le n} |y_i| = o_p(n^{1/r}). \tag{2.12}
$$

Theorem 2.12.1 says that the largest observation will diverge at a rate slower than $n^{1/r}$. As $r$ increases this rate decreases. Thus the higher the moment, the slower the rate of divergence of the largest observation.
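The following quick simulation (an added sketch, not in the original text; it assumes NumPy, and both the exponential distribution and the choice $r = 2$ are merely illustrative) shows $n^{-1/r} \max_i |y_i|$ drifting toward zero even though the maximum itself keeps growing.

```python
import numpy as np

rng = np.random.default_rng(4)
r = 2   # E|y|^r is finite here since exponential draws have all moments; r = 2 is an arbitrary choice

for n in (10**3, 10**5, 10**7):
    y = rng.exponential(1.0, size=n)
    print(n, n**(-1 / r) * y.max())
```

The maximum of exponential draws grows roughly like $\log n$, so the scaled maximum declines, but slowly, in line with the remark above.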

To simplify the notation, we write (2.12) as
$$
y_i = o_p(n^{1/r})
$$
uniformly in $1 \le i \le n$. It is important to understand that when the $O_p$ or $o_p$ symbols are applied to subscript-$i$ random variables we typically mean uniform convergence in the sense of (2.12).

Theorem 2.12.1 applies to random vectors. If $E\|y\|^r < \infty$ then
$$
\max_{1 \le i \le n} \|y_i\| = o_p(n^{1/r}).
$$

 

We now prove Theorem 2.12.1. Take any $\delta > 0$. The event $\left\{ \max_{1 \le i \le n} |y_i| > \delta n^{1/r} \right\}$ means that at least one of the $|y_i|$ exceeds $\delta n^{1/r}$, which is the same as the event $\bigcup_{i=1}^{n} \left\{ |y_i| > \delta n^{1/r} \right\}$, or equivalently $\bigcup_{i=1}^{n} \left\{ |y_i|^r > \delta^r n \right\}$. Since the probability of the union of events is smaller than the sum of the probabilities,
$$
\begin{aligned}
\Pr\left( n^{-1/r} \max_{1 \le i \le n} |y_i| > \delta \right)
&= \Pr\left( \bigcup_{i=1}^{n} \left\{ |y_i|^r > n \delta^r \right\} \right) \\
&\le \sum_{i=1}^{n} \Pr\left( |y_i|^r > n \delta^r \right) \\
&\le \frac{1}{n \delta^r} \sum_{i=1}^{n} E\left( |y_i|^r \, 1\left( |y_i|^r > n \delta^r \right) \right) \\
&= \frac{1}{\delta^r} E\left( |y_i|^r \, 1\left( |y_i|^r > n \delta^r \right) \right)
\end{aligned}
$$
where the second inequality is the strong form of Markov's inequality (Theorem B.25) and the final equality is since the $y_i$ are iid. Since $E|y|^r < \infty$ this final expectation converges to zero as $n \to \infty$. This is because
$$
E|y_i|^r = \int |y|^r \, dF(y) < \infty
$$
implies
$$
E\left( |y_i|^r \, 1\left( |y_i|^r > c \right) \right) = \int_{|y|^r > c} |y|^r \, dF(y) \to 0
$$
as $c \to \infty$. We have established that $n^{-1/r} \max_{1 \le i \le n} |y_i| \xrightarrow{p} 0$, as required.

2.13 Semiparametric Efficiency

In this section we argue that the sample mean $\hat{\mu}$ and plug-in estimator $\hat{\beta} = g(\hat{\mu})$ are efficient estimators of the parameters $\mu$ and $\beta$. Our demonstration is based on the rich but technically challenging theory of semiparametric efficiency bounds. An excellent accessible review has been provided by Newey (1990). We will also appeal to the asymptotic theory of maximum likelihood estimation (see Section B.11).

We start by examining the sample mean $\hat{\mu}$, for the asymptotic efficiency of $\hat{\beta}$ will follow from that of $\hat{\mu}$.


Recall, we know that if $E\|y\|^2 < \infty$ then the sample mean $\hat{\mu}$ has the asymptotic distribution $\sqrt{n}\,(\hat{\mu} - \mu) \xrightarrow{d} \mathrm{N}(0, V)$. We want to know if $\hat{\mu}$ is the best feasible estimator, or if there is another estimator with a smaller asymptotic variance. While it seems intuitively unlikely that another estimator could have a smaller asymptotic variance, how do we know that this is not the case?

When we ask if $\hat{\mu}$ is the best estimator, we need to be clear about the class of models, that is, the class of permissible distributions. For estimation of the mean $\mu$ of the distribution of $y$ the broadest conceivable class is $\mathcal{L}_1 = \{ F : E\|y\| < \infty \}$. This class is too broad for our current purposes, as $\hat{\mu}$ is not asymptotically $\mathrm{N}(0, V)$ for all $F \in \mathcal{L}_1$. A more realistic choice is $\mathcal{L}_2 = \left\{ F : E\|y\|^2 < \infty \right\}$, the class of finite-variance distributions. When we seek an efficient estimator of the mean $\mu$ in the class of models $\mathcal{L}_2$, what we are seeking is the best estimator, given that all we know is that $F \in \mathcal{L}_2$.

To show that the answer is not immediately obvious, it might be helpful to review a setting where the sample mean is inefficient. Suppose that $y \in \mathbb{R}$ has the double exponential density $f(y \mid \mu) = 2^{-1/2} \exp\left( -|y - \mu| \sqrt{2} \right)$. Since $\operatorname{var}(y) = 1$ we see that the sample mean satisfies $\sqrt{n}\,(\hat{\mu} - \mu) \xrightarrow{d} \mathrm{N}(0, 1)$. In this model the maximum likelihood estimator (MLE) $\tilde{\mu}$ for $\mu$ is the sample median. Recall from the theory of maximum likelihood that the MLE satisfies $\sqrt{n}\,(\tilde{\mu} - \mu) \xrightarrow{d} \mathrm{N}\left( 0, \left( E S^2 \right)^{-1} \right)$ where $S = \frac{\partial}{\partial \mu} \log f(y \mid \mu) = \sqrt{2}\, \operatorname{sgn}(y - \mu)$ is the score. We can calculate that $E S^2 = 2$ and thus conclude that $\sqrt{n}\,(\tilde{\mu} - \mu) \xrightarrow{d} \mathrm{N}(0, 1/2)$. The asymptotic variance of the MLE is one-half that of the sample mean. Thus when the true density is known to be double exponential the sample mean is inefficient.
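A quick simulation (an added sketch, not part of the text; it assumes NumPy) makes the variance comparison concrete: under this double exponential density the sample median has roughly half the sampling variance of the sample mean.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, mu = 200, 20_000, 0.0
b = 1 / np.sqrt(2)                        # Laplace scale giving var(y) = 2 * b**2 = 1

means = np.empty(reps)
medians = np.empty(reps)
for rep in range(reps):
    y = rng.laplace(loc=mu, scale=b, size=n)
    means[rep] = y.mean()
    medians[rep] = np.median(y)

print("n * var(mean)  :", n * means.var())    # approximately 1
print("n * var(median):", n * medians.var())  # approximately 1/2
```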

But the estimator which achieves this improved efficiency, the sample median, is not generically consistent for the population mean. It is inconsistent if the density is asymmetric or skewed. So the improvement comes at a great cost. Another way of looking at this is that the sample median is efficient in the class of densities $f(y \mid \mu) = 2^{-1/2} \exp\left( -|y - \mu| \sqrt{2} \right)$, but unless it is known that this is the correct distribution class this knowledge is not very useful.

 

 

 

 

 

 

 

 

 

The relevant question is whether or not the sample mean is efficient when the form of the distribution is unknown. We call this setting semiparametric as the parameter of interest (the mean) is finite dimensional while the remaining features of the distribution are unspecified. In the semiparametric context an estimator is called semiparametrically efficient if it has the smallest asymptotic variance among all semiparametric estimators.

The mathematical trick is to reduce the semiparametric model to a set of parametric "submodels". The Cramer-Rao variance bound can be found for each parametric submodel. The variance bound for the semiparametric model (the union of the submodels) is then defined as the supremum of the individual variance bounds.

 

 

 

 

 

 

 

 

 

Formally, suppose that the true density of $y$ is the unknown function $f(y)$ with mean $\mu = E y = \int y f(y)\, dy$. A parametric submodel $\eta$ for $f(y)$ is a density $f_\eta(y \mid \theta)$ which is a smooth function of a parameter $\theta$, and there is a true value $\theta_0$ such that $f_\eta(y \mid \theta_0) = f(y)$. The index $\eta$ indicates the submodels. The equality $f_\eta(y \mid \theta_0) = f(y)$ means that the submodel class passes through the true density, so the submodel is a true model. The class of submodels $\eta$ and parameter $\theta_0$ depend on the true density $f$. In the submodel $f_\eta(y \mid \theta)$, the mean is $\mu_\eta(\theta) = \int y f_\eta(y \mid \theta)\, dy$, which varies with the parameter $\theta$. Let $\aleph$ denote the class of all submodels $\eta$ for $f$.

 

 

Since each submodel $\eta$ is parametric we can calculate the efficiency bound for estimation of $\mu$ within this submodel. Specifically, given the density $f_\eta(y \mid \theta)$ its likelihood score is
$$
S_\eta = \frac{\partial}{\partial \theta} \log f_\eta(y \mid \theta_0),
$$
so the Cramer-Rao lower bound for estimation of $\theta$ is $\left( E\left( S_\eta S_\eta' \right) \right)^{-1}$. Defining $M_\eta = \frac{\partial}{\partial \theta} \mu_\eta(\theta_0)'$, by Theorem B.11.5 the Cramer-Rao lower bound for estimation of $\mu$ within the submodel $\eta$ is
$$
V_\eta = M_\eta' \left( E\left( S_\eta S_\eta' \right) \right)^{-1} M_\eta.
$$

 

 

 

 

 

 

 


As $V_\eta$ is the efficiency bound for the submodel class $f_\eta(y \mid \theta)$, no estimator can have an asymptotic variance smaller than $V_\eta$ for any density $f_\eta(y \mid \theta)$ in the submodel class, including the true density $f$. This is true for all submodels $\eta$. Thus the asymptotic variance of any semiparametric estimator cannot be smaller than $V_\eta$ for any conceivable submodel. Taking the supremum of the Cramer-Rao lower bounds from all conceivable submodels we define²
$$
\bar{V} = \sup_{\eta \in \aleph} V_\eta.
$$

The asymptotic variance of any semiparametric estimator cannot be smaller than V , since it cannot be smaller than any individual V : We call V the semiparametric asymptotic variance bound or semiparametric e¢ ciency bound for estimation of , as it is a lower bound on the asymptotic variance for any semiparametric estimator. If the asymptotic variance of a speci…c semiparametric estimator equals the bound V we say that the estimator is semiparametrically e¢ cient.

For many statistical problems it is quite challenging to calculate the semiparametric variance bound. However, in some cases there is a simple method to …nd the solution. Suppose that we can …nd a submodel 0 whose Cramer-Rao lower bound satis…es V 0 = V where V is the asymptotic variance of a known semiparametric estimator. In this case, we can deduce that V = V 0 = V . Otherwise there would exist another submodel 1 whose Cramer-Rao lower bound satis…es V 0 < V 1 but this would imply V < V 1 which contradicts the Cramer-Rao Theorem.

We now …nd this submodel for the sample mean : Our goal is to …nd a parametric submodel

whose Cramer-Rao bound for is V : This can be done by creating a tilted version of the true

density. Consider the parametric submodel

b

 

 

f (y j ) = f(y) 1 + 0V 1 (y )

(2.13)

where f(y) is the true density and = Ey: Note that

 

Z

f (y j ) dy = Z f(y)dy + 0V 1 Z f(y) (y ) dy = 1

 

and for all $\theta$ close to zero $f_\eta(y \mid \theta) \ge 0$. Thus $f_\eta(y \mid \theta)$ is a valid density function. It is a parametric submodel since $f_\eta(y \mid \theta_0) = f(y)$ when $\theta_0 = 0$. This parametric submodel has the mean
$$
\begin{aligned}
\mu(\theta) &= \int y f_\eta(y \mid \theta)\, dy \\
&= \int y f(y)\, dy + \int f(y)\, y (y - \mu)' V^{-1} \theta\, dy \\
&= \mu + \theta
\end{aligned}
$$
which is a smooth function of $\theta$.

 

 

 

 

 

 

 

 

 

 

 

Since
$$
\frac{\partial}{\partial \theta} \log f_\eta(y \mid \theta)
= \frac{\partial}{\partial \theta} \log \left( 1 + \theta' V^{-1} (y - \mu) \right)
= \frac{V^{-1} (y - \mu)}{1 + \theta' V^{-1} (y - \mu)}
$$
it follows that the score function for $\theta$ is
$$
S = \frac{\partial}{\partial \theta} \log f_\eta(y \mid \theta_0) = V^{-1} (y - \mu). \tag{2.14}
$$

By Theorem B.11.3 the Cramer-Rao lower bound for $\theta$ is
$$
\left( E\left( S S' \right) \right)^{-1} = \left( V^{-1} E\left[ (y - \mu)(y - \mu)' \right] V^{-1} \right)^{-1} = V. \tag{2.15}
$$

² It is not obvious that this supremum exists, as $V_\eta$ is a matrix so there is not a unique ordering of matrices. However, in many cases (including the ones we study) the supremum exists and is unique.


The Cramer-Rao lower bound for $\mu(\theta) = \mu + \theta$ is also $V$, and this equals the asymptotic variance of the moment estimator $\hat{\mu}$. This was what we set out to show.

In summary, we have shown that in the submodel (2.13) the Cramer-Rao lower bound for estimation of $\mu$ is $V$, which equals the asymptotic variance of the sample mean. This establishes the following result.

Proposition 2.13.1 In the class of distributions $F \in \mathcal{L}_2$, the semiparametric variance bound for estimation of $\mu$ is $V = \operatorname{var}(y_i)$, and the sample mean $\hat{\mu}$ is a semiparametrically efficient estimator of the population mean $\mu$.

We call this result a proposition rather than a theorem as we have not attended to the regularity conditions.

It is a simple matter to extend this result to the plug-in estimator $\hat{\beta} = g(\hat{\mu})$. We know from Theorem 2.10.4 that if $E\|y\|^2 < \infty$ and $g(u)$ is continuously differentiable at $u = \mu$ then the plug-in estimator has the asymptotic distribution $\sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{d} \mathrm{N}(0, G'VG)$. We therefore consider the class of distributions
$$
\mathcal{L}_2(g) = \left\{ F : E\|y\|^2 < \infty,\; g(u) \text{ is continuously differentiable at } u = E y \right\}.
$$
For example, if $\beta = \mu_1 / \mu_2$ where $\mu_1 = E y_1$ and $\mu_2 = E y_2$, then $\mathcal{L}_2(g) = \left\{ F : E y_1^2 < \infty,\; E y_2^2 < \infty,\; E y_2 \neq 0 \right\}$.

For any submodel $\eta$ the Cramer-Rao lower bound for estimation of $\beta = g(\mu)$ is $G' V_\eta G$ by Theorem B.11.5. For the submodel (2.13) this bound is $G' V G$, which equals the asymptotic variance of $\hat{\beta}$ from Theorem 2.10.4. Thus $\hat{\beta}$ is semiparametrically efficient.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Proposition 2.13.2 In the class of distributions $F \in \mathcal{L}_2(g)$ the semiparametric variance bound for estimation of $\beta = g(\mu)$ is $G' V G$, and the plug-in estimator $\hat{\beta} = g(\hat{\mu})$ is a semiparametrically efficient estimator of $\beta$.

The result in Proposition 2.13.2 is quite general. Smooth functions of sample moments are efficient estimators for their population counterparts. This is a very powerful result, as most econometric estimators can be written (or approximated) as smooth functions of sample means.

2.14 Expectation*

For any random variable $y$ we define the mean or expectation $E y$ as follows. If $y$ is discrete,
$$
E y = \sum_{j=1}^{\infty} \tau_j \Pr\left( y = \tau_j \right),
$$
where the $\tau_j$ are the support points of $y$, and if $y$ is continuous with density $f$,
$$
E y = \int_{-\infty}^{\infty} y f(y)\, dy.
$$
We can unify these definitions by writing the expectation as the Lebesgue integral with respect to the distribution function $F$:
$$
E y = \int_{-\infty}^{\infty} y\, dF(y).
$$
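As a quick numerical illustration (an added sketch, not part of the original text; it assumes NumPy and SciPy are available, and the two distributions are arbitrary choices), the discrete and continuous definitions can be computed side by side:

```python
import numpy as np
from scipy import integrate

# Discrete case: y takes values 1, 2, 3 with probabilities 0.2, 0.5, 0.3
vals = np.array([1.0, 2.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])
Ey_discrete = np.sum(vals * probs)                                        # 2.1

# Continuous case: y ~ Exponential(1) with density f(y) = exp(-y) on [0, inf)
Ey_continuous, _ = integrate.quad(lambda y: y * np.exp(-y), 0, np.inf)    # 1.0

print(Ey_discrete, Ey_continuous)
```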
