CHAPTER 2. MOMENT ESTIMATION
elements of $y$ are
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}.$$
The population mean of $y$ is just the vector of marginal means
$$\mu = E(y) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_m) \end{pmatrix}.$$
When working with random vectors $y$ it is convenient to measure their magnitude with the Euclidean norm
$$\|y\| = \left( y_1^2 + \cdots + y_m^2 \right)^{1/2}.$$
This is the classic Euclidean length of the vector $y$. Notice that $\|y\|^2 = y'y$.
It turns out that it is equivalent to describe finiteness of moments in terms of the Euclidean norm of a vector or all individual components.
Theorem 2.7.1 For $y \in \mathbb{R}^m$, $E\|y\| < \infty$ if and only if $E|y_j| < \infty$ for $j = 1, \ldots, m$.
Theorem 2.7.1 implies that the components of $\mu$ are finite if and only if $E\|y\| < \infty$. The $m \times m$ variance matrix of $y$ is
$$V = \mathrm{var}(y) = E\left[ (y - \mu)(y - \mu)' \right].$$
$V$ is often called a variance-covariance matrix. You can show that the elements of $V$ are finite if $E\|y\|^2 < \infty$.
A random sample $\{y_1, \ldots, y_n\}$ consists of $n$ independent and identically distributed draws from the distribution of $y$. (Each draw is an $m$-vector.) The vector sample mean
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_m \end{pmatrix}$$
is the vector of means of the individual variables.
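As a concrete numerical sketch (using NumPy, with simulated data and hypothetical population means chosen only for illustration), the vector sample mean is simply the column-wise average of the $n \times m$ data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sample: n observations of an m-vector y (hypothetical means 1, 2, 3)
n, m = 1000, 3
y = rng.normal(loc=[1.0, 2.0, 3.0], scale=1.0, size=(n, m))

# Vector sample mean: average each coordinate over the n draws
ybar = y.mean(axis=0)          # shape (m,)

# Equivalent elementwise construction, matching the display above
ybar_check = np.array([y[:, j].mean() for j in range(m)])

print(ybar)
```

Each entry of `ybar` is the sample mean of one marginal variable, exactly as in the display above.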
Convergence in probability of a vector is defined as convergence in probability of all elements in the vector. Thus $\bar{y} \xrightarrow{p} \mu$ if and only if $\bar{y}_j \xrightarrow{p} \mu_j$ for $j = 1, \ldots, m$. Since the latter holds if $E|y_j| < \infty$ for $j = 1, \ldots, m$, or equivalently $E\|y\| < \infty$, we can state this formally as follows.

Theorem 2.7.2 Weak Law of Large Numbers (WLLN) for random vectors
If $E\|y\| < \infty$ then as $n \to \infty$,
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \xrightarrow{p} E(y).$$
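A small Monte Carlo sketch of the WLLN (simulated data; the mean vector and the skewed error distribution are hypothetical choices for illustration): the deviation of the vector sample mean from $E(y)$ shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, -1.0])     # hypothetical population mean E(y)

def sample_mean(n):
    # n iid draws of a 2-vector with mean mu (skewed, mean-zero exponential errors)
    y = mu + rng.standard_exponential(size=(n, 2)) - 1.0
    return y.mean(axis=0)

# Euclidean distance between the sample mean and E(y) at several sample sizes
errors = {n: np.linalg.norm(sample_mean(n) - mu) for n in (100, 10_000, 1_000_000)}
for n, err in errors.items():
    print(n, err)
```

The errors shrink roughly at the familiar $1/\sqrt{n}$ rate, although the WLLN itself only asserts convergence.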
2.8 Convergence in Distribution
The WLLN is a useful first step, but does not give an approximation to the distribution of an estimator. A large-sample or asymptotic approximation can be obtained using the concept of convergence in distribution.
Definition 2.8.1 Let $z_n$ be a random vector with distribution $F_n(u) = \Pr(z_n \le u)$. We say that $z_n$ converges in distribution to $z$ as $n \to \infty$, denoted $z_n \xrightarrow{d} z$, if for all $u$ at which $F(u) = \Pr(z \le u)$ is continuous, $F_n(u) \to F(u)$ as $n \to \infty$.
When $z_n \xrightarrow{d} z$, it is common to refer to $z$ as the asymptotic distribution or limit distribution of $z_n$.
When the limit distribution $z$ is degenerate (that is, $\Pr(z = c) = 1$ for some $c$) we can write the convergence as $z_n \xrightarrow{d} c$, which is equivalent to convergence in probability, $z_n \xrightarrow{p} c$.
The typical path to establishing convergence in distribution is through the central limit theorem (CLT), which states that a standardized sample average converges in distribution to a normal random vector.
Theorem 2.8.1 Central Limit Theorem (CLT). If $E\|y\|^2 < \infty$ then as $n \to \infty$
$$\sqrt{n}\,(\bar{y} - \mu) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (y_i - \mu) \xrightarrow{d} N(0, V)$$
where $\mu = E y$ and $V = E\left[ (y - \mu)(y - \mu)' \right]$.
The standardized sum $z_n = \sqrt{n}\,(\bar{y} - \mu)$ has mean zero and variance $V$. What the CLT adds is that the variable $z_n$ is also approximately normally distributed, and that the normal approximation improves as $n$ increases.
The CLT is one of the most powerful and mysterious results in statistical theory. It shows that the simple process of averaging induces normality. The first version of the CLT (for the number of heads resulting from many tosses of a fair coin) was established by the French mathematician Abraham de Moivre in 1733. This was extended to cover an approximation to the binomial distribution in 1812 by Pierre-Simon Laplace, and the general statement is credited to the Russian mathematician Aleksandr Lyapunov in 1901.
2.9 Functions of Moments
We now expand our investigation and consider estimation of parameters which can be written as a continuous function of $\mu$. That is, the parameter of interest is the vector of functions
$$\beta = g(\mu) \qquad (2.5)$$
where $g : \mathbb{R}^m \to \mathbb{R}^k$. As one example, the geometric mean of wages $w$ is
$$\gamma = \exp\left( E(\log(w)) \right) \qquad (2.6)$$
which is (2.5) with
$$g(u) = \exp(u)$$
and $\mu = E(\log(w))$. As another example, the skewness of the wage distribution is
$$sk = \frac{E(w - Ew)^3}{\left( E(w - Ew)^2 \right)^{3/2}}$$
where $w = wage$ and $sk = g\left( Ew, Ew^2, Ew^3 \right)$ with
$$g(\mu_1, \mu_2, \mu_3) = \frac{\mu_3 - 3 \mu_2 \mu_1 + 2 \mu_1^3}{\left( \mu_2 - \mu_1^2 \right)^{3/2}}. \qquad (2.7)$$
In this case we can set
$$y = \begin{pmatrix} w \\ w^2 \\ w^3 \end{pmatrix}$$
so that
$$\mu = E(y) = \begin{pmatrix} Ew \\ Ew^2 \\ Ew^3 \end{pmatrix}. \qquad (2.8)$$
The parameter $\beta = g(\mu)$ is not a population moment, so it does not have a direct moment estimator. Instead, it is common to use a plug-in estimate formed by replacing the unknown $\mu$ with its point estimate $\hat{\mu}$, so that
$$\hat{\beta} = g(\hat{\mu}).$$
Again, the hat "^" indicates that $\hat{\beta}$ is a sample estimate of $\beta$. For example, the plug-in estimate of the geometric mean $\gamma$ of the wage distribution from (2.6) is
$$\hat{\gamma} = \exp(\hat{\mu}) \qquad (2.9)$$
with
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \log(wage_i).$$
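As a sketch of the plug-in principle (the log-normal "wage" data here are simulated and purely hypothetical), the geometric-mean estimate is computed by plugging the sample mean of log wages into $g(u) = \exp(u)$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical wages: log(wage) ~ N(3, 0.5^2), so the geometric mean is exp(3)
wage = np.exp(rng.normal(3.0, 0.5, size=100_000))

# Plug-in estimator: mu_hat = sample mean of log wages, gamma_hat = exp(mu_hat)
mu_hat = np.log(wage).mean()
gamma_hat = np.exp(mu_hat)

print(gamma_hat)                # close to exp(3), the population geometric mean
```

Note that `gamma_hat` differs from the arithmetic mean `wage.mean()`: for right-skewed data the geometric mean is systematically smaller.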
The plug-in estimate of the skewness of the wage distribution is
$$\widehat{sk} = \frac{\frac{1}{n} \sum_{i=1}^{n} (w_i - \bar{w})^3}{\left( \frac{1}{n} \sum_{i=1}^{n} (w_i - \bar{w})^2 \right)^{3/2}} = \frac{\hat{\mu}_3 - 3 \hat{\mu}_2 \hat{\mu}_1 + 2 \hat{\mu}_1^3}{\left( \hat{\mu}_2 - \hat{\mu}_1^2 \right)^{3/2}}$$
where
$$\hat{\mu}_j = \frac{1}{n} \sum_{i=1}^{n} w_i^j.$$
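The two forms of the plug-in skewness estimator, one in centered sample moments and one in the raw moments $\hat{\mu}_1, \hat{\mu}_2, \hat{\mu}_3$ of (2.7), are algebraically identical, as this sketch with simulated right-skewed data confirms:

```python
import numpy as np

rng = np.random.default_rng(4)
w = np.exp(rng.normal(0.0, 0.4, size=50_000))   # hypothetical right-skewed wages

# Centered-moment form of the plug-in skewness estimator
wbar = w.mean()
sk_hat = np.mean((w - wbar) ** 3) / np.mean((w - wbar) ** 2) ** 1.5

# Raw-moment form g(mu1_hat, mu2_hat, mu3_hat) as in (2.7)
m1, m2, m3 = (np.mean(w ** j) for j in (1, 2, 3))
sk_hat_raw = (m3 - 3 * m2 * m1 + 2 * m1 ** 3) / (m2 - m1 ** 2) ** 1.5

print(sk_hat, sk_hat_raw)       # the two forms agree
```

The centered form is usually preferable numerically, since the raw-moment form involves cancellation between large terms.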
A useful property is that continuous functions are limit-preserving.
Theorem 2.9.1 Continuous Mapping Theorem (CMT). If $z_n \xrightarrow{p} c$ as $n \to \infty$ and $g(\cdot)$ is continuous at $c$, then $g(z_n) \xrightarrow{p} g(c)$ as $n \to \infty$.
The proof of Theorem 2.9.1 is given in Section 2.15.
For example, if $z_n \xrightarrow{p} c$ as $n \to \infty$ then
$$z_n + a \xrightarrow{p} c + a$$
$$a z_n \xrightarrow{p} a c$$
$$z_n^2 \xrightarrow{p} c^2$$
as the functions $g(u) = u + a$, $g(u) = au$, and $g(u) = u^2$ are continuous. Also
$$\frac{a}{z_n} \xrightarrow{p} \frac{a}{c}$$
if $c \neq 0$. The condition $c \neq 0$ is important as the function $g(u) = a/u$ is not continuous at $u = 0$.
We need the following result in order for $\hat{\beta}$ to be consistent for $\beta$.
Theorem 2.9.2 If $E\|y\| < \infty$ and $g(u)$ is continuous at $u = \mu$, then
$$\hat{\beta} = g(\hat{\mu}) \xrightarrow{p} g(\mu) = \beta$$
as $n \to \infty$, and thus $\hat{\beta}$ is consistent for $\beta$.
To apply Theorem 2.9.2 it is necessary to check if the function $g$ is continuous at $\mu$. In our first example $g(u) = \exp(u)$ is continuous everywhere. It therefore follows from Theorem 2.7.2 and Theorem 2.9.2 that if $E|\log(wage)| < \infty$ then as $n \to \infty$
$$\hat{\gamma} \xrightarrow{p} \gamma.$$
In our second example $g$ defined in (2.7) is continuous for all $\mu$ such that $\mathrm{var}(w) = \mu_2 - \mu_1^2 > 0$, which holds unless $w$ has a degenerate distribution. Thus if $E|w|^3 < \infty$ and $\mathrm{var}(w) > 0$ then as $n \to \infty$
$$\widehat{sk} \xrightarrow{p} sk.$$
2.10 Delta Method
In this section we introduce two tools, an extended version of the CMT and the Delta Method, which allow us to calculate the asymptotic distribution of the parameter estimate $\hat{\beta}$.
We first present an extended version of the continuous mapping theorem which allows convergence in distribution.
Theorem 2.10.1 Continuous Mapping Theorem
If $z_n \xrightarrow{d} z$ as $n \to \infty$ and $g : \mathbb{R}^m \to \mathbb{R}^k$ has the set of discontinuity points $D_g$ such that $\Pr(z \in D_g) = 0$, then $g(z_n) \xrightarrow{d} g(z)$ as $n \to \infty$.
For a proof of Theorem 2.10.1 see Theorem 2.3 of van der Vaart (1998). It was first proved by Mann and Wald (1943) and is therefore sometimes referred to as the Mann-Wald Theorem.
Theorem 2.10.1 allows the function $g$ to be discontinuous only if the probability of being at a discontinuity point is zero. For example, the function $g(u) = u^{-1}$ is discontinuous at $u = 0$, but if $z_n \xrightarrow{d} z \sim N(0, 1)$ then $\Pr(z = 0) = 0$ so $z_n^{-1} \xrightarrow{d} z^{-1}$.
A special case of the Continuous Mapping Theorem is known as Slutsky's Theorem.
Theorem 2.10.2 Slutsky's Theorem
If $z_n \xrightarrow{d} z$ and $c_n \xrightarrow{p} c$ as $n \to \infty$ then
1. $z_n + c_n \xrightarrow{d} z + c$
2. $z_n c_n \xrightarrow{d} z c$
3. $\dfrac{z_n}{c_n} \xrightarrow{d} \dfrac{z}{c}$ if $c \neq 0$
Even though Slutsky’ Theorem is a special case of the CMT, it is a useful statement as it focuses on the most common applications –addition, multiplication and division.
Despite the fact that the plug-in estimator $\hat{\beta} = g(\hat{\mu})$ is a function of $\hat{\mu}$ for which we have an asymptotic distribution, Theorem 2.10.1 does not directly give us an asymptotic distribution for $\hat{\beta}$. This is because $\hat{\beta} = g(\hat{\mu})$ is written as a function of $\hat{\mu}$, not of the standardized sequence $\sqrt{n}\,(\hat{\mu} - \mu)$. We need an intermediate step, a first order Taylor series expansion. This step is so critical to statistical theory that it has its own name: the Delta Method.
Theorem 2.10.3 Delta Method:
If $\sqrt{n}\,(\hat{\mu}_n - \mu_0) \xrightarrow{d} \xi$, where $\mu_0$ is $m \times 1$, and $g(\mu) : \mathbb{R}^m \to \mathbb{R}^k$, $k \le m$, is continuously differentiable in a neighborhood of $\mu_0$, then as $n \to \infty$
$$\sqrt{n}\,\left( g(\hat{\mu}_n) - g(\mu_0) \right) \xrightarrow{d} G' \xi \qquad (2.10)$$
where $G(\mu) = \frac{\partial}{\partial \mu} g(\mu)'$ and $G = G(\mu_0)$. In particular, if
$$\sqrt{n}\,(\hat{\mu}_n - \mu_0) \xrightarrow{d} N(0, V)$$
where $V$ is $m \times m$, then as $n \to \infty$
$$\sqrt{n}\,\left( g(\hat{\mu}_n) - g(\mu_0) \right) \xrightarrow{d} N\left( 0, G' V G \right). \qquad (2.11)$$
The Delta Method allows us to complete our derivation of the asymptotic distribution of the estimator $\hat{\beta}$ of $\beta$. Relative to consistency, it requires the stronger smoothness condition that $g(\mu)$ is continuously differentiable.
Now by combining Theorems 2.8.1 and 2.10.3 we can find the asymptotic distribution of the plug-in estimator $\hat{\beta}$.
Theorem 2.10.4 If $E\|y\|^2 < \infty$ and $G(u) = \frac{\partial}{\partial u} g(u)'$ is continuous in a neighborhood of $u = \mu$, then as $n \to \infty$
$$\sqrt{n}\,\left( \hat{\beta} - \beta \right) \xrightarrow{d} N\left( 0, G' V G \right)$$
where $G = G(\mu)$.
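A Monte Carlo sketch of the Delta Method in the scalar case (hypothetical parameter values, normal data for simplicity): for $\beta = g(\mu) = \exp(\mu)$ the derivative is $G = \exp(\mu)$, so the asymptotic variance $G' V G = \exp(\mu)^2 \sigma^2$ should match the simulated variance of $\sqrt{n}\,(\hat{\beta} - \beta)$.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma2 = 0.5, 1.0            # hypothetical population mean and variance
n, reps = 1_000, 5_000

# beta = g(mu) = exp(mu); G = g'(mu) = exp(mu),
# so the delta-method asymptotic variance is G'VG = exp(mu)^2 * sigma2
beta = np.exp(mu)
avar_delta = np.exp(mu) ** 2 * sigma2

# Monte Carlo: variance of sqrt(n) * (beta_hat - beta) across replications
y = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
beta_hat = np.exp(y.mean(axis=1))
avar_mc = np.var(np.sqrt(n) * (beta_hat - beta))

print(avar_delta, avar_mc)       # the two variances are close
```

The agreement reflects the first order Taylor expansion underlying (2.11); for small $n$ the higher order terms that the Delta Method drops would be more visible.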
2.11 Stochastic Order Symbols
It is convenient to have simple symbols for random variables and vectors which converge in probability to zero or are stochastically bounded. The notation $z_n = o_p(1)$ (pronounced "small oh-P-one") means that $z_n \xrightarrow{p} 0$ as $n \to \infty$. We also say that $z_n = o_p(a_n)$ if $a_n$ is a sequence such that $a_n^{-1} z_n = o_p(1)$. For example, for any consistent estimator $\hat{\beta}$ for $\beta$ we can write
$$\hat{\beta} = \beta + o_p(1).$$
Similarly, the notation $z_n = O_p(1)$ (pronounced "big oh-P-one") means that $z_n$ is bounded in probability. Precisely, for any $\varepsilon > 0$ there is a constant $M_\varepsilon < \infty$ such that
$$\lim_{n \to \infty} \Pr\left( |z_n| > M_\varepsilon \right) \le \varepsilon.$$
We say that
$$z_n = O_p(a_n)$$
if $a_n$ is a sequence such that $a_n^{-1} z_n = O_p(1)$.
$O_p(1)$ is weaker than $o_p(1)$ in the sense that $z_n = o_p(1)$ implies $z_n = O_p(1)$ but not the reverse. However, if $z_n = O_p(a_n)$ then $z_n = o_p(b_n)$ for any $b_n$ such that $a_n / b_n \to 0$.
If a random vector converges in distribution $z_n \xrightarrow{d} z$ (for example, if $z \sim N(0, V)$) then $z_n = O_p(1)$. It follows that for estimators $\hat{\beta}$ which satisfy the convergence of Theorem 2.10.4 we can write
$$\hat{\beta} = \beta + O_p(n^{-1/2}).$$
There are many simple rules for manipulating $o_p(1)$ and $O_p(1)$ sequences which can be deduced from the continuous mapping theorem or Slutsky's Theorem. For example,
$$o_p(1) + o_p(1) = o_p(1)$$
$$o_p(1) + O_p(1) = O_p(1)$$
$$O_p(1) + O_p(1) = O_p(1)$$
$$o_p(1)\, o_p(1) = o_p(1)$$
$$o_p(1)\, O_p(1) = o_p(1)$$
$$O_p(1)\, O_p(1) = O_p(1)$$
2.12 Uniform Stochastic Bounds*
For some applications it can be useful to obtain the stochastic order of the random variable
$$\max_{1 \le i \le n} |y_i|.$$
This is the magnitude of the largest observation in the sample $\{y_1, \ldots, y_n\}$. If the support of the distribution of $y_i$ is unbounded, then as the sample size $n$ increases, the largest observation will also tend to increase. It turns out that there is a simple characterization.
Theorem 2.12.1 If $E|y|^r < \infty$ then as $n \to \infty$
$$n^{-1/r} \max_{1 \le i \le n} |y_i| \xrightarrow{p} 0.$$
Equivalently,
$$\max_{1 \le i \le n} |y_i| = o_p(n^{1/r}). \qquad (2.12)$$
Theorem 2.12.1 says that the largest observation will diverge at a rate slower than $n^{1/r}$. As $r$ increases this rate decreases. Thus the higher the moment, the slower the rate of divergence of the largest observation.
To simplify the notation, we write (2.12) as
$$y_i = o_p(n^{1/r})$$
uniformly in $1 \le i \le n$. It is important to understand that when the $O_p$ or $o_p$ symbols are applied to subscript-$i$ random variables we typically mean uniform convergence in the sense of (2.12).
Theorem 2.12.1 applies to random vectors. If $E\|y\|^r < \infty$ then
$$\max_{1 \le i \le n} \|y_i\| = o_p(n^{1/r}).$$
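A simulation sketch of Theorem 2.12.1 (hypothetical choices: $t$-distributed data with 5 degrees of freedom, so $E|y|^r < \infty$ for any $r < 5$, and $r = 2$): the scaled maximum $n^{-1/r} \max_i |y_i|$ drifts toward zero as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(7)
r = 2.0                          # E|y|^r is finite for t(5) data whenever r < 5

vals = []
for n in (10**3, 10**5, 10**7):
    y = rng.standard_t(df=5, size=n)
    # scaled largest observation n^{-1/r} * max |y_i|
    vals.append(n ** (-1.0 / r) * np.max(np.abs(y)))

print(vals)                      # shrinks toward zero as n grows
```

With heavier tails (smaller degrees of freedom) only smaller values of $r$ are admissible and the convergence is correspondingly slower.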
|
We now prove Theorem 2.12.1. Take any : The event max1 i n jyij > n1=r |
means that at |
||||||||||||||||||||||||
least one of the |
yi |
|
exceeds n |
1=r |
; which is the same as the |
event |
|
n |
|
y |
|
|
> n1=r |
|
or equivalently |
|||||||||||
j |
|
|
|
|
i=1 |
j |
|
i |
j |
|
|
|
||||||||||||||
S |
n |
r |
|
rj |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||
|
i=1 fjyij |
|
> ng : Since the probability of the union of |
events is smaller than the sum of the |
||||||||||||||||||||||
|
|
|
S |
|
|
|
|
|
|
|
|
|||||||||||||||
probabilities, |
|
|
|
|
1maxi n jyij > |
|
|
|
|
fjyijr |
> rng! |
|
|
|
|
|
|
|||||||||
|
|
|
|
|
|
Pr n 1=r |
= |
Pr |
n |
|
|
|
|
|
|
|||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
[ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
n |
i=1 |
|
|
|
|
|
|
|
|
|
|
|
|
||||
|
|
|
|
|
|
|
|
|
|
|
|
Xi |
Pr (jyijr > n r) |
|
|
|
|
|
|
|
|
|||||
|
|
|
|
|
|
|
|
|
|
|
|
=1 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
n |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
Xi |
E(jyijr 1 (jyijr > n r)) |
|
|
|
|||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
n r |
=1 |
|
|
|
||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
=1r E(jyijr 1 (jyijr > n r))
where the second inequality is the strong form of Markov’s inequality (Theorem B.25) and the …nal equality is since the yi are iid. Since Ejyjr < 1 this …nal expectation converges to zero as n ! 1: This is because Z
Ejyijr = jyjr dF (y) < 1
implies Z
E(jyijr 1 (jyijr > c)) = jyjr dF (y) ! 0
jyjr>c
1=r p
as c ! 1: We have established that n max1 i n jyij ! 0; as required.
2.13 Semiparametric Efficiency
In this section we argue that the sample mean $\hat{\mu}$ and plug-in estimator $\hat{\beta} = g(\hat{\mu})$ are efficient estimators of the parameters $\mu$ and $\beta$. Our demonstration is based on the rich but technically challenging theory of semiparametric efficiency bounds. An excellent accessible review has been provided by Newey (1990). We will also appeal to the asymptotic theory of maximum likelihood estimation (see Section B.11).
We start by examining the sample mean $\hat{\mu}$, for the asymptotic efficiency of $\hat{\beta}$ will follow from that of $\hat{\mu}$.
Recall, we know that if $E\|y\|^2 < \infty$ then the sample mean $\hat{\mu}$ has the asymptotic distribution $\sqrt{n}\,(\hat{\mu} - \mu) \xrightarrow{d} N(0, V)$. We want to know if $\hat{\mu}$ is the best feasible estimator, or if there is another estimator with a smaller asymptotic variance. While it seems intuitively unlikely that another estimator could have a smaller asymptotic variance, how do we know that this is not the case?
When we ask if $\hat{\mu}$ is the best estimator, we need to be clear about the class of models, the class of permissible distributions. For estimation of the mean $\mu$ of the distribution of $y$ the broadest conceivable class is $\mathcal{L}_1 = \{ F : E\|y\| < \infty \}$. This class is too broad for our current purposes, as $\hat{\mu}$ is not asymptotically $N(0, V)$ for all $F \in \mathcal{L}_1$. A more realistic choice is $\mathcal{L}_2 = \left\{ F : E\|y\|^2 < \infty \right\}$, the class of finite-variance distributions. When we seek an efficient estimator of the mean $\mu$ in the class of models $\mathcal{L}_2$ what we are seeking is the best estimator, given that all we know is that $F \in \mathcal{L}_2$.
To show that the answer is not immediately obvious, it might be helpful to review a setting where the sample mean is inefficient. Suppose that $y \in \mathbb{R}$ has the double exponential density $f(y \mid \mu) = 2^{-1/2} \exp\left( -|y - \mu| \sqrt{2} \right)$. Since $\mathrm{var}(y) = 1$ we see that the sample mean satisfies $\sqrt{n}\,(\hat{\mu} - \mu) \xrightarrow{d} N(0, 1)$. In this model the maximum likelihood estimator (MLE) $\tilde{\mu}$ for $\mu$ is the sample median. Recall from the theory of maximum likelihood that the MLE satisfies $\sqrt{n}\,(\tilde{\mu} - \mu) \xrightarrow{d} N\left( 0, \left( E S^2 \right)^{-1} \right)$ where $S = \frac{\partial}{\partial \mu} \log f(y \mid \mu) = \sqrt{2}\, \mathrm{sgn}(y - \mu)$ is the score. We can calculate that $E S^2 = 2$ and thus conclude that $\sqrt{n}\,(\tilde{\mu} - \mu) \xrightarrow{d} N(0, 1/2)$. The asymptotic variance of the MLE is one-half that of the sample mean. Thus when the true density is known to be double exponential the sample mean is inefficient.
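This variance comparison is easy to check by simulation (a sketch with Laplace data scaled so that $\mathrm{var}(y) = 1$): the scaled variance of the sample median is roughly half that of the sample mean.

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 1_000, 10_000

# Double exponential (Laplace) data with mu = 0, scaled so that var(y) = 1
y = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=(reps, n))

# Scaled finite-sample variances n * var(estimator) of the two estimators of mu
var_mean = n * np.var(y.mean(axis=1))          # close to 1
var_median = n * np.var(np.median(y, axis=1))  # close to 1/2

print(var_mean, var_median)
```

The factor of one-half matches the asymptotic calculation $\left(E S^2\right)^{-1} = 1/2$ above.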
But the estimator which achieves this improved efficiency, the sample median, is not generically consistent for the population mean. It is inconsistent if the density is asymmetric or skewed. So the improvement comes at a great cost. Another way of looking at this is that the sample median is efficient in the class of densities $f(y \mid \mu) = 2^{-1/2} \exp\left( -|y - \mu| \sqrt{2} \right)$, but unless it is known that this is the correct distribution class this knowledge is not very useful.
The relevant question is whether or not the sample mean is efficient when the form of the distribution is unknown. We call this setting semiparametric as the parameter of interest (the mean) is finite dimensional while the remaining features of the distribution are unspecified. In the semiparametric context an estimator is called semiparametrically efficient if it has the smallest asymptotic variance among all semiparametric estimators.
The mathematical trick is to reduce the semiparametric model to a set of parametric "submodels". The Cramer-Rao variance bound can be found for each parametric submodel. The variance bound for the semiparametric model (the union of the submodels) is then defined as the supremum of the individual variance bounds.
Formally, suppose that the true density of $y$ is the unknown function $f(y)$ with mean $\mu = E y = \int y f(y) dy$. A parametric submodel $\eta$ for $f(y)$ is a density $f_\eta(y \mid \theta)$ which is a smooth function of a parameter $\theta$, and there is a true value $\theta_0$ such that $f_\eta(y \mid \theta_0) = f(y)$. The index $\eta$ indicates the submodels. The equality $f_\eta(y \mid \theta_0) = f(y)$ means that the submodel class passes through the true density, so the submodel is a true model. The class of submodels $\eta$ and parameter $\theta_0$ depend on the true density $f$. In the submodel $f_\eta(y \mid \theta)$, the mean is $\mu_\eta(\theta) = \int y f_\eta(y \mid \theta) dy$, which varies with the parameter $\theta$. Let $\eta$ range over the class of all submodels for $f$.
Since each submodel $\eta$ is parametric we can calculate the efficiency bound for estimation of $\mu$ within this submodel. Specifically, given the density $f_\eta(y \mid \theta)$ its likelihood score is
$$S_\eta = \frac{\partial}{\partial \theta} \log f_\eta(y \mid \theta_0),$$
so the Cramer-Rao lower bound for estimation of $\theta$ is $\left( E S_\eta S_\eta' \right)^{-1}$. Defining $M_\eta = \frac{\partial}{\partial \theta} \mu_\eta(\theta_0)'$, by Theorem B.11.5 the Cramer-Rao lower bound for estimation of $\mu$ within the submodel $\eta$ is
$$V_\eta = M_\eta' \left( E S_\eta S_\eta' \right)^{-1} M_\eta.$$
As $V_\eta$ is the efficiency bound for the submodel class $f_\eta(y \mid \theta)$, no estimator can have an asymptotic variance smaller than $V_\eta$ for any density $f_\eta(y \mid \theta)$ in the submodel class, including the true density $f$. This is true for all submodels $\eta$. Thus the asymptotic variance of any semiparametric estimator cannot be smaller than $V_\eta$ for any conceivable submodel. Taking the supremum of the Cramer-Rao lower bounds from all conceivable submodels we define²
$$\bar{V} = \sup_{\eta} V_\eta.$$
The asymptotic variance of any semiparametric estimator cannot be smaller than $\bar{V}$, since it cannot be smaller than any individual $V_\eta$. We call $\bar{V}$ the semiparametric asymptotic variance bound or semiparametric efficiency bound for estimation of $\mu$, as it is a lower bound on the asymptotic variance for any semiparametric estimator. If the asymptotic variance of a specific semiparametric estimator equals the bound $\bar{V}$ we say that the estimator is semiparametrically efficient.
For many statistical problems it is quite challenging to calculate the semiparametric variance bound. However, in some cases there is a simple method to find the solution. Suppose that we can find a submodel $\eta_0$ whose Cramer-Rao lower bound satisfies $V_{\eta_0} = V$ where $V$ is the asymptotic variance of a known semiparametric estimator. In this case, we can deduce that $\bar{V} = V_{\eta_0} = V$. Otherwise there would exist another submodel $\eta_1$ whose Cramer-Rao lower bound satisfies $V_{\eta_0} < V_{\eta_1}$, but this would imply $V < V_{\eta_1}$, which contradicts the Cramer-Rao Theorem.
We now find this submodel for the sample mean $\hat{\mu}$. Our goal is to find a parametric submodel whose Cramer-Rao bound for $\mu$ is $V$. This can be done by creating a tilted version of the true density. Consider the parametric submodel
$$f_\eta(y \mid \theta) = f(y) \left( 1 + \theta' V^{-1} (y - \mu) \right) \qquad (2.13)$$
where $f(y)$ is the true density and $\mu = E y$. Note that
$$\int f_\eta(y \mid \theta) dy = \int f(y) dy + \theta' V^{-1} \int f(y) (y - \mu) dy = 1$$
and for all $\theta$ close to zero $f_\eta(y \mid \theta) \ge 0$. Thus $f_\eta(y \mid \theta)$ is a valid density function. It is a parametric submodel since $f_\eta(y \mid \theta_0) = f(y)$ when $\theta_0 = 0$. This parametric submodel has the mean
$$\mu_\eta(\theta) = \int y f_\eta(y \mid \theta) dy = \int y f(y) dy + \int f(y)\, y\, (y - \mu)' V^{-1} \theta\, dy = \mu + \theta$$
which is a smooth function of $\theta$.
Since
$$\frac{\partial}{\partial \theta} \log f_\eta(y \mid \theta) = \frac{\partial}{\partial \theta} \log\left( 1 + \theta' V^{-1} (y - \mu) \right) = \frac{V^{-1} (y - \mu)}{1 + \theta' V^{-1} (y - \mu)}$$
it follows that the score function for $\theta$ is
$$S_\eta = \frac{\partial}{\partial \theta} \log f_\eta(y \mid \theta_0) = V^{-1} (y - \mu). \qquad (2.14)$$
By Theorem B.11.3 the Cramer-Rao lower bound for $\theta$ is
$$\left( E\left( S_\eta S_\eta' \right) \right)^{-1} = \left( V^{-1} E\left[ (y - \mu)(y - \mu)' \right] V^{-1} \right)^{-1} = V. \qquad (2.15)$$
² It is not obvious that this supremum exists, as $V_\eta$ is a matrix so there is not a unique ordering of matrices. However, in many cases (including the ones we study) the supremum exists and is unique.
The Cramer-Rao lower bound for $\mu_\eta(\theta) = \mu + \theta$ is also $V$, and this equals the asymptotic variance of the moment estimator $\hat{\mu}$. This was what we set out to show.
In summary, we have shown that in the submodel (2.13) the Cramer-Rao lower bound for estimation of $\mu$ is $V$, which equals the asymptotic variance of the sample mean. This establishes the following result.
Proposition 2.13.1 In the class of distributions $F \in \mathcal{L}_2$, the semiparametric variance bound for estimation of $\mu$ is $V = \mathrm{var}(y_i)$, and the sample mean $\hat{\mu}$ is a semiparametrically efficient estimator of the population mean $\mu$.
We call this result a proposition rather than a theorem as we have not attended to the regularity conditions.
It is a simple matter to extend this result to the plug-in estimator $\hat{\beta} = g(\hat{\mu})$. We know from Theorem 2.10.4 that if $E\|y\|^2 < \infty$ and $g(u)$ is continuously differentiable at $u = \mu$ then the plug-in estimator has the asymptotic distribution $\sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{d} N(0, G' V G)$. We therefore consider the class of distributions
$$\mathcal{L}_2(g) = \left\{ F : E\|y\|^2 < \infty,\ g(u) \text{ is continuously differentiable at } u = E y \right\}.$$
For example, if $\beta = \mu_1 / \mu_2$ where $\mu_1 = E y_1$ and $\mu_2 = E y_2$, then $\mathcal{L}_2(g) = \left\{ F : E y_1^2 < \infty,\ E y_2^2 < \infty,\ E y_2 \neq 0 \right\}$.
For any submodel $\eta$ the Cramer-Rao lower bound for estimation of $\beta = g(\mu)$ is $G' V_\eta G$ by Theorem B.11.5. For the submodel (2.13) this bound is $G' V G$, which equals the asymptotic variance of $\hat{\beta}$ from Theorem 2.10.4. Thus $\hat{\beta}$ is semiparametrically efficient.
Proposition 2.13.2 In the class of distributions $F \in \mathcal{L}_2(g)$ the semiparametric variance bound for estimation of $\beta = g(\mu)$ is $G' V G$, and the plug-in estimator $\hat{\beta} = g(\hat{\mu})$ is a semiparametrically efficient estimator of $\beta$.
The result in Proposition 2.13.2 is quite general. Smooth functions of sample moments are efficient estimators for their population counterparts. This is a very powerful result, as most econometric estimators can be written (or approximated) as smooth functions of sample means.
2.14 Expectation*
For any random variable $y$ we define the mean or expectation $E y$ as follows. If $y$ is discrete with support points $\tau_j$,
$$E y = \sum_{j=1}^{\infty} \tau_j \Pr(y = \tau_j),$$
and if $y$ is continuous with density $f$,
$$E y = \int_{-\infty}^{\infty} y f(y) dy.$$
We can unify these definitions by writing the expectation as the Lebesgue integral with respect to the distribution function $F$,
$$E y = \int_{-\infty}^{\infty} y\, dF(y).$$