
CHAPTER 3. CONDITIONAL EXPECTATION AND PROJECTION


Log Differences

A useful approximation for the natural logarithm for small x is

    log(1 + x) ≈ x.                                  (3.4)

This can be derived from the infinite series expansion of log(1 + x):

    log(1 + x) = x − x²/2 + x³/3 − x⁴/4 + ···
               = x + O(x²).

The symbol O(x²) means that the remainder is bounded by Ax² as x → 0 for some A < ∞. A plot of log(1 + x) and the linear approximation x is shown in the following figure. We can see that log(1 + x) and the linear approximation x are very close for |x| ≤ 0.1, and reasonably close for |x| ≤ 0.2, but the difference increases with |x|.

 

[Figure: log(1 + x) and the linear approximation x, plotted for x between −0.4 and 0.4.]

Now, if y* is c% greater than y, then

    y* = (1 + c/100) y.

Taking natural logarithms,

    log y* = log y + log(1 + c/100)

or

    log y* − log y = log(1 + c/100) ≈ c/100

where the approximation is (3.4). This shows that 100 multiplied by the difference in logarithms is approximately the percentage difference between y and y*, and this approximation is quite good for |c| ≤ 10%.
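The closeness of both approximations can be checked numerically. The following short sketch (Python, standard library only; the base value y = 20 is an arbitrary choice) prints the error of the approximation log(1 + x) ≈ x for several x, and then shows that 100 × (log y* − log y) is close to the percentage difference c when |c| is no larger than about 10%.

```python
import math

# Quality of the approximation log(1 + x) ~ x for small x
for x in [0.05, 0.10, 0.20, 0.40]:
    print(f"x = {x:4.2f}   log(1+x) = {math.log(1 + x):7.4f}   error = {math.log(1 + x) - x:8.4f}")

# 100*(log y* - log y) approximates the percentage difference c
y = 20.0                                  # arbitrary base value
for c in [1, 5, 10, 25]:
    y_star = (1 + c / 100) * y            # y* is c% greater than y
    log_diff = 100 * (math.log(y_star) - math.log(y))
    print(f"c = {c:3d}%   100*(log y* - log y) = {log_diff:6.2f}")
```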

3.4 Conditional Expectation Function

An important determinant of wage levels is education. In many empirical studies economists measure educational attainment by the number of years of schooling, and we will write this variable as education.⁷

⁷ Here, education is defined as years of schooling beyond kindergarten. A high school graduate has education=12, a college graduate has education=16, a Master's degree has education=18, and a professional degree (medical, law or PhD) has education=20.


The conditional mean of log wages given gender, race, and education is a single number for each category. For example

    E(log(wage) | gender = man, race = white, education = 12) = 2.84

We display in Figure 3.4 the conditional means of log(wage) for white men and white women as a function of education. The plot is quite revealing. We see that the conditional mean is increasing in years of education, but at a different rate for schooling levels above and below nine years. Another striking feature of Figure 3.4 is that the gap between men and women is roughly constant for all education levels. As the variables are measured in logs this implies a constant average percentage gap between men and women regardless of educational attainment.

 

[Figure 3.4: Mean Log Wage as a Function of Years of Education. Vertical axis: Log Dollars per Hour; horizontal axis: Years of Education (4 to 20); separate series for white men and white women.]

In many cases it is convenient to simplify the notation by writing variables using single characters, typically y, x and/or z. It is conventional in econometrics to denote the dependent variable (e.g. log(wage)) by the letter y, a conditioning variable (such as gender) by the letter x, and multiple conditioning variables (such as race, education and gender) by the subscripted letters x1, x2, ..., xk.

Conditional expectations can be written with the generic notation

    E(y | x1, x2, ..., xk) = m(x1, x2, ..., xk).

We call this the conditional expectation function (CEF). The CEF is a function of (x1, x2, ..., xk) as it varies with the variables. For example, the conditional expectation of y = log(wage) given (x1, x2) = (gender, race) is given by the six entries of Table 3.1. The CEF is a function of (gender, race) as it varies across the entries.

For greater compactness, we will typically write the conditioning variables as a vector in Rᵏ:

    x = (x1, x2, ..., xk)′.                          (3.5)



Here we follow the convention of using lower case bold italics x to denote a vector. Given this notation, the CEF can be compactly written as

    E(y | x) = m(x).

Sometimes, it is useful to notationally distinguish E(y | x) as the CEF evaluated at the random vector x from E(y | x = x0) as the CEF evaluated at the fixed value x0. (And it is mathematically correct to do so.) The first expression E(y | x) is a random variable and the second expression E(y | x = x0) is a function. We will not always enforce this distinction as it can become notationally burdensome. Hopefully, the use of E(y | x) should be apparent from the context.

3.5 Continuous Variables

In the previous sections, we implicitly assumed that the conditioning variables are discrete. However, many conditioning variables are continuous. In this section, we take up this case and assume that the variables (y, x) are continuously distributed with a joint density function f(y, x).

As an example, take y = log(wage) and x = experience, the number of years of labor market experience. The contours of their joint density are plotted on the left side of Figure 3.5 for the population of white men with 12 years of education.

 


Figure 3.5: Left: Joint density of log(wage) and experience and conditional mean of log(wage) given experience for white men with education=12. Right: Conditional densities of log(wage) for white men with education=12.

Given the joint density f(y, x), the variable x has the marginal density

    fx(x) = ∫_R f(y, x) dy.

For any x such that fx(x) > 0, the conditional density of y given x is defined as

    fy|x(y | x) = f(y, x) / fx(x).                   (3.6)

The conditional density is a slice of the joint density f(y, x) holding x fixed. We can visualize this by slicing the joint density function at a specific value of x parallel with the y-axis. For example, take the density contours on the left side of Figure 3.5 and slice through the contour plot at a specific value of experience. This gives us the conditional density of log(wage) for white men with 12 years of education and this level of experience. We do this for four levels of experience (5, 10, 25, and 40 years), and plot these densities on the right side of Figure 3.5. We can see that the distribution of wages shifts to the right and becomes more diffuse as experience increases from 5 to 10 years, and from 10 to 25 years, but there is little change from 25 to 40 years of experience.

The CEF of y given x is the mean of the conditional density (3.6)

    m(x) = E(y | x) = ∫_R y fy|x(y | x) dy.          (3.7)

Intuitively, m(x) is the mean of y for the idealized subpopulation where the conditioning variables are fixed at x. This is idealized since x is continuously distributed so this subpopulation is infinitely small.

In Figure 3.5 the CEF of log(wage) given experience is plotted as the solid line. We can see that the CEF is a smooth but nonlinear function. The CEF is initially increasing in experience, flattens out around experience = 30, and then decreases for high levels of experience.
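The construction in (3.6) and (3.7) can be mimicked numerically. The sketch below (Python with NumPy; the joint density is a made-up standard bivariate normal, not the wage-experience density of Figure 3.5) integrates out y to obtain the marginal density, forms the conditional density as the ratio, and then integrates y against it to obtain m(x). For this particular density the true CEF is ρx, which the numerical answer reproduces.

```python
import numpy as np

# Hypothetical joint density f(y, x): standard bivariate normal with correlation rho,
# for which the CEF is E(y | x) = rho * x.  (Illustrative only; not the wage data.)
rho = 0.5
def f_joint(y, x):
    return np.exp(-(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2))) \
           / (2 * np.pi * np.sqrt(1 - rho**2))

y_grid = np.linspace(-8, 8, 4001)
dy = y_grid[1] - y_grid[0]

def cef(x):
    slice_ = f_joint(y_grid, x)          # f(y, x) with x held fixed
    fx = slice_.sum() * dy               # marginal density fx(x): integral of f(y, x) over y
    f_cond = slice_ / fx                 # conditional density, as in (3.6)
    return (y_grid * f_cond).sum() * dy  # CEF m(x), the integral in (3.7)

for x0 in [-1.0, 0.0, 1.0, 2.0]:
    print(f"x = {x0:4.1f}   m(x) = {cef(x0):6.3f}   exact rho*x = {rho * x0:6.3f}")
```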

3.6 Law of Iterated Expectations

An extremely useful tool from probability theory is the law of iterated expectations. An important special case is known as the Simple Law.

Theorem 3.6.1 Simple Law of Iterated Expectations

If E|y| < ∞ then for any random vector x,

    E(E(y | x)) = E(y)

The simple law states that the expectation of the conditional expectation is the unconditional expectation. In other words, the average of the conditional averages is the unconditional average.

When x is discrete

    E(E(y | x)) = Σ_{j=1}^{∞} E(y | xj) Pr(x = xj)

and when x is continuous

    E(E(y | x)) = ∫_{Rᵏ} E(y | x) fx(x) dx.

Going back to our investigation of average log wages for men and women, the simple law states that

    E(log(wage) | gender = man) Pr(gender = man)
      + E(log(wage) | gender = woman) Pr(gender = woman)
    = E(log(wage)).

Or numerically,

    3.05 × 0.57 + 2.79 × 0.43 = 2.92.
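A minimal simulation makes the simple law concrete (Python with NumPy; the group means and probability below mirror the numbers quoted above but are otherwise an invented design, not the wage data). Weighting the conditional means by the group probabilities recovers the unconditional mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical population: x is a binary group indicator, y has a different mean in each group
x = rng.binomial(1, 0.57, size=n)                         # Pr(x = 1) = 0.57
y = np.where(x == 1, 3.05, 2.79) + rng.normal(0, 0.5, n)  # group means 3.05 and 2.79

# conditional means E(y | x) and group probabilities
cond_means = np.array([y[x == 0].mean(), y[x == 1].mean()])
probs      = np.array([(x == 0).mean(), (x == 1).mean()])

print("E(E(y|x)) =", round(cond_means @ probs, 4))
print("E(y)      =", round(y.mean(), 4))                  # the two agree
```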

The general law of iterated expectations allows two sets of conditioning variables.


 

 

 

 

Theorem 3.6.2 Law of Iterated Expectations

If E|y| < ∞ then for any random vectors x1 and x2,

    E(E(y | x1, x2) | x1) = E(y | x1)

 

 

 

 

Notice the way the law is applied. The inner expectation conditions on x1 and x2, while the outer expectation conditions only on x1. The iterated expectation yields the simple answer E(y | x1), the expectation conditional on x1 alone. Sometimes we phrase this as: “The smaller information set wins.”

As an example

    E(log(wage) | gender = man, race = white) Pr(race = white | gender = man)
      + E(log(wage) | gender = man, race = black) Pr(race = black | gender = man)
      + E(log(wage) | gender = man, race = other) Pr(race = other | gender = man)
    = E(log(wage) | gender = man)

or numerically

    3.07 × 0.84 + 2.86 × 0.08 + 3.05 × 0.08 = 3.05.

A property of conditional expectations is that when you condition on a random vector x you can effectively treat it as if it is constant. For example, E(x | x) = x and E(g(x) | x) = g(x) for any function g(·). The general property is known as the conditioning theorem.

Theorem 3.6.3 Conditioning Theorem

If

    E|g(x) y| < ∞                                    (3.8)

then

    E(g(x) y | x) = g(x) E(y | x)                    (3.9)

and

    E(g(x) y) = E(g(x) E(y | x))                     (3.10)

 

 

The proofs of Theorems 3.6.1, 3.6.2 and 3.6.3 are given in Section 3.30.
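As an informal check of property (3.10), the following simulation sketch (Python with NumPy; the CEF m(x) = x² and the bounded function g(x) = 1/(1 + x²) are arbitrary choices made for illustration) shows that multiplying by g(x) inside or outside the conditional expectation gives essentially the same unconditional mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000

x = rng.normal(size=n)
e = rng.normal(size=n)            # independent of x, so the CEF of y is m(x)
m = x**2                          # a known CEF, m(x) = x^2 (arbitrary choice)
y = m + e

g = 1.0 / (1.0 + x**2)            # any (bounded) function of the conditioning variable

print("E(g(x) y)        =", round(np.mean(g * y), 4))
print("E(g(x) E(y | x)) =", round(np.mean(g * m), 4))   # approximately equal, as in (3.10)
```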

3.7 Monotonicity of Conditioning

What is the effect of increasing the amount of information when constructing a conditional expectation? That is, how do we compare E(y | x1) versus E(y | x1, x2)? We have seen that by increasing the conditioning set, the conditional expectation reveals greater detail about the distribution of y. Is there something more that can be said?

It turns out that there is a simple relationship induced by conditioning. We can think of the conditional mean E(y | x1) as the “explained portion” of y. The remainder y − E(y | x1) is the “unexplained portion”. The simple relationship we now derive shows that the variance of this unexplained portion decreases when we condition on more variables. This relationship is monotonic in the sense that increasing the amount of information always decreases the variance of the unexplained portion.


 

 

 

 

Theorem 3.7.1 If Ey² < ∞ then

    var(y) ≥ var(y − E(y | x1)) ≥ var(y − E(y | x1, x2))

 

 

 

 

Theorem 3.7.1 says that the variance of the difference between y and its conditional mean (weakly) decreases whenever an additional variable is added to the conditioning information.

The proof of Theorem 3.7.1 is given in Section 3.30.
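A small simulation illustrates the ordering (Python with NumPy; the additive design below is chosen so that both conditional means are known exactly, namely E(y | x1) = x1 and E(y | x1, x2) = x1 + x2).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# y = x1 + x2 + e with x1, x2, e mutually independent standard normal
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
e  = rng.normal(size=n)
y  = x1 + x2 + e

print("var(y)               =", round(y.var(), 3))             # about 3
print("var(y - E(y|x1))     =", round((y - x1).var(), 3))      # about 2
print("var(y - E(y|x1,x2))  =", round((y - x1 - x2).var(), 3)) # about 1
```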

3.8 CEF Error

The CEF error e is defined as the difference between y and the CEF evaluated at the random vector x:

    e = y − m(x).

By construction, this yields the formula

    y = m(x) + e.                                    (3.11)

In (3.11) it is useful to understand that the error e is derived from the joint distribution of (y, x), and so its properties are derived from this construction.

A key property of the CEF error is that it has a conditional mean of zero. To see this, by the linearity of expectations, the definition m(x) = E(y | x) and the Conditioning Theorem

    E(e | x) = E((y − m(x)) | x)
             = E(y | x) − E(m(x) | x)
             = m(x) − m(x)
             = 0.

This fact can be combined with the law of iterated expectations to show that the unconditional mean is also zero.

    E(e) = E(E(e | x)) = E(0) = 0

We state this and some other results formally.

Theorem 3.8.1 Properties of the CEF error

If E|y| < ∞ then

1. E(e | x) = 0.

2. E(e) = 0.

3. If E|y|^r < ∞ for r ≥ 1 then E|e|^r < ∞.

4. For any function h(x) such that E|h(x) e| < ∞ then E(h(x) e) = 0.

The proof of the third result is deferred to Section 3.30.

The fourth result, whose proof is left to Exercise 3.3, says that e is uncorrelated with any function of the regressors.
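These properties are easy to see in a simulated example (Python with NumPy; the exponential mean function and the heteroskedastic error below are arbitrary choices). The error averages to roughly zero overall, within narrow bins of x, and after multiplication by a function of x.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

x = rng.uniform(-2, 2, size=n)
m = np.exp(x)                                   # an arbitrary CEF m(x)
e = (1 + 0.5 * np.abs(x)) * rng.normal(size=n)  # E(e | x) = 0, but e depends on x
y = m + e                                       # so m(x) is the CEF of y given x

print("overall mean of e :", round(e.mean(), 4))          # property 2
print("mean of x^2 * e   :", round((x**2 * e).mean(), 4)) # property 4, with h(x) = x^2
# property 1: e also averages to roughly zero within narrow bins of x
edges = np.linspace(-2, 2, 9)
labels = np.digitize(x, edges)
for b in range(1, 9):
    print(f"x in [{edges[b-1]:.1f}, {edges[b]:.1f}):  mean e = {e[labels == b].mean(): .4f}")
```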

The equations

    y = m(x) + e
    E(e | x) = 0

together imply that m(x) is the CEF of y given x. It is important to understand that this is not a restriction. These equations hold true by definition.

The condition E(e | x) = 0 is implied by the definition of e as the difference between y and the CEF m(x). The equation E(e | x) = 0 is sometimes called a conditional mean restriction, since the conditional mean of the error e is restricted to equal zero. The property is also sometimes called mean independence, for the conditional mean of e is 0 and thus independent of x. However, it does not imply that the distribution of e is independent of x. Sometimes the assumption “e is independent of x” is added as a convenient simplification, but it is not a generic feature of the conditional mean. Typically and generally, e and x are jointly dependent, even though the conditional mean of e is zero.

As an example, the contours of the joint density of e and experience are plotted in Figure 3.6 for the same population as Figure 3.5. The error e has a conditional mean of zero for all values of experience, but the shape of the conditional distribution varies with the level of experience.

 


Figure 3.6: Joint density of CEF error e and experience for white men with education=12.

As a simple example of a case where x and e are mean independent yet dependent, let y = xu where x and u are independent and Eu = 1. Then

    E(y | x) = E(xu | x) = x E(u | x) = x

so the CEF equation is

    y = x + e

where

    e = x(u − 1).

Note that even though e is not independent of x,

    E(e | x) = E(x(u − 1) | x) = x E((u − 1) | x) = 0

and is thus mean independent.
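A short simulation of this example (Python with NumPy; taking u to be exponential with mean 1 and x uniform on [0.5, 3] are convenient choices, and any distributions with these properties would do) shows that e = x(u − 1) has mean roughly zero at every level of x, while its spread clearly grows with x, so e is mean independent of x but not independent of x.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3_000_000

x = rng.uniform(0.5, 3.0, size=n)       # any positive regressor
u = rng.exponential(1.0, size=n)        # independent of x, with E(u) = 1
y = x * u
e = y - x                               # CEF error, e = x(u - 1)

for lo, hi in [(0.5, 1.0), (1.0, 2.0), (2.0, 3.0)]:
    sel = (x >= lo) & (x < hi)
    print(f"x in [{lo}, {hi}):  mean(e) = {e[sel].mean(): .4f}   sd(e) = {e[sel].std():.3f}")
# the conditional mean of e is roughly 0 everywhere, but its conditional spread depends on x
```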

An important measure of the dispersion about the CEF function is the unconditional variance of the CEF error e. We write this as

    σ² = var(e) = E(e − Ee)² = E(e²).

Theorem 3.8.1.3 implies the following simple but useful result.


 

 

 

 

Theorem 3.8.2 If Ey² < ∞ then σ² < ∞.

 

3.9 Best Predictor

Suppose that given a realized value of x, we want to create a prediction or forecast of y. We can write any predictor as a function g(x) of x. The prediction error is the realized difference y − g(x). A non-stochastic measure of the magnitude of the prediction error is the expectation of its square

    E(y − g(x))².                                    (3.12)

We can define the best predictor as the function g(x) which minimizes (3.12). What function is the best predictor? It turns out that the answer is the CEF m(x). This holds regardless of the joint distribution of (y, x).

To see this, note that the mean squared error of a predictor g(x) is

    E(y − g(x))² = E(e + m(x) − g(x))²
                 = Ee² + 2E(e (m(x) − g(x))) + E(m(x) − g(x))²
                 = Ee² + E(m(x) − g(x))²
                 ≥ Ee² = E(y − m(x))²

where the first equality makes the substitution y = m(x) + e and the third equality uses Theorem 3.8.1.4. The right-hand side after the third equality is minimized by setting g(x) = m(x), yielding the final inequality. The minimum is finite under the assumption Ey² < ∞ as shown by Theorem 3.8.2.

We state this formally in the following result.

Theorem 3.9.1 Conditional Mean as Best Predictor

If Ey² < ∞, then for any predictor g(x),

    E(y − g(x))² ≥ E(y − m(x))²

where m(x) = E(y | x).
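The optimality of the CEF is easy to see numerically. The sketch below (Python with NumPy; the nonlinear CEF and the two competing predictors are arbitrary choices) compares mean squared prediction errors; the CEF attains the smallest value, as Theorem 3.9.1 asserts.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

x = rng.uniform(-3, 3, size=n)
m = np.sin(x) + 0.5 * x                 # the true CEF m(x) (an arbitrary choice)
y = m + rng.normal(0, 1, size=n)

def mse(pred):
    return np.mean((y - pred) ** 2)

print("MSE using m(x)      :", round(mse(m), 4))                     # smallest, about 1
print("MSE using 0.5*x     :", round(mse(0.5 * x), 4))               # larger
print("MSE using mean of y :", round(mse(np.full(n, y.mean())), 4))  # larger still
```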

3.10 Conditional Variance

While the conditional mean is a good measure of the location of a conditional distribution, it does not provide information about the spread of the distribution. A common measure of the dispersion is the conditional variance.

Definition 3.10.1 If Ey² < ∞, the conditional variance of y given x is

    σ²(x) = var(y | x)
          = E((y − E(y | x))² | x)
          = E(e² | x).


Generally, σ²(x) is a non-trivial function of x and can take any form subject to the restriction that it is non-negative. The conditional standard deviation is its square root σ(x) = √(σ²(x)). One way to think about σ²(x) is that it is the conditional mean of e² given x.

As an example of how the conditional variance depends on observables, compare the conditional log wage densities for men and women displayed in Figure 3.3. The difference between the densities is not purely a location shift, but is also a difference in spread. Specifically, we can see that the density for men’s log wages is somewhat more spread out than that for women, while the density for women’s wages is somewhat more peaked. Indeed, the conditional standard deviation for men’s wages is 3.05 and that for women is 2.81. So while men have higher average wages, they are also somewhat more dispersed.

The unconditional error variance and the conditional variance are related by the law of iterated expectations:

    σ² = E(e²) = E(E(e² | x)) = E(σ²(x)).

That is, the unconditional error variance is the average conditional variance.

 

 

 

 

 

 

Given the conditional variance, we can define a rescaled error

    ε = e / σ(x).                                    (3.13)

We can calculate that since σ(x) is a function of x

    E(ε | x) = E(e/σ(x) | x) = (1/σ(x)) E(e | x) = 0

and

    var(ε | x) = E(ε² | x) = E(e²/σ²(x) | x) = (1/σ²(x)) E(e² | x) = σ²(x)/σ²(x) = 1.

Thus ε has a conditional mean of zero, and a conditional variance of 1.

Notice that (3.13) can be rewritten as

    e = σ(x)ε

and substituting this for e in the CEF equation (3.11), we find that

    y = m(x) + σ(x)ε.                                (3.14)

This is an alternative (mean-variance) representation of the CEF equation.
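A brief sketch (Python with NumPy; the mean function, the variance function, and the t-distributed shock are all arbitrary choices) generates data from the representation (3.14) and checks that the rescaled error ε = e/σ(x) has mean roughly 0 and variance roughly 1 within bins of x.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2_000_000

x = rng.uniform(0, 4, size=n)
m_x     = 1.0 + 0.5 * x                              # conditional mean m(x)       (arbitrary)
sigma_x = 0.2 + 0.3 * x                              # conditional std. dev. s(x)  (arbitrary)
eps = rng.standard_t(df=8, size=n) / np.sqrt(8 / 6)  # t(8) shock scaled to unit variance
y = m_x + sigma_x * eps                              # the representation (3.14)

e = y - m_x                                          # CEF error
eps_hat = e / sigma_x                                # rescaled error, as in (3.13)
for lo, hi in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    sel = (x >= lo) & (x < hi)
    print(f"x in [{lo},{hi}): mean = {eps_hat[sel].mean(): .3f}, var = {eps_hat[sel].var():.3f}")
```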

Many econometric studies focus on the conditional mean m(x) and either ignore the conditional variance σ²(x), treat it as a constant σ²(x) = σ², or treat it as a nuisance parameter (a parameter not of primary interest). This is appropriate when the primary variation in the conditional distribution is in the mean, but can be short-sighted in other cases. Dispersion is relevant to many economic topics, including income and wealth distribution, economic inequality, and price dispersion. Conditional dispersion (variance) can be a fruitful subject for investigation.

The perverse consequences of a narrow-minded focus on the mean have been parodied in a classic joke:

An economist was standing with one foot in a bucket of boiling water and the other foot in a bucket of ice. When asked how he felt, he replied, “On average I feel just fine.”

Clearly, the economist in question ignored variance!


3.11 Homoskedasticity and Heteroskedasticity

An important special case obtains when the conditional variance σ²(x) is a constant and independent of x. This is called homoskedasticity.

Definition 3.11.1 The error is homoskedastic if E(e² | x) = σ² does not depend on x.

In the general case where σ²(x) depends on x we say that the error e is heteroskedastic.

Definition 3.11.2 The error is heteroskedastic if E(e² | x) = σ²(x) depends on x.

It is helpful to understand that the concepts homoskedasticity and heteroskedasticity concern the conditional variance, not the unconditional variance. By definition, the unconditional variance is a constant and independent of the regressors x. So when we talk about the variance as a function of the regressors, we are talking about the conditional variance. Recall Figure 3.3 and how the variance of wages varies between men and women.

Some older or introductory textbooks describe heteroskedasticity as the case where “the variance of e varies across observations”. This is a poor and confusing definition. It is more constructive to understand that heteroskedasticity means that the conditional variance σ²(x) depends on observables.

Older textbooks also tend to describe homoskedasticity as a component of a correct regression specification, and describe heteroskedasticity as an exception or deviance. This description has influenced many generations of economists, but it is unfortunately backwards. The correct view is that heteroskedasticity is generic and “standard”, while homoskedasticity is unusual and exceptional. The default in empirical work should be to assume that the errors are heteroskedastic, not the converse.

In apparent contradiction to the above statement, we will still frequently impose the homoskedasticity assumption when making theoretical investigations into the properties of estimation and inference methods. The reason is that in many cases homoskedasticity greatly simplifies the theoretical calculations, and it is therefore quite advantageous for teaching and learning. It should always be remembered, however, that homoskedasticity is never imposed because it is believed to be a correct feature of an empirical model, but rather because of its simplicity.

3.12 Regression Derivative

One way to interpret the CEF m(x) = E(y | x) is in terms of how marginal changes in the regressors x imply changes in the conditional mean of the response variable y. It is typical to consider marginal changes in single regressors, holding the remainder fixed. When a regressor x1 is continuously distributed, we define the marginal effect of a change in x1, holding the variables x2, ..., xk fixed, as the partial derivative of the CEF

    (∂/∂x1) m(x1, ..., xk).
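For a concrete (entirely hypothetical) CEF, say m(x1, x2) = 1 + 0.10·x1 − 0.002·x1² + 0.2·x2, the marginal effect of x1 can be read off analytically (0.10 − 0.004·x1) or approximated by a central finite difference, as in this Python sketch.

```python
# Hypothetical CEF: m(x1, x2) = 1 + 0.10*x1 - 0.002*x1**2 + 0.2*x2  (made-up coefficients)
def m(x1, x2):
    return 1 + 0.10 * x1 - 0.002 * x1**2 + 0.2 * x2

def marginal_effect_x1(x1, x2, h=1e-6):
    # numerical partial derivative of m with respect to x1, holding x2 fixed
    return (m(x1 + h, x2) - m(x1 - h, x2)) / (2 * h)

for x1 in [5, 15, 25]:
    print(f"x1 = {x1:2d}: dm/dx1 = {marginal_effect_x1(x1, x2=1):.4f}   (exact: {0.10 - 0.004 * x1:.4f})")
```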
