

To prove this inequality, we define the random variable y = Σ yi with

yi = ∂ ln fi/∂θ , fi ≡ f(xi|θ) .   (13.7)

 

 

It has the expected value

⟨yi⟩ = ∫ (1/fi)(∂fi/∂θ) fi dxi = ∫ (∂fi/∂θ) dxi = (∂/∂θ) ∫ fi dxi = (∂/∂θ) 1 = 0 .   (13.8)

Because of the independence of the yi we have ⟨yi yj⟩ = ⟨yi⟩⟨yj⟩ = 0 for i ≠ j, and

var(y) = N ⟨yi²⟩ = N ⟨(∂ ln f/∂θ)²⟩ .   (13.9)

Using the definition L = Π fi, we find for cov(ty) = ⟨(t − ⟨t⟩)(y − ⟨y⟩)⟩:

cov(ty) = ∫ t (∂ ln L/∂θ) L dx1 · · · dxN
        = ∫ t (∂L/∂θ) dx1 · · · dxN
        = (∂/∂θ)⟨t⟩
        = 1 + db/dθ .

From the Cauchy–Schwarz inequality

[cov(ty)]² ≤ var(t) var(y)   (13.10)

together with (13.9), the bound (13.6) follows.

The equality sign in (13.6) is valid if and only if the two factors t, y in the covariance are proportional to each other. In this case t is called a Minimum Variance Bound (MVB) estimator. It can be shown to be also minimal sufficient.

In most of the literature efficiency is defined by the stronger condition: An estimator is called efficient if it is bias-free and if it satisfies the MVB.
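As a quick numerical illustration of the bound (our sketch, not part of the text): for a normal distribution with known width σ, the Fisher information per event is 1/σ², so the MVB for an unbiased estimate of the mean is σ²/N, which the sample mean attains.

```python
import numpy as np

# Monte Carlo check of the Cramer-Rao bound for the mean of a normal
# distribution with known sigma: the sample mean is unbiased and its
# variance should match the MVB sigma^2/N.
rng = np.random.default_rng(seed=1)
mu, sigma, N, trials = 1.0, 2.0, 50, 20000

estimates = rng.normal(mu, sigma, size=(trials, N)).mean(axis=1)

mvb = sigma**2 / N                       # 1 / (N * Fisher information per event)
print(f"empirical var(t) = {estimates.var():.5f}")
print(f"MVB              = {mvb:.5f}")   # the two numbers should agree closely
```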

13.3 Properties of the Maximum Likelihood Estimator

13.3.1 Consistency

The maximum likelihood estimator (MLE) is consistent under mild assumptions. To prove this, we consider the expected value of


 

ln L(θ|x) = Σ_{i=1}^{N} ln f(xi|θ)   (13.11)

 

which is to be calculated by integration over the variables³ x using the true p.d.f. (with the true parameter θ0). First we prove the inequality

 

 

 

⟨ln L(θ|x)⟩ < ⟨ln L(θ0|x)⟩   (13.12)

for θ ≠ θ0: Since the logarithm is a strictly concave function, Jensen's inequality gives ⟨ln(· · ·)⟩ < ln⟨(· · ·)⟩ for any non-degenerate random argument, hence

 

 

 

 

 

⟨ln [L(θ|x)/L(θ0|x)]⟩ < ln ⟨L(θ|x)/L(θ0|x)⟩ = ln ∫ [L(θ|x)/L(θ0|x)] L(θ0|x) dx = ln 1 = 0 .

 

In the last step we used

∫ L(θ|x) dx = ∫ Π f(xi|θ) dx1 · · · dxN = 1 .

Since ln L(θ|x)/N = Σ ln f(xi|θ)/N is an arithmetic sample mean which, according to the law of large numbers (13.2), converges stochastically to the expected value for N → ∞, we have also (in the sense of stochastic convergence)

ln L(θ|x)/N → ⟨ln f(x|θ)⟩ = Σ ⟨ln f(xi|θ)⟩/N = ⟨ln L(θ|x)⟩/N ,

and from (13.12)

lim_{N→∞} P{ln L(θ|x) < ln L(θ0|x)} = 1 , θ ≠ θ0 .   (13.13)

 

On the other hand, the MLE θ̂ is defined by its extremal condition

ln L(θ̂|x) ≥ ln L(θ0|x) .

 

A contradiction to (13.13) can be avoided only if also

lim_{N→∞} P{|θ̂ − θ0| < ε} = 1

holds for arbitrary ε > 0. This means consistency of the MLE.
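This convergence is easy to observe numerically. A minimal sketch (our toy setup, not from the text): for an exponential p.d.f. f(x|τ) = exp(−x/τ)/τ the MLE is the sample mean, and the probability that it stays within ε of the true value approaches one with growing N.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
tau0, eps, trials = 1.5, 0.1, 10000

for N in (10, 100, 1000, 10000):
    # the sum of N exponential variables is Gamma(N, tau0), so the MLE
    # (the sample mean) can be sampled directly without storing the samples
    tau_hat = rng.gamma(shape=N, scale=tau0, size=trials) / N
    frac = np.mean(np.abs(tau_hat - tau0) < eps)
    print(f"N = {N:6d}:  P(|tau_hat - tau0| < {eps}) ~ {frac:.3f}")
```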

 

 

 

 

 

13.3.2 Efficiency

 

 

 

 

 

 

Since the MLE is consistent, it is unbiased asymptotically, for N → ∞. Under certain assumptions in addition to the usually required regularity⁴, the MLE is also efficient asymptotically.

Proof:

³We keep the form of the argument list of L, although now x is not considered as fixed to the experimentally sampled values, but as a random vector with given p.d.f.

⁴The boundaries of the domain of x must not depend on θ, and the maximum of L should not be reached at the boundary of the range of θ.


With the notations of the last paragraph, with L = Π fi and using (13.8), the expected value and variance of y = Σ yi = ∂ ln L/∂θ are given by the following expressions:

⟨y⟩ = ∫ (∂ ln L/∂θ) L dx = 0 ,   (13.14)

σy² = var(y) = ⟨(∂ ln L/∂θ)²⟩ = −⟨∂² ln L/∂θ²⟩ .   (13.15)

The last relation follows after further differentiation of (13.14) and from the relation

∫ (∂² ln L/∂θ²) L dx = −∫ (∂ ln L/∂θ)(∂L/∂θ) dx = −∫ (∂ ln L/∂θ)(∂ ln L/∂θ) L dx .

From the Taylor expansion of ∂ ln L/∂θ|θ=θ̂, which is zero by definition, and with (13.15) we find

0 = ∂ ln L/∂θ|θ=θ̂ = ∂ ln L/∂θ|θ=θ0 + (θ̂ − θ0) ∂² ln L/∂θ²|θ=θ0 + · · ·
  ≈ y − (θ̂ − θ0) σy² ,   (13.16)

where the consistency of the MLE guarantees the validity of this approximation in the sense of stochastic convergence. Following the central limit theorem, y/σy, being the sum of i.i.d. variables, is asymptotically normally distributed with mean zero and variance unity. The same is then true for (θ̂ − θ0)σy, i.e. θ̂ follows asymptotically a normal distribution with mean θ0 and asymptotically vanishing variance 1/σy² ∝ 1/N, as seen from (13.9).
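Both statements can be checked with a short simulation (our sketch; the exponential mean is used again because its MLE and its MVB are closed-form, with 1/σy² = τ²/N):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
tau0, N, trials = 1.5, 1000, 50000

# MLE of the exponential mean is the sample mean; sampled via the Gamma trick
tau_hat = rng.gamma(shape=N, scale=tau0, size=trials) / N

# standardize with the MVB variance tau0^2/N (Fisher information 1/tau^2 per event)
z = (tau_hat - tau0) * np.sqrt(N) / tau0

print(f"mean(z) = {z.mean():+.3f}, var(z) = {z.var():.3f}")   # expect ~0 and ~1
print(f"P(|z| < 1) = {np.mean(np.abs(z) < 1):.3f}  (expect ~0.683)")
```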

13.3.3 Asymptotic Form of the Likelihood Function

 

 

 

 

 

 

 

 

 

 

A similar result as derived in the last paragraph for the p.d.f. of the MLE θ̂ can be derived for the likelihood function itself.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Considering the Taylor expansion of y = ∂ ln L/∂θ around the MLE θ̂, with y(θ̂) = 0, we get

y(θ) ≈ (θ − θ̂) y′(θ̂) .   (13.17)

As discussed in the last paragraph, we have for N → ∞

y′(θ̂) → ⟨y′(θ0)⟩ = −σy² = const .

Thus y′(θ̂) is independent of θ̂, and higher derivatives disappear. After integration of (13.17) over θ we obtain a parabolic form for ln L:

ln L(θ) = ln L(θ̂) − (1/2) σy² (θ − θ̂)² ,

where the width of the parabola decreases with σy⁻² ∝ 1/N (13.9). Up to the missing normalization, the likelihood function has the same form as the distribution of the MLE, with θ̂ − θ0 replaced by θ − θ̂.


13.3.4 Properties of the Maximum Likelihood Estimate for Small Samples

The criterion of asymptotic efficiency, fulfilled by the MLE for large samples, is usually extended to small samples, where the normal approximation of the sampling distribution does not apply, in the following way: A bias-free estimate t is called a minimum variance (MV) estimate if var(t) ≤ var(t′) for any other bias-free estimate t′. If, moreover, the Cramér–Rao inequality (13.6) is fulfilled as an equality, one speaks of a minimum variance bound (MVB) estimate, often also called an efficient or most efficient estimate (not to be confused with the asymptotic efficiency which we have considered before in Appendix 13.2). The latter, however, exists only for a certain function τ(θ) of the parameter θ, provided a one-dimensional sufficient statistic exists (see 7.1.1). It can be shown [2] that under exactly this condition the MLE for τ will be this MVB estimate, and therefore bias-free for any N. The MLE for any nonlinear function of τ will in general be biased, but still optimal in the following sense: if bias-corrected, it becomes an MV estimate, i.e. it will have the smallest variance among all unbiased estimates.

Example 148. Efficiency of small-sample MLEs

The MLE for the variance σ² of a normal distribution with known mean µ,

σ̂² = (1/N) Σ (xi − µ)² ,

is unbiased and efficient, reaching the MVB for all N. The MLE for σ is of course

σ̂ = √(σ̂²) ,

according to the relation between σ and σ². It is biased and thus not efficient in the sense of the above definition.

A bias-corrected estimator for σ is (see for instance [90])

σ̂corr = √(N/2) · [Γ(N/2) / Γ((N + 1)/2)] · σ̂ .

This estimator can be shown to have the smallest variance of all unbiased estimators, independent of the sample size N.
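A quick Monte Carlo check of this correction (our sketch; the Γ ratio is evaluated through log-gamma for numerical stability):

```python
import math
import numpy as np

rng = np.random.default_rng(seed=5)
sigma, N, trials = 2.0, 5, 200000

x = rng.normal(0.0, sigma, size=(trials, N))
sigma_hat = np.sqrt((x**2).mean(axis=1))      # MLE for known mean mu = 0, biased low

# correction factor sqrt(N/2) * Gamma(N/2) / Gamma((N+1)/2)
corr = math.sqrt(N / 2) * math.exp(math.lgamma(N / 2) - math.lgamma((N + 1) / 2))

print(f"<sigma_hat>      = {sigma_hat.mean():.4f}   (true sigma = {sigma})")
print(f"<sigma_hat_corr> = {(corr * sigma_hat).mean():.4f}")
```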

In the above example a one-dimensional sufficient statistic exists. If this is not the case, the question of optimality of the MLE for small samples has – from the frequentist point of view – no general answer.

In summary, even for finite N the MLE for a certain parameter achieves the optimal – from the frequentist point of view – properties of an MVB estimator, if the latter exists. Of course these properties cannot be preserved for other parameterizations, since variance and bias are not invariant properties.

13.4 Error of Background-Contaminated Parameter Estimates

In order to calculate the additional uncertainty of a parameter estimate due to the presence of background, if the latter is taken from a reference experiment in the


way described in Sect. 6.5.10, we consider the general definition of the pseudo log-likelihood

ln L̃ = Σ_{i=1}^{N} ln f(xi|θ) − r Σ_{i=1}^{M} ln f(x′i|θ)

(xi the N observations of the main sample, x′i the M observations of the reference sample),

restricting ourselves at first to a single parameter θ, see (6.20). The generalization to multi-dimensional parameter spaces is straightforward and will be indicated later.

 

From ∂ ln L̃/∂θ|θ̂ = 0, we find

Σ_{i=1}^{S} ∂ ln f(xi(S)|θ)/∂θ|θ̂ + Σ_{i=1}^{B} ∂ ln f(xi(B)|θ)/∂θ|θ̂ − r Σ_{i=1}^{M} ∂ ln f(x′i|θ)/∂θ|θ̂ = 0 ,

where the main sample of size N = S + B is split into its S signal events xi(S) and its B background events xi(B).

This formula defines the background-corrected estimate θ̂. It differs from the “ideal” estimate θ̂(S) which would be obtained in the absence of background, i.e. by equating to zero the first sum on the left-hand side. Writing θ̂ = θ̂(S) + Δθ̂ in the first sum and Taylor expanding it up to first order, we get

Σ_{i=1}^{S} ∂² ln f(xi(S)|θ)/∂θ²|θ̂(S) Δθ̂ + [ Σ_{i=1}^{B} ∂ ln f(xi(B)|θ)/∂θ|θ̂ − r Σ_{i=1}^{M} ∂ ln f(x′i|θ)/∂θ|θ̂ ] = 0 .   (13.18)

 

 

 

 

 

ˆ

The first sum, if taken with a minus sign, is the Fisher information of the signal sample on θ; the sum itself thus equals −1/var(θ̂(S)), asymptotically. The approximation relies on the assumption that ln f(x|θ) is parabolic in the region between θ̂(S) and θ̂. Then we have

 

 

Δθ̂ ≈ var(θ̂(S)) [ Σ_{i=1}^{B} ∂ ln f(xi(B)|θ)/∂θ|θ̂ − r Σ_{i=1}^{M} ∂ ln f(x′i|θ)/∂θ|θ̂ ] .   (13.19)

We take the expected value with respect to the background distribution and obtain

⟨Δθ̂⟩ = var(θ̂(S)) ⟨B − rM⟩ ⟨∂ ln f(x|θ)/∂θ|θ̂⟩ .

Since ⟨B − rM⟩ = 0, the background correction is asymptotically bias-free. Squaring (13.19) and writing the summands in shorthand as yi, y′i, we get

(Δθ̂)² = (var(θ̂(S)))² [ Σ_{i=1}^{B} yi − r Σ_{i=1}^{M} y′i ]² ,

[· · ·]² = Σ_{i=1}^{B} Σ_{j=1}^{B} yi yj + r² Σ_{i=1}^{M} Σ_{j=1}^{M} y′i y′j − 2r Σ_{i=1}^{B} Σ_{j=1}^{M} yi y′j
        = Σ yi² + r² Σ y′i² + Σ_{j≠i} yi yj + r² Σ_{j≠i} y′i y′j − 2r Σi Σj yi y′j ,

and taking the expected value,

⟨(Δθ̂)²⟩ = (var(θ̂(S)))² [ ⟨B + r²M⟩ (⟨y²⟩ − ⟨y⟩²) + ⟨(B − rM)²⟩ ⟨y⟩² ] .   (13.20)

In physics experiments, the event numbers M, B, S are independently fluctuating according to Poisson distributions with expected values ⟨M⟩ = ⟨B⟩/r and ⟨S⟩. Then ⟨B + r²M⟩ = ⟨B⟩(1 + r) and

⟨(B − rM)²⟩ = ⟨B²⟩ + r²⟨M²⟩ − 2r⟨B⟩⟨M⟩ = ⟨B⟩ + r²⟨M⟩ = ⟨B⟩(1 + r) .


Adding the contribution from the uncontaminated estimate, var(θ̂(S)), to (13.20) leads to the final result

var(θ̂) = var(θ̂(S)) + ⟨(Δθ̂)²⟩
        = var(θ̂(S)) + (1 + r) (var(θ̂(S)))² ⟨B⟩⟨y²⟩   (13.21)
        = var(θ̂(S)) + r(1 + r) (var(θ̂(S)))² ⟨M⟩⟨y²⟩ .

But ⟨M⟩, the expected number of background events in both the main and the control sample, is not known and can only be estimated by the empirical value M. In the same way we have to use empirical estimates for the expected values ⟨y⟩ and ⟨y²⟩, since the distribution of background events is unknown. Thus we replace

⟨M⟩ → M ,   ⟨y⟩ → Σ_{i=1}^{M} yi/M ,   ⟨y²⟩ → Σ_{i=1}^{M} yi²/M ,

where yi = ∂ ln f(xi|θ)/∂θ. As usual in error calculation, the dependence of yi on the true value of θ has to be approximated by a dependence on the estimated value θ̂. Similarly, we approximate var(θ̂(S)):

1/var(θ̂(S)) = −Σ_{i=1}^{S} ∂² ln f(xi|θ)/∂θ²|θ̂(S) ≈ −[ Σ_{i=1}^{N} ∂² ln f(xi|θ)/∂θ²|θ̂ − r Σ_{i=1}^{M} ∂² ln f(x′i|θ)/∂θ²|θ̂ ] .

We realize from (13.21) that it is advantageous to take a large reference sample, i.e. r small. The variance ⟨(Δθ̂)²⟩ increases with the square of the error of the uncontaminated sample. Via the quantity ⟨y²⟩ it depends also on the shape of the background distribution.
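As a concrete illustration of this one-dimensional recipe, here is a minimal sketch for a signal that is Gaussian with unit width and unknown mean µ, so that ∂ ln f/∂µ = x − µ and ∂² ln f/∂µ² = −1 and all ingredients of (13.21) are in closed form. The helper name and interface are ours, not from the text:

```python
import numpy as np

def contaminated_gauss_mean(x, xp, r):
    """Pseudo-likelihood estimate and error (13.21) for a unit-width Gaussian mean.

    x: main sample (signal + background), xp: reference sample, r: scale factor.
    """
    N, M = x.size, xp.size
    mu_hat = (x.sum() - r * xp.sum()) / (N - r * M)  # zero of d ln L~ / d mu
    var_S = 1.0 / (N - r * M)                        # approximates var(mu_hat_S)
    y2 = np.mean((xp - mu_hat) ** 2)                 # empirical <y^2> from the reference
    var_total = var_S + r * (1 + r) * var_S**2 * M * y2
    return mu_hat, np.sqrt(var_total)
```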

For a P-dimensional parameter space θ we see from (13.18) that the first sum is given by the weight matrix V(S) of the estimated parameters in the absence of background:

Σ_{l=1}^{P} Σ_{i=1}^{S} ∂² ln f(xi(S)|θ)/∂θk∂θl|θ̂(S) Δθ̂l = −Σ_{l=1}^{P} (V(S))kl Δθ̂l .

Solving the linear equation system for Δθ̂ and constructing from its components the error matrix E, we find, in close analogy to the one-dimensional case,

E = C(S)YC(S) ,

with C(S) = (V(S))⁻¹ being the covariance matrix of the background-free estimates and Y defined as

Ykl = r(1 + r) ⟨M⟩ ⟨yk yl⟩ ,

with yk = yk(xi) shorthand for ∂ ln f(xi|θ)/∂θk. As in the one-dimensional case, the total covariance matrix of the estimated parameters is the sum

cov(θ̂k, θ̂l) = C(S)kl + Ekl .

The following example illustrates the error due to background contamination for the above estimation method.


Example 149. Parameter uncertainty for background-contaminated signals

We investigate how well our asymptotic error formula works in a specific example. To this end, we consider a Gaussian signal distribution with width unity and mean zero over a background modeled by an exponential distribution with decay constant γ = 0.2, of the form c exp[−γ(x + 4)], where both distributions are restricted to the range [−4, 4]. The numbers of signal events S, background events B and reference events M follow Poisson distributions with mean values ⟨S⟩ = 60, ⟨B⟩ = 40 and ⟨M⟩ = 100. This implies a correction factor r = ⟨B⟩/⟨M⟩ = 0.4 for the reference experiment. From 10⁴ MC experiments we obtain a distribution of µ̂, with mean value and width

0.019 and 0.34, respectively. The pure signal µ̂(S) has mean and width 0.001 and 0.13 (= 1/√60). From our asymptotic error formula (13.21) we derive an error of 0.31, slightly smaller than the MC result. The discrepancy is typical for Poisson fluctuations and will be larger for lower statistics.
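The toy study is easy to reproduce. A sketch (our code; the signal truncation to [−4, 4] is neglected, and the estimate is the closed-form zero of the pseudo-likelihood score for a unit-width Gaussian mean):

```python
import numpy as np

rng = np.random.default_rng(seed=7)
gamma, r = 0.2, 0.4

def background(n):
    # inverse-transform sampling of c*exp(-gamma*(x+4)) restricted to [-4, 4]
    u = rng.uniform(size=n)
    return -4.0 - np.log(1.0 - u * (1.0 - np.exp(-8.0 * gamma))) / gamma

mu_hats = []
for _ in range(10000):
    S, B, M = rng.poisson(60), rng.poisson(40), rng.poisson(100)
    x = np.concatenate([rng.normal(0.0, 1.0, S), background(B)])   # main sample
    xp = background(M)                                             # reference sample
    mu_hats.append((x.sum() - r * xp.sum()) / (x.size - r * xp.size))

print(f"mean = {np.mean(mu_hats):+.3f}, width = {np.std(mu_hats):.3f}")  # ~0.02, ~0.34
```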

13.5 Frequentist Confidence Intervals

We associate error intervals with measurements to indicate that the parameter of interest has a reasonably high probability of being located inside the interval. However, to compute this probability, a prior probability has to be introduced, with the problems which we have discussed in Sect. 6.1. To circumvent this problem, J. Neyman has proposed a method to construct intervals without using prior probabilities. Unfortunately, as is often the case, one problem is traded for another one.

Neyman’s confidence intervals have the following defining property: The true parameter lies in the interval on average in the fraction C of intervals of confidence level C. In other words: given a true value θ, a measurement t will include it in its associated confidence interval [t1, t2] – “cover” it – with probability C. (Note that this does not necessarily imply that, given a certain confidence interval, the true value is included in it with probability C.)

Traditionally chosen values for the confidence level are 68.3%, 90% and 95% – the first corresponds to the standard error interval of the normal distribution.

Confidence intervals are constructed in the following way:

For each parameter value θ a probability interval [t1(θ), t2(θ)] is defined, such that the probability that the observed estimate t of θ is located in the interval is equal to the confidence level C:

P{t1(θ) ≤ t ≤ t2(θ)} = ∫_{t1}^{t2} f(t|θ) dt = C .   (13.22)

Of course the p.d.f. f(t|θ) or error distribution of the estimator t must be known. To fix the interval completely, an additional condition is applied. In the univariate case, a common procedure is to choose central intervals,

P{t < t1} = P{t > t2} = (1 − C)/2 .

Other conventions are minimum-length and equal-probability intervals, the latter defined by f(t1) = f(t2). The confidence interval consists of those parameter values which include the measurement t̂ within their probability intervals. Somewhat simplified: parameter values are accepted if the observation is compatible with them.
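The construction is mechanical and easy to code. A sketch for a Poisson mean θ with central intervals (our code; for a discrete statistic the central construction overcovers, i.e. the actual coverage is ≥ C):

```python
import numpy as np
from scipy import stats

C = 0.90
t_obs = 4                              # observed count
thetas = np.linspace(0.01, 30.0, 3000)

accepted = []
for theta in thetas:
    t1 = stats.poisson.ppf((1.0 - C) / 2.0, theta)        # central probability interval
    t2 = stats.poisson.ppf(1.0 - (1.0 - C) / 2.0, theta)  # with P{t1 <= t <= t2} >= C
    if t1 <= t_obs <= t2:
        accepted.append(theta)         # theta's probability interval covers t_obs

print(f"{C:.0%} confidence interval for theta: "
      f"[{min(accepted):.2f}, {max(accepted):.2f}]")
```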


 

 

 

 

 


Fig. 13.1. Confidence belt. The shaded area is the confidence belt, consisting of the probability intervals [t1(θ), t2(θ)] for the estimator t. The observation t = 4 leads to the confidence interval [θmin, θmax].

The one-dimensional case is illustrated in Fig. 13.1. The pair of curves t = t1(θ), t = t2(θ) in the (t, θ)-plane comprises the so-called confidence belt. To the measurement t̂ = 4 then corresponds the confidence interval [θmin, θmax], obtained by inverting the relations t1,2(θmax,min) = t̂, i.e. the section of the straight line t = t̂ parallel to the θ axis.

The construction shown in Fig. 13.1 is not always feasible: It has to be assumed that t1,2(θ) are monotone functions. If the curve t1(θ) has a maximum, say at θ = θ0, then the relation t1(θ) = t̂ cannot always be inverted: For t̂ > t1(θ0) the confidence belt degenerates into a region bounded from below, while for t̂ < t1(θ0) there is no unique solution. In the first case one usually declares a lower confidence bound as an infinite interval bounded from below. In the second case one could construct a set of disconnected intervals, some of which may be excluded by other arguments.

The construction of the confidence contour in the two-parameter case is illustrated in Fig. 13.2, where for simplicity the parameter and the observation space are chosen such that they coincide. For each point θ1, θ2 in the parameter space we fix a probability contour which contains a measurement of the parameters with probability C. Those parameter points whose probability contours pass through the actual measurement θ̂1, θ̂2 are located on the confidence contour. All parameter pairs located inside the shaded area contain the measurement in their probability region.

Frequentist statistics avoids prior probabilities. This feature, while desirable in general, can have negative consequences if prior information exists. This is the case if the parameter space is constrained by mathematical or physical conditions. In frequentist statistics it is not possible to exclude unphysical parameter values without introducing additional complications. Thus, for instance, a measurement could lead for a mass to a 90% confidence interval which is situated completely in the negative region, or for an angle to a complex angular region. The problem is mitigated somewhat by a newer method [78], but not without introducing other complications [79], [80].


Fig. 13.2. Confidence interval. The shaded area is the confidence region for the two-dimensional measurement (θ̂1, θ̂2). The dashed curves indicate probability regions associated with the locations denoted by capital letters.

13.6 Comparison of Different Inference Methods

13.6.1 A Few Examples

Before we compare the different statistical philosophies let us look at a few examples.

Example 150. Coverage: performance of magnets

A company produces magnets which have to satisfy the specified field strength within certain tolerances. The various measurements performed by the company are fed into a fitting procedure producing 99% confidence intervals which are used to accept (if they are inside the tolerances) or reject the product before sending it off. The client is able to repeat the measurement with high precision and accepts only magnets within the agreed specification. To calculate the price the company must rely on the condition that the confidence interval in fact covers the true value with the presumed confidence level.

Example 151. Bias in the measurements for a mass determination


The mass of a particle is determined from the momenta and the angular configuration of its decay products. The mean value of the masses from many events is reported. The momenta of the charged particles are measured by means of a spectrometer consisting of a magnet and tracking chambers. In this configuration, the χ² fit of the absolute values of the track momenta is biased, and consequently so are the mass estimates. This bias, which can be shown to be positive, propagates into the mean value. Here the bias in the momentum fit has to be corrected for, because it would lead to a systematic shift of the resulting average of the mass values.

Example 152. Inference with known prior

We repeat an example presented in Sect. 6.2.2. In the reconstruction of a specific, very interesting event, for instance providing experimental evidence for a new particle, we have to infer the distance θ between the production and decay vertices of an unstable particle produced in the reaction. From its momentum and its known mean life we calculate its expected decay length λ. The prior density for the actual decay length θ is π(θ) = exp(−θ/λ)/λ. The experimental distance measurement which follows a Gaussian with standard deviation s yields d. According to (6.2.2), the p.d.f. for the actual distance is given by

f(θ|d) = exp[−(d − θ)²/(2s²)] exp(−θ/λ) / ∫₀^∞ exp[−(d − θ′)²/(2s²)] exp(−θ′/λ) dθ′ .

This is an ideal situation. We can determine the mean value and the standard deviation or the mode of the θ distribution and an asymmetric error interval with well-defined probability content, for instance 68.3%. The confidence level is of no interest, and due to the application of the prior the estimate of θ is biased, but this is irrelevant.
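The posterior can be evaluated numerically in a few lines (our sketch; the numbers d, s, λ are illustrative, not from the text):

```python
import numpy as np

d, s, lam = 1.0, 0.5, 2.0                     # measurement, resolution, expected decay length

theta = np.linspace(0.0, d + 8.0 * s, 20001)  # theta >= 0; upper cutoff far in the tail
dtheta = theta[1] - theta[0]

post = np.exp(-(d - theta) ** 2 / (2.0 * s**2) - theta / lam)
post /= post.sum() * dtheta                   # normalize on the grid

mean = (theta * post).sum() * dtheta
std = np.sqrt(((theta - mean) ** 2 * post).sum() * dtheta)
mode = theta[np.argmax(post)]
print(f"posterior mean = {mean:.3f}, std = {std:.3f}, mode = {mode:.3f}")
```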

Example 153. Bias introduced by a prior

We now modify and extend our example. Instead of the decay length we discuss the lifetime of the particle. The reasoning is the same: we can apply the prior and determine an estimate and an error interval. We now study N decays to improve our knowledge of the mean lifetime τ of the particle species. For each individual decay we use a prior with an estimate of τ as known from previous experiments, determine each time the lifetime t̂i, and form the mean value t̄ = Σ t̂i/N from all measurements. Even though the individual time estimates are improved by applying the prior, the average t̄ is a very bad estimate of τ, because the t̂i are biased towards low values and consequently also their mean value is shifted. (Note that in this and in the third example we have two types of parameters which we have to distinguish. We discuss the effect of a bias afflicting the primary parameter set, i.e. λ and τ, respectively.)

Example 154. Comparing predictions with strongly differing accuracies: earthquake

Two theories H1, H2 predict the time θ of an earthquake. The predictions differ in the expected values as well as in the size of the Gaussian errors:

H1 : θ1 = (7.50 ± 2.25) h ,

H2 : θ2 = (50 ± 100) h .
