Foundation of Mathematical Biology: Statistics Lecture 3-4

Bayes’ theorem

Let $A$ and $B_1, B_2, \ldots, B_n$ be events where the $B_i$ are disjoint and exhaustive (they cover the whole sample space), and $P(B_i) > 0$ for all $i$. Then

$$P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}$$

Here $P(B_j)$ encodes our prior knowledge of $B_j$, and $P(B_j \mid A)$ is the distribution of $B_j$ conditioned on $A$.

Example F: polygraph tests / lie detector tests (discrete sample space)

Events:

L: subject is lying
T: subject is telling the truth
+: polygraph reading is positive (indicating that the subject is lying)
-: polygraph reading is negative (indicating that the subject is telling the truth)

Polygraph reliability → conditional probabilities (conditioned on L and T):

 

         L       T
  +    0.88    0.14
  -    0.12    0.86

For one specific screen, the prior is P(T) = 0.99, P(L) = 0.01.

What is P(T | +), the probability that the subject is telling the truth given that the polygraph reading is positive?

$$P(T \mid +) = \frac{P(+ \mid T)\,P(T)}{P(+ \mid T)\,P(T) + P(+ \mid L)\,P(L)} = \frac{0.14 \times 0.99}{0.14 \times 0.99 + 0.88 \times 0.01} = 0.94$$
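As a quick numerical check, here is a short Python sketch of this calculation (the variable names are mine; the reliability numbers and priors come from the table above):

```python
# Bayes' theorem applied to the polygraph screen:
# P(T | +) = P(+ | T) P(T) / [P(+ | T) P(T) + P(+ | L) P(L)]

p_pos_given_T = 0.14   # P(+ | T): false-positive rate
p_pos_given_L = 0.88   # P(+ | L): detection rate for lies
p_T, p_L = 0.99, 0.01  # prior: most subjects tell the truth

numerator = p_pos_given_T * p_T
evidence = numerator + p_pos_given_L * p_L   # total probability of a + reading
p_T_given_pos = numerator / evidence

print(f"P(T | +) = {p_T_given_pos:.2f}")     # 0.94
```

Although the polygraph flags 88% of lies, 94% of positive readings come from truthful subjects, because lying is rare a priori.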

Bayesian inference for continuous parameters

$$P(\theta \mid \text{Data}) = \frac{P(\text{Data} \mid \theta)\,\pi(\theta)}{\int P(\text{Data} \mid \theta)\,\pi(\theta)\,d\theta}$$

Here $\pi(\theta)$ is the prior distribution and $P(\theta \mid \text{Data})$ is the posterior distribution.
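Example A below evaluates this ratio in closed form with a conjugate prior; for an arbitrary prior, the normalizing integral can instead be approximated numerically on a grid. A minimal Python sketch (the binomial likelihood, uniform prior, and grid size are my own illustrative choices):

```python
# Grid approximation of a continuous-parameter posterior:
# evaluate likelihood(theta) * prior(theta) on a grid and normalize.

M = 1000                                    # number of grid points
thetas = [(k + 0.5) / M for k in range(M)]  # midpoint grid on (0, 1)
N, m = 10, 7                                # e.g. 7 heads in 10 tosses

def likelihood(theta: float) -> float:
    return theta**m * (1 - theta)**(N - m)

def prior(theta: float) -> float:
    return 1.0                              # uniform prior pi(theta)

weights = [likelihood(t) * prior(t) for t in thetas]
z = sum(weights) / M                        # Riemann sum for the evidence integral
posterior = [w / z for w in weights]        # posterior density on the grid

post_mean = sum(t * p for t, p in zip(thetas, posterior)) / M
print(post_mean)                            # ~ (m+1)/(N+2) = 0.667 for a uniform prior
```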

Example A revisited

conjugate prior: same functional form as the likelihood function

$$P(m \mid p) = p^m (1-p)^{N-m}$$

(the binomial coefficient is omitted here; it cancels between the numerator and denominator of the posterior)

$$P(p \mid m) = \frac{P(m \mid p)\,\pi(p)}{\int P(m \mid p)\,\pi(p)\,dp}$$

Beta distribution as prior:

$$\pi(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1} (1-p)^{\beta-1}$$

posterior distribution:

$$P(p \mid m) = \text{const} \times p^{m+\alpha-1} (1-p)^{N-m+\beta-1}$$

$$\mathrm{B}(a, b) \equiv \int_0^1 x^{a-1} (1-x)^{b-1}\,dx = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$

average over the posterior distribution:

$$\bar{p} = \int p\,P(p \mid m)\,dp = \frac{\int p^{m+\alpha} (1-p)^{N-m+\beta-1}\,dp}{\int p^{m+\alpha-1} (1-p)^{N-m+\beta-1}\,dp} = \frac{m+\alpha}{N+\alpha+\beta}$$

using $\Gamma(x) = (x-1)\,\Gamma(x-1)$. The hyperparameters $\alpha$ and $\beta$ act as pseudocounts added to the observed head and tail counts.
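The pseudocount interpretation is easy to see in code. A minimal Python sketch of the posterior mean (the particular numbers below are illustrative choices of my own):

```python
# Posterior mean of the coin bias p under a Beta(alpha, beta) prior,
# after observing m heads in N tosses: <p> = (m + alpha) / (N + alpha + beta)

def posterior_mean(m: int, N: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    return (m + alpha) / (N + alpha + beta)

print(posterior_mean(m=7, N=10))                   # uniform prior: 8/12 = 0.667
print(posterior_mean(m=7, N=10, alpha=5, beta=5))  # stronger prior pulls toward 1/2: 0.6
print(posterior_mean(m=0, N=0, alpha=5, beta=5))   # no data: prior mean 0.5
```

With $\alpha = \beta = 1$ (uniform prior), this reduces to Laplace's rule of succession, $(m+1)/(N+2)$.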

Generalize to a T-letter alphabet: multinomial

$$P(\vec{m} \mid \vec{p}) = \prod_{i=1}^{T} p_i^{m_i}, \qquad \sum_{i=1}^{T} m_i = N, \qquad \sum_{i=1}^{T} p_i = 1$$

Dirichlet distribution as prior:

$$\pi(\vec{p}) \propto \prod_{i=1}^{T} p_i^{\alpha_i - 1}$$

posterior average:

$$\langle p_i \rangle = \frac{m_i + \alpha_i}{\sum_{j=1}^{T} (m_j + \alpha_j)}$$
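A corresponding Python sketch for the multinomial/Dirichlet case (the nucleotide counts and pseudocounts below are illustrative only):

```python
# Posterior mean for multinomial counts under a Dirichlet prior:
# <p_i> = (m_i + alpha_i) / sum_j (m_j + alpha_j)

def dirichlet_posterior_mean(counts, alphas):
    total = sum(m + a for m, a in zip(counts, alphas))
    return [(m + a) / total for m, a in zip(counts, alphas)]

counts = [12, 3, 4, 11]        # e.g. observed A, C, G, T counts, N = 30
alphas = [1.0, 1.0, 1.0, 1.0]  # uniform Dirichlet prior: one pseudocount per letter
print(dirichlet_posterior_mean(counts, alphas))
# [13/34, 4/34, 5/34, 12/34] ~ [0.382, 0.118, 0.147, 0.353]
```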

Example C: Segmentation Revisited

A sequence of heads (1) and tails (0) is generated by first using a coin with bias $p_1$ and then changing to a coin with bias $p_2$; the change point is unknown.

Data = (001010000000010111101111100010)

Infer the distribution of the parameters as well as the missing data.

$$P(\text{seq} \mid p_1, p_2, x) = p_1^{h_1(x)} (1-p_1)^{t_1(x)}\, p_2^{h_2(x)} (1-p_2)^{t_2(x)}$$

where

$x$: position right before the change
$h_1(x)$: number of heads up to $x$; $t_1(x)$: number of tails up to $x$
$h_2(x)$: number of heads after $x$; $t_2(x)$: number of tails after $x$
$N$: total number of tosses

In the Bayesian approach, we treat the parameters and the missing data on the same footing.

posterior distribution:

$$P(p_1, p_2, x \mid \text{seq}) = \frac{P(\text{seq} \mid p_1, p_2, x)\,\pi(p_1, p_2, x)}{\sum_x \iint dp_1\, dp_2\, P(\text{seq} \mid p_1, p_2, x)\,\pi(p_1, p_2, x)}$$

prior (Beta priors on $p_1$ and $p_2$, uniform on $x$):

$$\pi(p_1, p_2, x) = p_1^{\alpha_1-1} (1-p_1)^{\beta_1-1}\, p_2^{\alpha_2-1} (1-p_2)^{\beta_2-1} / (N+1)$$

posterior distribution of $x$:

$$P(x \mid \text{seq}) = \iint dp_1\, dp_2\, P(p_1, p_2, x \mid \text{seq}) = \frac{\mathrm{B}(h_1(x)+\alpha_1,\, t_1(x)+\beta_1)\,\mathrm{B}(h_2(x)+\alpha_2,\, t_2(x)+\beta_2)}{\sum_{x'} \mathrm{B}(h_1(x')+\alpha_1,\, t_1(x')+\beta_1)\,\mathrm{B}(h_2(x')+\alpha_2,\, t_2(x')+\beta_2)}$$
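A Python sketch that evaluates this posterior on the data above, using log-Gamma functions for numerical stability. The uniform Beta(1, 1) priors and the restriction of the change point to the interior of the sequence are my assumptions:

```python
# Change-point posterior P(x | seq) for Example C, with
# log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b) for stability.
from math import lgamma, exp

def log_beta(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)

seq = "001010000000010111101111100010"  # data from the text
N = len(seq)
a1 = b1 = a2 = b2 = 1.0                 # uniform Beta(1, 1) prior pseudocounts

log_w = []
for x in range(1, N):                   # x = number of tosses with the first coin
    h1 = seq[:x].count("1"); t1 = x - h1
    h2 = seq[x:].count("1"); t2 = (N - x) - h2
    log_w.append(log_beta(h1 + a1, t1 + b1) + log_beta(h2 + a2, t2 + b2))

m = max(log_w)                          # subtract the max to avoid underflow
w = [exp(v - m) for v in log_w]
post = [v / sum(w) for v in w]          # normalized posterior P(x | seq)
x_map = 1 + post.index(max(post))
print(f"most probable change point: x = {x_map}")
```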

Problem sets

1. Suppose that $x_1, x_2, \ldots, x_N$ is an independent and identically distributed (i.i.d.) sample drawn from a normal distribution $N(\mu, \sigma^2)$.

a) Show that the maximum likelihood estimates (MLEs) for the mean and variance are given by the following formulas:

$$\hat{\mu} = \bar{x} = \sum_{i=1}^{N} x_i / N$$

$$\hat{\sigma}^2 = \sum_{i=1}^{N} (x_i - \bar{x})^2 / N$$

b) Calculate the mean and variance of $\hat{\mu}$ and $\hat{\sigma}^2$ under repeated sampling. Show that the MLEs converge to the true values with $1/\sqrt{N}$ error.

Hint: recall from my second lecture that $N\hat{\sigma}^2 / \sigma^2$ has a chi-square distribution with $(N-1)$ degrees of freedom.


2. Maximum likelihood estimation for the parameters of a multinomial distribution. Consider $N$ independent trials, each of which can result in one of $T$ possible outcomes, with probabilities $p_1, p_2, \ldots, p_T$. The observed numbers of the $T$ possible outcomes are $m_1, m_2, \ldots, m_T$. Calculate the MLEs for the probabilities.

Hint: write down the likelihood function

$$P(\vec{m} \mid \vec{p}) = \text{const} \times \prod_{i=1}^{T} p_i^{m_i}$$

and use a Lagrange multiplier to implement the constraint $\sum_{i=1}^{T} p_i = 1$.

3. Entropic segmentation. Consider Example C from my previous lecture and use the same observed data. Suppose you know that the change point $x$ is between 12 and 16. Find the $x$ that minimizes the entropy (maximizes the likelihood).

4. Bayesian inference for a Poisson process. Suppose that four samples (1, 1, 0, 3) are drawn from a Poisson distribution specified by a mean $\lambda$. Assuming a uniform prior distribution for $\lambda$, calculate the posterior distribution of $\lambda$, the $\hat{\lambda}$ that maximizes the posterior distribution, and $\bar{\lambda}$, the average over the posterior distribution.