Foundation of Mathematical Biology: Statistics Lecture 3-4

Bayes' theorem
Let A and B_1, B_2, ..., B_n be events where the B_i are disjoint and exhaustive (they cover the whole sample space), and P(B_i) > 0 for all i. Then

$$P(B_j \mid A) = \frac{P(A \mid B_j)\, P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)}$$

Here P(B_j) encodes the prior knowledge of B_j, and P(B_j | A) is the distribution of B_j conditioned on A.
Example F: polygraph tests/lie detector tests (discrete sample space)
Events:
L: subject is lying
T: subject is telling truth
+: polygraph reading is positive (indicating that the subject is lying)
-: polygraph reading is negative (indicating that the subject is telling the truth)

Polygraph reliability → conditional probability (conditioned on L and T):
          L       T
  +      0.88    0.14
  -      0.12    0.86
Consider one specific screen with prior P(T) = 0.99 and P(L) = 0.01.

What is P(T | +), the probability that the subject is telling the truth given that the reading is positive?

$$P(T \mid +) = \frac{P(+ \mid T)\, P(T)}{P(+ \mid T)\, P(T) + P(+ \mid L)\, P(L)} = \frac{0.14 \times 0.99}{0.14 \times 0.99 + 0.88 \times 0.01} = 0.94$$
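The same computation as a minimal Python sketch, using the table values and the stated priors:

```python
# Bayes' rule for the polygraph example, using the table above.
p_T, p_L = 0.99, 0.01        # priors: telling the truth vs. lying
p_pos_T = 0.14               # P(+|T): positive reading on a truthful subject
p_pos_L = 0.88               # P(+|L): positive reading on a lying subject

# P(T|+) = P(+|T)P(T) / [P(+|T)P(T) + P(+|L)P(L)]
p_T_pos = p_pos_T * p_T / (p_pos_T * p_T + p_pos_L * p_L)
print(f"P(T|+) = {p_T_pos:.2f}")   # 0.94
```

Despite the polygraph's apparent reliability, almost all positive readings come from truthful subjects, because lying is rare under this prior.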
Bayesian inference for continuous parameters
$$P(\theta \mid \text{Data}) = \frac{P(\text{Data} \mid \theta)\, \pi(\theta)}{\int P(\text{Data} \mid \theta)\, \pi(\theta)\, d\theta}$$

Here π(θ) is the prior distribution and P(θ | Data) is the posterior distribution.
Example A revisited
The likelihood is

$$P(m \mid p) = p^m (1 - p)^{N - m}$$

and the posterior distribution is

$$P(p \mid m) = \frac{P(m \mid p)\, \pi(p)}{\int P(m \mid p)\, \pi(p)\, dp}$$

Conjugate prior (same functional form as the likelihood function): the Beta distribution

$$\pi(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}$$

so that

$$P(p \mid m) = \text{const} \times p^{m + \alpha - 1} (1 - p)^{N - m + \beta - 1}$$

Beta function:

$$\mathrm{B}(a, b) \equiv \int_0^1 x^{a - 1} (1 - x)^{b - 1}\, dx = \frac{\Gamma(a)\, \Gamma(b)}{\Gamma(a + b)}$$

Average of p over the posterior distribution:

$$\bar{p} = \int p\, P(p \mid m)\, dp = \frac{\int p^{m + \alpha} (1 - p)^{N - m + \beta - 1}\, dp}{\int p^{m + \alpha - 1} (1 - p)^{N - m + \beta - 1}\, dp}$$

Using $\Gamma(x) = (x - 1)\, \Gamma(x - 1)$,

$$\bar{p} = \frac{m + \alpha}{N + \alpha + \beta}$$

where α and β play the role of pseudo counts.
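As a check, a short Python sketch of this Beta-binomial update; the counts m, N and the pseudo counts α = β = 1 below are illustrative choices, not values from the lecture:

```python
from scipy.stats import beta

m, N = 7, 10          # illustrative: 7 heads in 10 tosses
a, b = 1.0, 1.0       # pseudo counts; alpha = beta = 1 is a uniform prior

# Conjugacy: the posterior is Beta(m + alpha, N - m + beta).
posterior = beta(m + a, N - m + b)
post_mean = (m + a) / (N + a + b)       # (m + alpha) / (N + alpha + beta)
print(post_mean, posterior.mean())      # both 8/12 = 0.667
```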
Generalization to a T-letter alphabet
Multinomial likelihood:

$$P(\vec{m} \mid \vec{p}) = \prod_{i=1}^{T} p_i^{m_i}, \qquad \sum_{i=1}^{T} m_i = N, \qquad \sum_{i=1}^{T} p_i = 1$$

Dirichlet distribution as the conjugate prior:

$$\pi(\vec{p}) \propto \prod_{i=1}^{T} p_i^{\alpha_i - 1}$$

Posterior average:

$$\bar{p}_i = \frac{m_i + \alpha_i}{\sum_{i=1}^{T} (m_i + \alpha_i)}$$
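The same pseudo-count rule in a short sketch; the four-letter alphabet and the counts below are illustrative assumptions:

```python
import numpy as np

# Illustrative counts over a T = 4 letter alphabet (e.g. A, C, G, T).
m = np.array([12, 3, 8, 7])      # observed counts, sum = N = 30
alpha = np.ones(4)               # Dirichlet pseudo counts, all set to 1

p_post = (m + alpha) / (m + alpha).sum()   # posterior average of each p_i
print(p_post, p_post.sum())                # probabilities summing to 1
```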
Example C: Segmentation Revisited
A sequence of heads (1) and tails (0) is generated by first using a coin with parameter p1 and then switching to a coin with parameter p2; the change point is unknown.

Data = (001010000000010111101111100010)

Infer the distribution of the parameters as well as the missing data (the change point).
$$P(\text{seq} \mid p_1, p_2, x) = p_1^{h_1(x)} (1 - p_1)^{t_1(x)}\, p_2^{h_2(x)} (1 - p_2)^{t_2(x)}$$

where
x: position right before the change
h1(x): number of heads up to x; t1(x): number of tails up to x
h2(x): number of heads after x; t2(x): number of tails after x
N: total number of tosses
In the Bayesian approach, we treat the parameters and the missing data on the same footing
Posterior distribution:

$$P(p_1, p_2, x \mid \text{seq}) = \frac{P(\text{seq} \mid p_1, p_2, x)\, \pi(p_1, p_2, x)}{\sum_x \iint dp_1\, dp_2\, P(\text{seq} \mid p_1, p_2, x)\, \pi(p_1, p_2, x)}$$

Prior (Beta in p1 and p2, uniform on x):

$$\pi(p_1, p_2, x) = p_1^{\alpha_1 - 1} (1 - p_1)^{\beta_1 - 1}\, p_2^{\alpha_2 - 1} (1 - p_2)^{\beta_2 - 1} / (N + 1)$$
Posterior distribution of x:

$$P(x \mid \text{seq}) = \iint dp_1\, dp_2\, P(p_1, p_2, x \mid \text{seq}) = \frac{\mathrm{B}(h_1(x) + \alpha_1,\, t_1(x) + \beta_1)\, \mathrm{B}(h_2(x) + \alpha_2,\, t_2(x) + \beta_2)}{\sum_{x'} \mathrm{B}(h_1(x') + \alpha_1,\, t_1(x') + \beta_1)\, \mathrm{B}(h_2(x') + \alpha_2,\, t_2(x') + \beta_2)}$$
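A sketch of this posterior for the data above, evaluated with log Beta functions for numerical stability; the uniform priors α1 = β1 = α2 = β2 = 1 are an assumed choice:

```python
import numpy as np
from scipy.special import betaln   # log of the Beta function B(a, b)

seq = np.array([int(c) for c in "001010000000010111101111100010"])
N = len(seq)
a1 = b1 = a2 = b2 = 1.0            # assumed uniform Beta priors

# log P(x | seq) up to a constant; x runs over positions 1..N-1 so that
# both segments (up to x, after x) are nonempty.
log_post = np.array([
    betaln(seq[:x].sum() + a1, x - seq[:x].sum() + b1)
    + betaln(seq[x:].sum() + a2, (N - x) - seq[x:].sum() + b2)
    for x in range(1, N)
])
post = np.exp(log_post - log_post.max())
post /= post.sum()                  # normalized P(x | seq)
print("most probable change point x =", np.argmax(post) + 1)
```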
Problem sets
1. Suppose that x1, x2, ..., xN are independent and identically distributed (i.i.d.) samples drawn from a normal distribution N(μ, σ²).

a) Show that the maximum likelihood estimates (MLEs) for the mean and variance are given by the following formulas:

$$\hat{\mu} = \bar{x} = \sum_{i=1}^{N} x_i / N, \qquad \hat{\sigma}^2 = \sum_{i=1}^{N} (x_i - \bar{x})^2 / N$$

b) Calculate the mean and variance of $\hat{\mu}$ and $\hat{\sigma}^2$ under repeated sampling. Show that the MLEs converge to the true values with $1/\sqrt{N}$ error.

Hint: recall from the second lecture that $N \hat{\sigma}^2 / \sigma^2$ has a chi-square distribution with (N - 1) degrees of freedom.
2. Maximum likelihood estimates for the parameters of a multinomial distribution. Consider N independent trials, each of which can result in one of T possible outcomes, with probabilities p1, p2, ..., pT. The observed numbers for the T possible outcomes are m1, m2, ..., mT. Calculate the MLEs for the probabilities.

Hint: write down the likelihood function

$$P(\vec{m} \mid \vec{p}) = \text{const} \prod_{i=1}^{T} p_i^{m_i}$$

and use a Lagrange multiplier to implement the constraint $\sum_{i=1}^{T} p_i = 1$.
3. Entropic segmentation. Consider Example C from the previous lecture and use the same observed data. Suppose you know that the change point x is between 12 and 16. Find the x that minimizes the entropy (i.e., maximizes the likelihood).
4. Bayesian inference for a Poisson process. Suppose that four samples (1, 1, 0, 3) are drawn from a Poisson distribution specified by a mean λ. Assume a uniform prior distribution for λ. Calculate the posterior distribution of λ, the $\hat{\lambda}$ that maximizes the posterior distribution, and $\bar{\lambda}$, the average of λ over the posterior distribution.