UCSF
BP-203: Foundations of Mathematical Biology
Statistics Lecture I: October 23, 2001, 2pm
Instructor: Ajay N. Jain, PhD
Email: ajain@cc.ucsf.edu
Copyright © 2001, Ajay N. Jain
All Rights Reserved
Introduction
Probability:
♦Probability distributions underlie everything we measure and predict
♦Hao Li covered many aspects of probability theory: random variables, probability distributions (normal, Poisson, binomial…)
Statistics:
♦Statistics can be used to quantify the importance of measured effects
♦I will cover basic statistical methods
♦Good reference: Statistical Methods, Snedecor and Cochran (Eighth Edition)
Lecture I
What is a statistic? How is it related to a probability distribution?
Frequency distributions
Mean and standard deviation: population vs. sample
Example: Uniform distribution
Central Limit Theorem: The distribution of sample means and sample standard deviations
Confidence intervals
Hypothesis testing
Common parametric statistics
What is a statistic?
Statistics: techniques for collecting, analyzing, and drawing conclusions from data.
Probability theory is about populations; in practice, we only know about samples.
A statistic is a computable quantity based on some sample of a population from which we can make inferences or conclusions.
Frequency distributions, histograms, and cumulative histograms
A frequency distribution is a compact method to capture the characteristics of variation for a collection of samples.
Graphically, it can be represented as a:
♦Histogram with fixed bin sizes
♦Cumulative histogram
[Figures: histogram of a sample from a uniform distribution; the corresponding cumulative histogram]
A frequency distribution is different from a probability distribution, which is generally not known.
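The two graphical forms can be sketched in a few lines of Python. This is only an illustration: the sample size (1000), the number of bins (10), the seed, and the helper names `histogram` and `cumulative` are assumptions, not taken from the lecture.

```python
import random

def histogram(values, lo, hi, n_bins):
    """Count values in n_bins equal-width bins covering [lo, hi].

    Assumes every value lies in [lo, hi]; a value equal to hi is
    placed in the last bin.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

def cumulative(counts):
    """Running total of the bin counts (the cumulative histogram)."""
    totals, running = [], 0
    for c in counts:
        running += c
        totals.append(running)
    return totals

random.seed(0)  # arbitrary seed, used only for reproducibility
sample = [random.random() for _ in range(1000)]  # uniform on [0, 1)
counts = histogram(sample, 0.0, 1.0, 10)
cum_counts = cumulative(counts)

for i, (c, cc) in enumerate(zip(counts, cum_counts)):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}): count={c:4d}  cumulative={cc:4d}")
```

For a uniform sample the bin counts should be roughly equal and the cumulative counts should rise roughly linearly, but the counts vary from sample to sample, which is what distinguishes a frequency distribution from the underlying probability distribution.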
Mean and standard deviation for a discrete probability distribution versus a sample
Discrete probability distribution: mean and SD
$$\mu = \sum_{j=1}^{k} P_j X_j \qquad \sigma = \sqrt{\sum_{j=1}^{k} P_j \,(X_j - \mu)^2}$$
Sample of size n: sample mean and sample SD
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
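As a small worked illustration of the population versus sample distinction, the sketch below computes µ and σ for a discrete uniform distribution and compares them with the sample mean and sample SD of one random sample. The choice of a fair six-sided die, the sample size of 50, the seed, and the function names are assumptions made for the example.

```python
import math
import random

def population_mean_sd(values, probs):
    """Mean and SD of a discrete distribution with outcomes X_j and probabilities P_j."""
    mu = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    return mu, math.sqrt(var)

def sample_mean_sd(sample):
    """Sample mean and sample SD, using the n - 1 denominator."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return mean, math.sqrt(ss / (n - 1))

# Assumed example population: a fair six-sided die (discrete uniform).
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
mu, sigma = population_mean_sd(faces, probs)

random.seed(1)  # arbitrary seed, used only for reproducibility
sample = [random.choice(faces) for _ in range(50)]
x_bar, s = sample_mean_sd(sample)

print(f"population: mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"sample:     mean = {x_bar:.3f}, s = {s:.3f}")
```

The population values are fixed properties of the distribution, while the sample values change with every new sample.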
From Hao Li’s lectures: The normal distribution
The normal distribution is the most important distribution in statistics.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$
General normal density
$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$
Standard normal cumulative distribution function
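Both functions are easy to evaluate numerically. The sketch below uses the standard identity Φ(z) = ½[1 + erf(z/√2)] with `math.erf` from the Python standard library; the function names `normal_pdf` and `std_normal_cdf` are chosen for this example only.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """General normal density f(x) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def std_normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Height of the standard normal density at its peak.
print(f"f(0) = {normal_pdf(0.0):.4f}")

# Probability mass within 1.96 standard deviations of the mean.
within = std_normal_cdf(1.96) - std_normal_cdf(-1.96)
print(f"P(-1.96 < Z < 1.96) = {within:.4f}")
```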
Why?
Many naturally occurring distributions are approximately normal. Any variable that arises as the sum of many small effects will tend to be normally distributed.
Often, for non-normal cases, a simple transformation (e.g. square root, log) yields an approximately normal distribution.
The normal distribution has many convenient mathematical properties.
Even if the distribution in the original population is far from normal, the distribution of sample means tends to become normal under random sampling.
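The point about transformations can be illustrated with a skewed sample. In the sketch below the data are drawn from a log-normal distribution, chosen for the example because its logarithm is exactly normal; real data will rarely be this tidy. Skewness (zero for a symmetric distribution) serves as a rough indicator of departure from normality. The sample size, parameters, and seed are assumptions.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(2)  # arbitrary seed, used only for reproducibility
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]  # strongly right-skewed
logged = [math.log(v) for v in raw]                           # log transform

print(f"skewness before log transform: {skewness(raw):.2f}")
print(f"skewness after log transform:  {skewness(logged):.2f}")
```

The raw sample should show a large positive skewness, while the transformed sample should have skewness close to zero.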
Mean and standard deviation of sample mean
If we take repeated random samples of size n from any population (normal or not) with mean µ and standard deviation σ, the frequency distribution of the sample means has mean µ and standard deviation σ/sqrt(n).
Restated: the sample mean is an unbiased estimator of the population mean. Further, as n increases, the sample mean becomes a better estimator of the population mean.
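A quick simulation can check this claim. The sketch below repeatedly samples from a uniform population on [0, 1), for which µ = 0.5 and σ = sqrt(1/12), and compares the mean and SD of the sample means with µ and σ/sqrt(n). The sample size (25), the number of repeats (2000), the seed, and the helper name `sample_means` are assumptions for the example.

```python
import math
import random
import statistics

def sample_means(draw, n, repeats):
    """Draw `repeats` samples of size n with draw() and return their means."""
    means = []
    for _ in range(repeats):
        sample = [draw() for _ in range(n)]
        means.append(sum(sample) / n)
    return means

random.seed(3)  # arbitrary seed, used only for reproducibility
n = 25
mu = 0.5                       # mean of the uniform distribution on [0, 1)
sigma = math.sqrt(1.0 / 12.0)  # SD of the uniform distribution on [0, 1)

means = sample_means(random.random, n, 2000)
mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)

print(f"mean of sample means: {mean_of_means:.4f}  (mu = {mu})")
print(f"SD of sample means:   {sd_of_means:.4f}  (sigma/sqrt(n) = {sigma / math.sqrt(n):.4f})")
```

Increasing n in the sketch should shrink the SD of the sample means in proportion to 1/sqrt(n).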
The Central Limit Theorem
Previous slide was about the mean and SD of sample means. The CLT is about the distribution of the sample means.
If X̄ is the mean of a random sample of size n from a population with mean µ and standard deviation σ, then as n approaches infinity:
$$P(L_1 < \bar{X} < L_2) = \int_{L_1}^{L_2} \frac{1}{(\sigma/\sqrt{n})\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2/n)}\, dx$$
That is, for large n the sample mean is approximately normally distributed with mean µ and standard deviation σ/sqrt(n).
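The theorem can be checked empirically on a clearly non-normal population. The sketch below draws sample means from an exponential population with rate 1 (so µ = 1 and σ = 1) and compares the observed fraction falling between L1 and L2 with the normal probability from the integral above. The choice of population, the limits 0.9 and 1.1, n = 50, the number of repeats, and the seed are all assumptions for the example.

```python
import math
import random

def std_normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clt_probability(l1, l2, mu, sigma, n):
    """Normal approximation to P(l1 < sample mean < l2) for samples of size n."""
    se = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return std_normal_cdf((l2 - mu) / se) - std_normal_cdf((l1 - mu) / se)

random.seed(4)  # arbitrary seed, used only for reproducibility
n, repeats = 50, 5000
mu, sigma = 1.0, 1.0   # mean and SD of the exponential distribution with rate 1
l1, l2 = 0.9, 1.1

hits = 0
for _ in range(repeats):
    x_bar = sum(random.expovariate(1.0) for _ in range(n)) / n
    if l1 < x_bar < l2:
        hits += 1

empirical = hits / repeats
predicted = clt_probability(l1, l2, mu, sigma, n)
print(f"empirical fraction:   {empirical:.3f}")
print(f"normal approximation: {predicted:.3f}")
```

The two numbers should agree to within simulation noise even though the exponential population itself is strongly skewed.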