
# Foundation of Mathematical Biology

UCSF

BP-203: Foundations of Mathematical Biology

Statistics Lecture I: October 23, 2001, 2pm

Instructor: Ajay N. Jain, PhD

Email: ajain@cc.ucsf.edu

Copyright © 2001, Ajay N. Jain

All Rights Reserved

Introduction

Probability

Probability distributions underlie everything we measure and predict

Hao Li covered many aspects of probability theory: random variables, probability distributions (normal, Poisson, binomial…)

Statistics:

Statistics can be used to quantify the importance of measured effects

I will cover basic statistical methods

Good reference: Statistical Methods, Snedecor and Cochran (Eighth Edition)

Lecture I

What is a statistic? How is it related to a probability distribution?

Frequency distributions

Mean and standard deviation: population vs. sample

Example: Uniform distribution

Central Limit Theorem: The distribution of sample means and sample standard deviations

Confidence intervals

Hypothesis testing

Common parametric statistics

What is a statistic?

Statistics: techniques for collecting, analyzing, and drawing conclusions from data.

Probability theory is about populations.

We only know about samples.

A statistic is a computable quantity based on some sample of a population from which we can make inferences or conclusions.

Frequency distributions, histograms, and cumulative histograms

A frequency distribution is a compact method to capture the characteristics of variation for a collection of samples.

Graphically, it can be represented as:

Histogram with fixed bin sizes

Cumulative histogram

[Figure: histogram of samples from a uniform distribution, with the corresponding cumulative histogram]

It is different from a probability distribution, which is generally not known.
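As an illustration of the two representations named on this slide, here is a minimal sketch that bins samples from a uniform distribution into fixed-width bins and accumulates the counts. The helper names `histogram` and `cumulative`, the seed, the sample size of 1000, and the choice of 10 bins are all assumptions made for this example; they do not come from the lecture.

```python
import random


def histogram(samples, lo, hi, n_bins):
    """Count samples in n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        # Clamp the index so that x == hi falls in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def cumulative(counts):
    """Running total of the bin counts (the cumulative histogram)."""
    total, out = 0, []
    for c in counts:
        total += c
        out.append(total)
    return out


rng = random.Random(0)  # arbitrary fixed seed, for repeatability only
samples = [rng.random() for _ in range(1000)]  # uniform on [0, 1)

counts = histogram(samples, 0.0, 1.0, 10)
cum_counts = cumulative(counts)

for i, (c, cc) in enumerate(zip(counts, cum_counts)):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f})  count={c:4d}  cumulative={cc:4d}")
```

The bin counts scatter around 100 rather than equalling it exactly, which is the point of the slide's last line: the frequency distribution of a sample only approximates the underlying probability distribution.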

Mean and standard deviation for a discrete probability distribution versus a sample

Discrete probability distribution: mean and SD

$$\mu = \sum_{j=1}^{k} P_j X_j \qquad \sigma = \sqrt{\sum_{j=1}^{k} P_j (X_j - \mu)^2}$$

Sample of size n: sample mean and sample SD

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
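The contrast between the two sets of formulas can be sketched numerically. The example below assumes a discrete uniform distribution on the values 1 to 6 (a fair die), which is an illustrative choice and not taken from the slides; the seed and the sample size of 50 are likewise arbitrary.

```python
import math
import random
import statistics

# Assumed example population: discrete uniform distribution on 1..6.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# Population mean and SD from the probability distribution.
mu = sum(p * x for p, x in zip(probs, values))
sigma = math.sqrt(sum(p * (x - mu) ** 2 for p, x in zip(probs, values)))

# Sample mean and sample SD (note the n - 1 denominator).
rng = random.Random(1)  # arbitrary fixed seed
sample = [rng.choice(values) for _ in range(50)]
n = len(sample)
x_bar = sum(sample) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

print(f"population: mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"sample:     x_bar = {x_bar:.4f}, s = {s:.4f}")

# statistics.stdev also uses the n - 1 denominator, so it should agree with s.
print(f"statistics.stdev(sample) = {statistics.stdev(sample):.4f}")
```

The population values are fixed properties of the distribution, while the sample values change with every new sample.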

From Hao Li’s lectures: The normal distribution

The normal distribution is the most important distribution in statistics.

General normal density:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$$

Standard normal cumulative distribution function:

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \, e^{-t^2/2} \, dt$$
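Both functions can be evaluated with the Python standard library alone. This is a minimal sketch: the function names `normal_pdf` and `std_normal_cdf` are chosen here for illustration, and the cumulative function relies on the standard identity Φ(z) = (1 + erf(z/√2))/2 rather than on numerical integration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """General normal density f(x) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def std_normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(normal_pdf(0.0))        # peak height of the standard normal density
print(std_normal_cdf(0.0))    # half the probability mass lies below the mean
print(std_normal_cdf(1.96))   # close to 0.975, the familiar two-sided 95% cutoff
```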

Why?

Many naturally occurring distributions are approximately normal. Any variable that results from the additive effects of many small contributions will tend to be normally distributed.

Often, for non-normal cases, a simple transformation (e.g. square root, log) yields an approximately normal distribution.

The normal distribution has many convenient mathematical properties.

Even if the distribution in the original population is far from normal, the distribution of sample means tends to become normal under random sampling.

Mean and standard deviation of sample mean

If we take repeated random samples of size n from any population (normal or not) with mean µ and standard deviation σ, the frequency distribution of the sample means has mean µ and standard deviation σ/sqrt(n).
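This claim about the mean and standard deviation of the sample means can be checked by simulation. The sketch below assumes a uniform population on [0, 1), for which µ = 0.5 and σ = sqrt(1/12); the sample size of 25, the 4000 repetitions, and the seed are arbitrary choices made for this example.

```python
import math
import random
import statistics

rng = random.Random(2)  # arbitrary fixed seed
n = 25                  # size of each sample
n_repeats = 4000        # number of repeated samples

# Assumed population: uniform on [0, 1).
mu = 0.5
sigma = math.sqrt(1.0 / 12.0)

# Draw repeated samples and record the mean of each one.
sample_means = []
for _ in range(n_repeats):
    sample = [rng.random() for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.4f}  (population mean {mu:.4f})")
print(f"SD of sample means:   {sd_of_means:.4f}  (sigma/sqrt(n) = {sigma / math.sqrt(n):.4f})")
```

The population here is far from normal, yet the spread of the sample means shrinks with sqrt(n) as stated.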

Restated: the sample mean is an unbiased estimator of the population mean. Further, as n increases, the sample mean becomes a better estimator of the mean.

The Central Limit Theorem

Previous slide was about the mean and SD of sample means. The CLT is about the distribution of the sample means.

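If $\bar{X}$ is the mean of a random sample of size n drawn from a population with mean µ and standard deviation σ, then for large n (with the approximation becoming exact as n approaches infinity):

$$P(L_1 < \bar{X} < L_2) \approx \int_{L_1}^{L_2} \frac{1}{(\sigma/\sqrt{n})\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2/n}\right) dx$$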
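The integral is the probability that a normal variable with mean µ and standard deviation σ/sqrt(n) falls between L1 and L2, so it can be evaluated with the standard normal cumulative distribution function. The sketch below compares that value with a simulated frequency. The uniform population on [0, 1), the limits 0.45 and 0.55, n = 30, the repetition count, and the seed are all assumptions chosen for illustration, and at finite n the two numbers agree only approximately.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Assumed population: uniform on [0, 1), so mu = 0.5 and sigma = sqrt(1/12).
mu = 0.5
sigma = math.sqrt(1.0 / 12.0)
n = 30
lower, upper = 0.45, 0.55  # the limits L1 and L2

# Normal approximation from the Central Limit Theorem.
se = sigma / math.sqrt(n)  # standard deviation of the sample mean
p_clt = std_normal_cdf((upper - mu) / se) - std_normal_cdf((lower - mu) / se)

# Empirical frequency from repeated sampling.
rng = random.Random(3)  # arbitrary fixed seed
n_repeats = 5000
hits = 0
for _ in range(n_repeats):
    x_bar = sum(rng.random() for _ in range(n)) / n
    if lower < x_bar < upper:
        hits += 1
p_sim = hits / n_repeats

print(f"CLT approximation: {p_clt:.3f}")
print(f"simulated:         {p_sim:.3f}")
```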