Foundation of Mathematical Biology / The Elements of Statistical Learning

Bias, Variance, Complexity ctd

Expected prediction error of the fit $\hat f(X)$ at input point $X = x_0$ under squared error loss:

$$
\begin{aligned}
\mathrm{Err}(x_0) &= E[(Y - \hat f(x_0))^2 \mid X = x_0] \\
&= \sigma^2_\varepsilon + [E\hat f(x_0) - f(x_0)]^2 + E[\hat f(x_0) - E\hat f(x_0)]^2 \\
&= \sigma^2_\varepsilon + \mathrm{Bias}^2(\hat f(x_0)) + \mathrm{Var}(\hat f(x_0)) \\
&= \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}.
\end{aligned}
$$

First term: variance of the outcome around its true mean $f(x_0)$; unavoidable.

Second term: squared bias, the amount by which the average of the estimate $\hat f(x_0)$ differs from the true mean.

Third term: variance, the expected squared deviation of the estimate around its mean.
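To make the decomposition concrete, here is a small Monte Carlo sketch (not from the text; the true function, noise level, query point, and polynomial degree are arbitrary illustrative choices) that estimates the three terms for a polynomial least-squares fit at a single point $x_0$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup (illustrative only): true function, noise level, fixed design.
f = lambda x: np.sin(2 * np.pi * x)      # true mean function f
sigma_eps = 0.3                          # sd of irreducible noise
x_train = np.linspace(0, 1, 30)          # fixed training inputs
x0 = 0.25                                # query point
degree = 5                               # polynomial fit complexity

n_sims = 2000
fhat_x0 = np.empty(n_sims)
for s in range(n_sims):
    y = f(x_train) + sigma_eps * rng.normal(size=x_train.size)
    coefs = np.polyfit(x_train, y, deg=degree)
    fhat_x0[s] = np.polyval(coefs, x0)

bias2 = (fhat_x0.mean() - f(x0)) ** 2    # [E fhat(x0) - f(x0)]^2
variance = fhat_x0.var()                 # E[fhat(x0) - E fhat(x0)]^2
err_x0 = sigma_eps**2 + bias2 + variance # decomposition of Err(x0)
print(f"bias^2={bias2:.4f}  var={variance:.4f}  Err(x0)~{err_x0:.4f}")
```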

Bias, Variance, Complexity ctd

For a linear model fit by least squares, $\hat f_p(x) = \hat\beta^T x$, we have

$$
\mathrm{Err}(x_0) = E[(Y - \hat f_p(x_0))^2 \mid X = x_0]
= \sigma^2_\varepsilon + [E\hat f_p(x_0) - f(x_0)]^2 + \|h(x_0)\|^2 \sigma^2_\varepsilon.
$$

Here $h(x_0)$ is the weight vector producing the fit: $\hat f_p(x_0) = x_0^T (X^T X)^{-1} X^T y$.

So, $\mathrm{Var}[\hat f_p(x_0)] = \|h(x_0)\|^2 \sigma^2_\varepsilon$.

While this variance changes with $x_0$, its average over the sample values $x_i$ is $(p/N)\,\sigma^2_\varepsilon$.

Hence, the in-sample error is

$$
\frac{1}{N}\sum_{i=1}^{N} \mathrm{Err}(x_i)
= \sigma^2_\varepsilon
+ \frac{1}{N}\sum_{i=1}^{N} [E\hat f_p(x_i) - f(x_i)]^2
+ \frac{p}{N}\,\sigma^2_\varepsilon.
$$

Here model complexity is directly related to the number of parameters $p$; this will be generalized later.
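As a numerical check of the $p/N$ claim (an illustrative sketch with an arbitrary random design, not part of the text): at the training points the weight vectors $h(x_i)$ are the rows of the hat matrix $H = X(X^TX)^{-1}X^T$, and since $H$ is a projection with $\mathrm{trace}(H) = p$, the average of $\|h(x_i)\|^2$ is $p/N$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 200, 8                                 # arbitrary sample size and dimension
X = rng.normal(size=(N, p))                   # random design matrix

# Hat matrix H = X (X^T X)^{-1} X^T; row i is the weight vector h(x_i)^T.
H = X @ np.linalg.solve(X.T @ X, X.T)

avg_sq_norm = np.mean(np.sum(H * H, axis=1))  # average ||h(x_i)||^2
print(avg_sq_norm, p / N)                     # both ~ p/N (H is a projection)
```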

Bias, Variance, Complexity ctd

Ridge regression has an identical form for the test error, but the weights in the variance term are different: $h(x_0) = x_0^T(X^T X + \lambda I)^{-1}X^T$. The bias is also different.

Consider a linear model family (including ridge regression). Let $\beta_*$ denote the parameters of the best-fitting linear approximation to $f$:

$$\beta_* = \arg\min_\beta \, E_X\big(f(X) - \beta^T X\big)^2.$$

The squared bias decomposes as

$$
[f(x_0) - E\hat f_\lambda(x_0)]^2
= [f(x_0) - \beta_*^T x_0]^2 + [\beta_*^T x_0 - E\hat\beta_\lambda^T x_0]^2.
$$

First term: model bias, the error between the best-fitting linear approximation and the true function.

Second term: estimation bias, the error between the average estimate $E\hat\beta_\lambda^T x_0$ and the best linear approximation.

Bias, Variance, Complexity ctd

For linear models fit by least squares, the estimation bias is zero. For restricted fits (e.g., ridge) it is positive, but they have reduced variance.
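The reduced variance of ridge can be seen directly in the weights: the sketch below (arbitrary simulated design and query point, not from the text) compares the variance factor $\|h(x_0)\|^2$ of the least-squares fit ($\lambda = 0$) with ridge fits at increasing penalties.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 100, 10
X = rng.normal(size=(N, p))
x0 = rng.normal(size=p)                        # arbitrary query point

def weight_vector(X, x0, lam):
    """h(x0)^T = x0^T (X^T X + lam*I)^{-1} X^T  (lam=0 gives least squares)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), x0) @ X.T

for lam in [0.0, 1.0, 10.0, 100.0]:
    h = weight_vector(X, x0, lam)
    print(f"lambda={lam:6.1f}  ||h(x0)||^2 = {np.sum(h**2):.4f}")
```

The squared norm shrinks monotonically as $\lambda$ grows, which is exactly the variance reduction traded against the extra estimation bias.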

Model bias can only be reduced by enlarging the class of linear models to a richer collection of models. This can be accomplished by including interaction terms or covariate transformations (e.g., SVMs, additive models; more on these later).

Optimism of Training Error Secn 7.4

Training error typically less than true error.

Define the optimism as $\mathrm{op} \equiv \mathrm{Err}_{in} - E(\mathrm{err})$.

For squared error and other loss functions we have

$$
\mathrm{op} = \frac{2}{N}\sum_{i=1}^{N} \mathrm{Cov}(\hat y_i, y_i)
$$

$\Rightarrow$ the amount by which err underestimates the true error depends on how strongly $y_i$ affects its own prediction. The harder we fit the data, the greater $\mathrm{Cov}(\hat y_i, y_i)$, thus increasing the optimism.

If $\hat y_i$ is obtained from a linear fit with $p$ covariates, then

$$\sum_{i=1}^{N} \mathrm{Cov}(\hat y_i, y_i) = p\,\sigma^2_\varepsilon,$$

so

$$\mathrm{Err}_{in} = E(\mathrm{err}) + 2\,\frac{p}{N}\,\sigma^2_\varepsilon.$$
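A quick simulation check of this relation (illustrative only; the design, coefficients, and noise level are arbitrary): for a fixed design, compare the average gap between in-sample error and training error with $2(p/N)\sigma^2_\varepsilon$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 50, 6
sigma = 1.0
X = rng.normal(size=(N, p))                  # fixed design
beta = rng.normal(size=p)                    # fixed true coefficients
mu = X @ beta                                # true mean f(x_i)

n_sims = 5000
err, err_in = np.empty(n_sims), np.empty(n_sims)
for s in range(n_sims):
    y = mu + sigma * rng.normal(size=N)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ beta_hat
    err[s] = np.mean((y - yhat) ** 2)                # training error
    err_in[s] = sigma**2 + np.mean((mu - yhat) ** 2) # in-sample error at the x_i

print("simulated optimism:", err_in.mean() - err.mean())
print("2*(p/N)*sigma^2   :", 2 * p / N * sigma**2)
```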

Estimation of Prediction Error Secn 7.5

General form of the in-sample estimates is

$$\widehat{\mathrm{Err}}_{in} = \mathrm{err} + \widehat{\mathrm{op}}.$$

Applying this to a linear model with $p$ parameters fit under squared error loss gives the $C_p$ statistic:

$$C_p = \mathrm{err} + 2\,\frac{p}{N}\,\hat\sigma^2_\varepsilon.$$

Here $\hat\sigma^2_\varepsilon$ is an estimate of the error variance obtained from a low-bias (large) model. Under this criterion we adjust the training error by a factor proportional to the number of covariates used.
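A minimal sketch of the $C_p$ computation as defined above (the simulated data, the nested candidate models, and the use of the largest model to estimate $\hat\sigma^2_\varepsilon$ are illustrative assumptions, not from the text):

```python
import numpy as np

def cp_statistic(X, y, sigma2_hat):
    """C_p = err + 2*(p/N)*sigma2_hat for a least-squares fit on X."""
    N, p = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    err = np.mean((y - X @ beta_hat) ** 2)           # training error
    return err + 2 * (p / N) * sigma2_hat

rng = np.random.default_rng(4)
N = 100
X_full = rng.normal(size=(N, 20))                    # low-bias (large) model
y = X_full[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=N)

# sigma^2 estimated from the residuals of the large model.
resid = y - X_full @ np.linalg.lstsq(X_full, y, rcond=None)[0]
sigma2_hat = resid @ resid / (N - X_full.shape[1])

for p in [1, 3, 5, 10, 20]:                          # nested candidate models
    print(p, round(cp_statistic(X_full[:, :p], y, sigma2_hat), 3))
```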

The Akaike Information Criterion is a generalization to the situation where a log-likelihood loss function is used, e.g., binary or Poisson regression.

Criterion Selection Functions

Generic form for AIC is

$$\mathrm{AIC} = -2\,\mathrm{loglik} + 2p.$$

The Bayes information criterion (BIC) (Secn 7.7) is

$$\mathrm{BIC} = -2\,\mathrm{loglik} + (\log N)\,p.$$

For $N > e^2 \approx 7.4$, the BIC penalty exceeds the AIC penalty

$\Rightarrow$ BIC favors simpler models.

Many variants; new feature – adaptive penalties.

When the log-likelihood is based on a normal distribution we require an estimate of $\sigma^2_\varepsilon$, typically obtained as the mean squared error of a low-bias model $\Rightarrow$ problematic. Cross-validation does not require this.
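One way to compute these criteria for a Gaussian linear model is sketched below; this is a hedged illustration that plugs in a fixed $\hat\sigma^2_\varepsilon$ estimated elsewhere (e.g., from a low-bias model), and conventions for the parameter count vary.

```python
import numpy as np

def aic_bic_gaussian(X, y, sigma2_hat):
    """AIC = -2*loglik + 2p and BIC = -2*loglik + (log N)*p for a
    least-squares fit, with the Gaussian log-likelihood evaluated at a
    plugged-in error variance sigma2_hat (e.g., from a low-bias model)."""
    N, p = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta_hat) ** 2)
    loglik = -0.5 * N * np.log(2 * np.pi * sigma2_hat) - rss / (2 * sigma2_hat)
    return -2 * loglik + 2 * p, -2 * loglik + np.log(N) * p
```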

Effective Number of Parameters Secn 7.6

The $C_p$ or AIC criteria have an optimism estimate (penalty) that involves the number of parameters $p$.

If covariates are selected adaptively, then we no longer have $\sum_i \mathrm{Cov}(\hat y_i, y_i) = p\,\sigma^2_\varepsilon$; e.g., with a total of $p$ covariates, if we select the best-fitting model with $q < p$ covariates, the optimism will exceed $(2q/N)\,\sigma^2_\varepsilon$.

By choosing the best-fitting model with $q$ covariates, the effective number of parameters is $> q$.

Linear fitting methods: $\hat y = S y$, where $S$ is an $N \times N$ matrix depending only on the covariates $x_i$ (not on $y_i$). This includes linear regression and methods using quadratic penalties, such as ridge regression and cubic smoothing splines. Define the effective number of parameters (enp) as $d(S) = \mathrm{trace}(S)$.
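For a concrete example of $d(S)$, the sketch below (arbitrary random design, for illustration only) computes $\mathrm{trace}(S)$ for ridge regression at several penalty values; at $\lambda = 0$ it recovers the usual parameter count $p$, and it shrinks as $\lambda$ grows.

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 100, 15
X = rng.normal(size=(N, p))

def ridge_enp(X, lam):
    """Effective number of parameters d(S) = trace(S) for ridge,
    where S = X (X^T X + lam*I)^{-1} X^T."""
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    return np.trace(S)

for lam in [0.0, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:6.1f}  d(S)={ridge_enp(X, lam):.2f}")  # = p when lam=0
```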

Cross-Validation Secn 7.10

Simplest method for estimating prediction error.

Estimates the extra-sample error $\mathrm{Err} = E[L(Y, \hat f(X))]$.

With enough data (large $N$), set aside a portion as a validation set and use it to assess model performance.

Not feasible with small $N$ $\Rightarrow$ CV offers a finesse.

Randomly partition the data into $K$ equal-sized parts.

For the $k$th part, fit the model to the other $K-1$ parts, then calculate the prediction error of the resulting model when applied to the $k$th part. Do this for $k = 1, \ldots, K$ and combine the prediction error estimates.

Let $\kappa : \{1, \ldots, N\} \mapsto \{1, \ldots, K\}$ map observations to their assigned partition. Let $\hat f^{-k}(x)$ denote the fitted function with the $k$th part removed.

Cross-Validation ctd

Then the CV prediction error estimate is

$$
\mathrm{CV} = \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, \hat f^{-\kappa(i)}(x_i)\big).
$$

Given a set of models $f(x, \alpha)$ indexed by a tuning parameter $\alpha$ (e.g., ridge, lasso, subset, spline), set

$$
\mathrm{CV}(\alpha) = \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, \hat f^{-\kappa(i)}(x_i; \alpha)\big).
$$

Find $\hat\alpha$ minimizing $\mathrm{CV}(\alpha)$ and fit the chosen model $f(x, \hat\alpha)$ to all the data.

K = N: leave-one-out CV – approx unbiased for true prediction error but can be highly variable.

K = 5: lower variance but bias can be a problem.

Generally $K = 5$ or $10$ is recommended, but clearly this depends on $N$ $\Rightarrow$ microarray applications??
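To tie the pieces together, here is a bare-bones $K$-fold CV sketch for choosing the ridge penalty under squared error loss (the simulated data, the $\lambda$ grid, and $K = 10$ are arbitrary illustrative choices, not from the text):

```python
import numpy as np

def kfold_cv_ridge(X, y, lambdas, K=10, seed=0):
    """Return CV(lambda) under squared-error loss for each candidate lambda."""
    N, p = X.shape
    rng = np.random.default_rng(seed)
    # kappa(i): random assignment of each observation to one of K folds.
    kappa = rng.permutation(np.repeat(np.arange(K), int(np.ceil(N / K)))[:N])
    cv = np.zeros(len(lambdas))
    for j, lam in enumerate(lambdas):
        errs = np.empty(N)
        for k in range(K):
            train, test = kappa != k, kappa == k
            Xtr, ytr = X[train], y[train]
            beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)
            errs[test] = (y[test] - X[test] @ beta) ** 2
        cv[j] = errs.mean()   # CV(lambda) = (1/N) sum_i L(y_i, f^{-kappa(i)}(x_i; lambda))
    return cv

rng = np.random.default_rng(6)
N, p = 120, 30
X = rng.normal(size=(N, p))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(size=N)
lambdas = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
cv = kfold_cv_ridge(X, y, lambdas)
print("best lambda:", lambdas[np.argmin(cv)])
```

After the minimizing $\hat\lambda$ is found, the chosen model would be refit on all of the data, as described above.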