Foundation of Mathematical Biology / The Elements of Statistical Learning

.pdf

Скачиваний:

Добавлен:

15.08.2013

Размер:

287.66 Кб

Скачать

☆

<<< < Предыдущая 1 2 34 / 64 5 6 > Следующая >>>

Gene Harvesting

Hastie, Tibshirani, Botstein, Brown (2001). genomebiology.com/2001/2/1/research

First cluster genes using hierarchical clustering.

Obtain average expression profiles from all clusters. These serve as potential covariates, in addition to individual genes.

Using clusters as covariates biases the selection toward correlated sets of genes and reduces overfitting.

Forward stepwise algorithm with a prescribed number of terms.

Provision for interactions with included terms.

Model choice (number of terms) via cross-validation.
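The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the dissimilarity (1 minus Pearson correlation), the selection criterion (drop in residual sum of squares rather than the harvesting score), the helper names, and the toy data are all hypothetical, and interactions and cross-validation are left out.

```python
from itertools import combinations


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def correlation_distance(a, b):
    """1 - Pearson correlation (assumed dissimilarity; profiles must not be constant)."""
    ca, cb = center(a), center(b)
    return 1.0 - dot(ca, cb) / (dot(ca, ca) * dot(cb, cb)) ** 0.5


def average_linkage_nodes(genes):
    """Merge clusters bottom-up; return the member list of every internal node."""
    dist = {(i, j): correlation_distance(genes[i], genes[j])
            for i, j in combinations(range(len(genes)), 2)}

    def linkage(pair):
        # Average linkage: mean pairwise distance between the two member sets.
        c1, c2 = pair
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    active = [[i] for i in range(len(genes))]
    nodes = []
    while len(active) > 1:
        c1, c2 = min(combinations(active, 2), key=linkage)
        active = [c for c in active if c is not c1 and c is not c2]
        active.append(sorted(c1 + c2))
        nodes.append(active[-1])
    return nodes


def candidate_covariates(genes):
    """Individual genes plus the average profile of every cluster node."""
    candidates = {f"gene{i}": list(g) for i, g in enumerate(genes)}
    for members in average_linkage_nodes(genes):
        name = "node(" + ",".join(map(str, members)) + ")"
        candidates[name] = [sum(col) / len(members)
                            for col in zip(*(genes[g] for g in members))]
    return candidates


def forward_stepwise(candidates, y, n_terms):
    """Add, one at a time, the candidate giving the largest drop in RSS."""
    residual = center(y)  # start from the intercept-only fit
    remaining = {name: center(v) for name, v in candidates.items()}
    chosen = []
    for _ in range(n_terms):
        gains = {name: dot(residual, z) ** 2 / dot(z, z)
                 for name, z in remaining.items() if dot(z, z) > 1e-12}
        if not gains:
            break
        best = max(gains, key=gains.get)
        z = remaining.pop(best)
        norm = dot(z, z)
        coef = dot(residual, z) / norm
        residual = [r - coef * zi for r, zi in zip(residual, z)]
        # Orthogonalise the remaining candidates against the new term so that
        # later gains are conditional on the terms already in the model.
        updated = {}
        for name, v in remaining.items():
            proj = dot(v, z) / norm
            updated[name] = [vi - proj * zi for vi, zi in zip(v, z)]
        remaining = updated
        chosen.append(best)
    return chosen, dot(residual, residual)


# Toy data (made up): 4 genes measured on 6 samples.
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.2, 1.9, 3.1, 3.8, 5.2, 5.9],
    [6.0, 1.0, 4.0, 2.0, 5.0, 3.0],
    [5.8, 1.3, 4.1, 1.9, 5.2, 2.8],
]
y = [a + b for a, b in zip(genes[0], genes[1])]  # driven by the gene 0/1 cluster
chosen, rss = forward_stepwise(candidate_covariates(genes), y, n_terms=1)
print(chosen, rss)
```

Because the toy response is built from the average of two correlated genes, the cluster average competes directly with its member genes in the first step, which is the situation the clustered covariates are meant to exploit.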

[Figure: hierarchical clustering dendrograms, average linkage and single linkage.]

Kappa Opioid / Harvesting / Average Linkage

Step	Node	Parent	Score	Size

1	6295	0	22.40	687
2	1380	6295	19.67	6
3	663	0	15.62	2
4	3374	663	10.69	3
5	1702	0	12.92	2
6	6268	663	11.27	83

$y = \beta_0 + \beta_1 \bar{x}_{\mathrm{Node6295}} + \beta_2 (\bar{x}_{\mathrm{Node1380}} \cdot \bar{x}_{\mathrm{Node6295}}) + \cdots$
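The table and the fitted model can be read together: a step with parent 0 enters as a main effect, while a step with a non-zero parent enters as the product of the node's average profile with the parent's. A minimal sketch of that mapping follows; the function name `design_columns` and the three-sample profiles are hypothetical and chosen only for illustration.

```python
def design_columns(steps, profiles):
    """Build one model column per harvesting step.

    steps: list of (node, parent) pairs; parent 0 marks a main effect.
    profiles: maps a node label to its average expression profile.
    """
    columns = []
    for node, parent in steps:
        if parent == 0:
            # Main effect: the node's average profile itself.
            columns.append(list(profiles[node]))
        else:
            # Interaction: sample-wise product with the parent's profile.
            columns.append([a * b for a, b in zip(profiles[node], profiles[parent])])
    return columns


# Made-up average profiles over three samples, keyed by the node labels
# from the first two steps of the table.
profiles = {6295: [1.0, 2.0, 3.0], 1380: [0.5, -1.0, 2.0]}
steps = [(6295, 0), (1380, 6295)]
print(design_columns(steps, profiles))
```

Regressing y on these columns plus an intercept would give the coefficients written as β0, β1, β2 in the model.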

Kappa Opioid / Harvesting / Single Linkage

Step	Node	Parent	Score	Size

1	g3655	0	21.97	1
2	2050	g3655	20.62	3
3	g900	g3655	16.91	1
4	g1324	g3655	16.01	1
5	g1105	g3655	24.34	1
6	g230	g3655	12.44	1

$y = \beta_0 + \beta_1 x_{\mathrm{Gene3655}} + \beta_2 (\bar{x}_{\mathrm{Node2050}} \cdot x_{\mathrm{Gene3655}}) + \cdots$

[Figure: Kappa Opioid, 5-fold CV error variance. Residual variance (0 to 4*10^6) against number of terms (1 to 7) for clustered genes, original genes, and training error.]

[Figure: Gene Harvesting: Kappa-Opioid. Correlations: Node 6295.]

[Figure: Gene Harvesting: Kappa-Opioid. Scores: Node 6295, annotated "Node score = 22.4!".]

[Figure: Kappa Opioid, 10-fold CV error variance. Residual variance (0 to 500000) against number of terms (1 to 6) for constrained harvesting and training error.]

Smoothing

Recall the simple linear model: $E(Y \mid X) = \beta_0 + \beta_1 X$

Dependence of $E(Y)$ on $X$ is not necessarily linear.

Can extend the model by adding terms, e.g., $X^2$

$\Rightarrow$ problematic: which terms? when to add them?

What is desirable is to have

1. the data dictate the appropriate functional form without imposing rigid parametric assumptions,

2. a corresponding automated fitting procedure.

Key concept: locally determined fit.

Issues: what is local? how to fit?

Resultant methods: (scatterplot) smoothers.

Resultant model: $E(Y \mid X) = \beta_0 + s(X; \lambda)$
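To make "locally determined fit" concrete, here is a minimal sketch of one simple scatterplot smoother, an unweighted running-line smoother: at each point it fits a least-squares line to the nearest fraction `span` of the data. The slides do not say which smoother they use, so the choice of method, the function name, and the toy data are assumptions; here `span` plays the role of the smoothing parameter λ.

```python
def running_line_smoother(x, y, span):
    """Fit a least-squares line to the nearest span*n points around each x[i]."""
    n = len(x)
    k = max(2, min(n, round(span * n)))  # neighbourhood size
    fitted = []
    for x0 in x:
        # "Local" means the k observations whose x is closest to x0.
        nearest = sorted(range(n), key=lambda j: abs(x[j] - x0))[:k]
        xs = [x[j] for j in nearest]
        ys = [y[j] for j in nearest]
        mx, my = sum(xs) / k, sum(ys) / k
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Toy data (made up): a curved relationship that a straight line misses.
x = [float(i) for i in range(10)]
y = [v * v for v in x]
for span in (0.3, 1.0):
    print(span, [round(f, 2) for f in running_line_smoother(x, y, span)])
```

With `span = 1.0` every neighbourhood contains all the data, so the smoother reduces to the ordinary linear fit; smaller spans let the fit follow local structure at the cost of higher variance.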

[Figure: scatterplot of log(PSA) against log(Capsular Penetration), with smooths at span = 10%, 25%, and 100%.]
