Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Казанский национальный исследовательский технологический университет

Предмет:

Химия

Файл:

1Foundation of Mathematical Biology / Foundation of Mathematical Biology

.pdf

Скачиваний:

Добавлен:

15.08.2013

Размер:

2.11 Mб

Скачать

☆

<<< < Предыдущая 1 2 3 4 56 / 106 7 8 9 10 > Следующая >>>

UCSF		Molecular similarity: 3D

3D similarity

♦Surface-based comparison approach

♦Requires dealing with molecular flexibility and alignment

♦Much slower, but fast enough for practical use

What is the algorithm?

♦Take a sampling of the conformations of molecules A and B

♦For each conformation, optimize the conformation and alignment of the other molecule to maximize S

♦Report the average S for all optimizations

Key issues: not number of atoms. Number of rotatable bonds, alignment

N	N	N	N
	N

		O
N	N	N	N
N		N	N
1.00	0.97	00..9393	00..9191
N	N	N	N
N
N	N		N
N
	N	N	O
	N

0.90	0.89		00..8888	00..8787
N	N		N	O
			N

			O	O
N	N		O
N	N	O	N	N+
		O

0.87

0.83

00..8282

00..6363

UCSF		Distributions for the two methods are very different:
UCSF		What are the quantitative overlaps?
		Molecule pairs observed crystallographically to bind the sameme sitessites

		Molecule pairs observed crystallographically to bind differentent sitessites

		Morphological Similarity			Daylight Tanimoto Similarity


(Probability distribution and integration)			(Probability distribution and integration)
	Morphological similarity			Topological similarityy

The unrelated pairs distributions are nearly normal

The related pairs distributions are multi-modal, possibly a mixture of normals

UCSF		Receiver operator characteristic curves (ROC curves)
UCSF		plot the relationship of TP rate and FP rate

To construct a ROC curve:

♦Vary the similarity threshold over all observed values

♦At each threshold, compute the proportion of true positives and the proportion of false positives

♦At low thresholds, we should have high FP, but perfect TP

♦At high thresholds, we should have low FP, but poorer TP

At a false positive rate of 0.05, MS yields a 47% reduction in the number of related pairs that are lost

At a true positive rate of 0.70, MS yields a 7-fold better elimination of false positives

Morphological Similarity

Daylight Tanimoto Similarity

True positive rate

False positive rate

UCSF		Paired data

Spearman’s rank correlation test

♦We have (Xi,Yi) for n samples

♦We want to know if there is a relationship between the paired samples, but we don’t know if it should be linear, so we need an alternative to Pearson’s r

♦Replace the (Xi,Yi) with (Rank(Xi),Rank(Yi))

♦Compute Pearson’s r for the new values

♦Alternative formulation, where d = difference in ranks for each

data pair		∑d 2
	r =1−	∑d 2
	r =1−
	s	n(n2	−1)
		n(n2	−1)

UCSF		Example: Spearman’s rank correlation

UCSF		Paired data

Kendall’s Tau: another rank correlation test

♦We have (Xi,Yi) for n samples

♦Definition

•Look at all pairs (Xi,Xj) and corresponding (Yi,Yj)

•Score a 1 for a concordant event

•Score a -1 for a discordant event

•Score 0 for ties in values

•Normalize result based on the number of comparisons

•We get a statistic from -1 to 1

♦Kendall’s Tau has a slightly nicer frequency distribution

♦It can be less sensitive to single outliers

UCSF

Codelet to compute Kendall’s Tau (generalized for real-valued ties)

double k_tau(double *actual, double *predicted, int n, double delta1, double delta2)

{

long int i,j;

double total = 0.0, compare = 0.0;

for (i = 0; i < n; ++i) { for (j = i+1; j < n; ++j) {

compare += 1.0;

/* first check if either is equal --> get no benefit */ if (fabs(actual[i]-actual[j]) <= delta1) {

continue;

}

if (fabs(predicted[i]-predicted[j]) <= delta2) { continue;

}

/* now check if they are correct or incorrect */

if ((actual[i] > actual[j]) && (predicted[i] > predicted[j])) total += 1.0;

else if ((actual[i] < actual[j]) && (predicted[i] < predicted[j])) total += 1.0;

else total += -1.0; /* we have a missed rank match */

}

if (compare == 0.0) return(0.0); return(total/compare);

}

UCSF		Paired data

Signed rank test (Wilcoxon)

♦We have (Xi,Yi) for n samples

♦Definition

•Compute all differences (Xi-Yi)

•Sort them, low to high, based on absolute value

•Assign ranks to each

•Multiple each rank associated with a negative difference by -1

•Sum the negative ranks and positive ranks

•Take the smaller magnitude sum: This is your statistic

♦Again, tables are available for small n

♦An approximation is available for large n

UCSF		Conclusions: Non-parametric statistics

Non-parametric statistics reduce reliance on distributional assumptions about your data

♦They often give very sensitive tests

♦Generally though, the corresponding parametric tests are more sensitive when it their assumptions hold

♦Note that the process is generally the same

•Compute your statistic

•Look up a significance value or compute one from an approximation

Resampling and permutation-based methods move toward deriving everything from the data observed

UCSF

BP-203: Foundations of Mathematical Biology

Statistics Lecture III: October 30, 2001, 2pm

Instructor: Ajay N. Jain, PhD

Email: ajain@cc.ucsf.edu

<<< < Предыдущая 1 2 3 4 56 / 106 7 8 9 10 > Следующая >>>

Соседние файлы в папке 1Foundation of Mathematical Biology

#
15.08.2013248.78 Кб46Foundation of Mathematical Biology Statistics Lecture 3-4.pdf
#
15.08.20132.11 Mб45Foundation of Mathematical Biology.pdf
#
15.08.2013287.66 Кб48The Elements of Statistical Learning.pdf