
Matta, Boyd. The quantum theory of atoms in molecules


482 18 QTAIM in Drug Discovery and Protein Modeling

RECON are the total TAE energy, the integrated electron population, and simple topological descriptors, for example the molecular connectivity index and atom type counts [78, 82–91].

18.5.1

TAE Descriptors

Among the TAE descriptors, the gradient of the electron density normal to the molecular surface (∇ρ·n) has been used to distinguish “soft” regions of polarizable electron density from more tightly held regions. For example, values of ∇ρ·n are much smaller in magnitude over electron-rich systems and aromatic rings than over polarized or electron-deficient alkyl carbon atoms. Because ρ(r) decreases away from the attractors, ∇ρ·n is always negative; large negative values of ∇ρ·n indicate that the electron density of the underlying molecular region is more tightly held and less likely to extend very far from the molecule.

Histogram bins of the electrostatic potential (EP), defined as:

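\[
\mathrm{EP}(\mathbf{r}) = \sum_{\alpha} \frac{Z_{\alpha}}{|\mathbf{r}-\mathbf{R}_{\alpha}|} - \int \frac{\rho(\mathbf{r}')\,d\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|} \tag{3}
\]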

its surface integral (SIEP), extrema, and integral average (SIEPIA), represent the scalar electrostatic potential values on the surface of the atom or molecule. Electrostatic potential has been implicated in many molecular and intermolecular phenomena, including acid–base interactions, solvation behavior, and pKa correlations [36, 37, 92]. These descriptors are often found in the best models of hydrogen-bonding systems and in regressions involving polar or dipolar molecules. Donor/acceptor behavior is also modeled well by use of these histogram descriptors. Whereas electropositive and electronegative regions of the molecular surface are represented by the high and low histogram bins of EP, respectively, the middle bins correspond to hydrophobic regions.
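The histogram, surface-integral, and integral-average descriptors described above can be sketched in a few lines. This is only an illustration under stated assumptions: the surface is taken to be already discretized into points with associated patch areas, and the function names, bin count, and toy values are hypothetical, not part of RECON.

```python
import numpy as np

def surface_histogram(values, areas, n_bins, value_range):
    """Area-weighted histogram of a property sampled at surface points."""
    values = np.asarray(values, dtype=float)
    areas = np.asarray(areas, dtype=float)
    hist, edges = np.histogram(values, bins=n_bins, range=value_range, weights=areas)
    return hist, edges

def surface_integral(values, areas):
    """Discrete surface integral: sum of property value times patch area."""
    return float(np.dot(np.asarray(values, dtype=float), np.asarray(areas, dtype=float)))

# Hypothetical EP samples (arbitrary units) and the patch areas they represent
ep_values = [-0.30, -0.05, 0.00, 0.04, 0.25]
patch_areas = [1.0, 2.0, 2.0, 2.0, 1.0]

ep_hist, ep_edges = surface_histogram(ep_values, patch_areas, n_bins=3, value_range=(-0.3, 0.3))
siep = surface_integral(ep_values, patch_areas)   # analogue of SIEP
siepia = siep / float(np.sum(patch_areas))        # analogue of SIEPIA
```

In this toy case the low, middle, and high bins collect the electronegative, near-neutral, and electropositive surface area, respectively.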

Another set of descriptors is derived from the bare nuclear potential (BNP), mapped on to the molecular electron density isosurface. BNP is simply the first term in Eq. (3). Because the geometry and orientation of the nuclei reflect the electron-density distribution, the strength of the BNP field mapped on to an electron density isosurface indicates the regions of imbalance between the nuclear–electron attractive forces and the interelectron repulsive forces. This provides information complementary to that obtained via the electrostatic potential.
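As a rough illustration of the BNP, the nuclear term of Eq. (3) can be evaluated directly at a set of surface points. The sketch below assumes atomic units, nuclei treated as point charges, and surface points supplied by the caller; the function name and the toy geometry are hypothetical.

```python
import numpy as np

def bare_nuclear_potential(points, nuclei_xyz, charges):
    """Sum over nuclei of Z_a / |r - R_a| at each point r (atomic units)."""
    points = np.atleast_2d(np.asarray(points, dtype=float))          # shape (P, 3)
    nuclei_xyz = np.atleast_2d(np.asarray(nuclei_xyz, dtype=float))  # shape (A, 3)
    charges = np.asarray(charges, dtype=float)                       # shape (A,)
    # Pairwise point-nucleus distances, shape (P, A)
    dist = np.linalg.norm(points[:, None, :] - nuclei_xyz[None, :, :], axis=-1)
    return (charges[None, :] / dist).sum(axis=1)

# Toy diatomic: two unit charges on the z axis, two sample points
nuclei = [[0.0, 0.0, -0.7], [0.0, 0.0, 0.7]]
bnp = bare_nuclear_potential([[0.0, 0.0, 0.0], [0.0, 0.0, 1.7]], nuclei, [1.0, 1.0])
```

The points must not coincide with a nucleus, where the potential diverges.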

The electronic gradient kinetic energy density distribution, G is defined as:

\[
G(\mathbf{r}) = \tfrac{1}{2}\,\nabla\psi^{*}\cdot\nabla\psi \tag{4}
\]

The surface integral (SIG) of G can be interpreted as being associated with differences in donor/acceptor activity. The Schrödinger kinetic energy density, K, given by:

\[
K(\mathbf{r}) = -\tfrac{1}{4}\left(\psi^{*}\nabla^{2}\psi + \psi\nabla^{2}\psi^{*}\right) \tag{5}
\]

18.5 QTAIM-based Descriptors 483

is a rather smooth function over the surface of a typical molecule and is most negative in those portions of space where there is a local concentration of negative charge [18]. This also corresponds to areas of negative Laplacian, because imbalances of K and G are responsible for nonzero Laplacian values:

\[
L(\mathbf{r}) = -\tfrac{1}{4}\nabla^{2}\rho(\mathbf{r}) = K(\mathbf{r}) - G(\mathbf{r}) \tag{6}
\]
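A short consistency check of Eq. (6) may be helpful. It assumes a single orbital ψ with ρ = ψ*ψ, atomic units, and the usual prefactors G = ½∇ψ*·∇ψ and K = −¼(ψ*∇²ψ + ψ∇²ψ*); these prefactors are an assumption of this sketch rather than something stated explicitly in the text.

```latex
\begin{aligned}
\nabla^{2}\rho
  &= \nabla\cdot\left(\psi^{*}\nabla\psi + \psi\nabla\psi^{*}\right)
   = \psi^{*}\nabla^{2}\psi + \psi\nabla^{2}\psi^{*} + 2\,\nabla\psi^{*}\cdot\nabla\psi \\
-\tfrac{1}{4}\nabla^{2}\rho
  &= -\tfrac{1}{4}\left(\psi^{*}\nabla^{2}\psi + \psi\nabla^{2}\psi^{*}\right)
     - \tfrac{1}{2}\,\nabla\psi^{*}\cdot\nabla\psi
   = K - G
\end{aligned}
```

For a many-electron density the same identity holds orbital by orbital, with occupation-weighted sums on both sides.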

The Laplacian (∇²ρ) is the trace of the second derivative matrix of the electron density at any point in space and has been extensively studied by Bader et al. It has been implicated as a descriptor in the selectivity of electrophilic aromatic substitution and in donor–acceptor interactions. The truly indicative regions are the negative Laplacian peaks, which form near the outer core regions of the electron density of molecules. When these peaks are in regions of nonbonded electron density or in regions of electrophilic attack their magnitude tends to be useful for prediction of the rates of these reactions [18, 93]. Such “negative Laplacian peaks” are usually seen within 0.25–0.4 Å of an electron donor atom, well within the molecular van der Waals surface chosen for this analysis. “Shadows” of these internal extrema are, nevertheless, often present on the molecular surface. These “Laplacian shadows” on the molecular van der Waals surface are the best indicators of what is going on inside the molecular surface. These surface manifestations of the internal Laplacian peaks are often opposite in sign to that of the actual peak, as a result of Laplacian normalization. Consequently, slightly less negative regions of surface values of K often indicate the presence of Brønsted bases [94].

The rate of change of the K electronic kinetic energy density normal to and away from the molecular surface (∇K·n) has been shown to describe differences between the polarizability and hydrophobicity of molecular regions [94]. More negative ranges of this function indicate that the region is more hydrophobic and less susceptible to electrophilic attack [18, 59, 95]. Likewise, ∇G·n and ∇ρ·n are often significant in correlation models of dispersion interactions [94].

One of the most interesting and underexploited descriptors is the local average ionization potential, called “I-bar”, of the GIPF parameter set of Politzer et al. [34, 96], referred to here as the Politzer ionization potential (PIP):

\[
\mathrm{PIP}(\mathbf{r}) = \sum_{i} \frac{\rho_{i}(\mathbf{r})\,|\varepsilon_{i}|}{\rho(\mathbf{r})} \tag{7}
\]

PIP histogram descriptors appear in many diverse models of disparate phenomena. PIP is correlated with several intermolecular binding modes, for example induced-dipole interaction. PIP descriptors also carry information about the “hardness” or “softness” of a region of electron density, and donor–acceptor information. Quite frequently, PIP descriptors occur, with ∇ρ·n and SIK, in models describing differential solubility or hydrophobic–hydrophilic interaction tendency.
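Eq. (7) is a density-weighted average of orbital energy magnitudes, which the following sketch evaluates at a set of points. It assumes the occupied-orbital densities ρ_i(r) are already tabulated at those points and that the total density is their sum; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def local_ionization_potential(orbital_densities, orbital_energies):
    """PIP(r) = sum_i rho_i(r) |eps_i| / rho(r), with rho(r) = sum_i rho_i(r)."""
    rho_i = np.asarray(orbital_densities, dtype=float)   # shape (n_orbitals, n_points)
    eps = np.asarray(orbital_energies, dtype=float)      # shape (n_orbitals,)
    rho = rho_i.sum(axis=0)                              # total density at each point
    return (rho_i * np.abs(eps)[:, None]).sum(axis=0) / rho

# Two occupied orbitals sampled at two surface points (toy values, atomic units)
pip = local_ionization_potential([[0.2, 0.0], [0.2, 0.1]], [-0.5, -0.3])
```

At the second point only the higher-lying orbital contributes, so the local value drops to the magnitude of that orbital's energy.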


Another class of molecular and regional descriptors is derived from the Fukui radical reactivity indices (F), defined as:

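\[
F(\mathbf{r}) = \left(\frac{\partial\rho(\mathbf{r})}{\partial N}\right)_{v} \approx \tfrac{1}{2}\left(\rho_{\mathrm{LUMO}}(\mathbf{r}) + \rho_{\mathrm{HOMO}}(\mathbf{r})\right) \tag{8}
\]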

The Fukui index defined above describes radical reactivity. Similar Fukui indices:

\[
F^{-} \approx \rho_{\mathrm{HOMO}}(\mathbf{r}) \tag{9}
\]

and

\[
F^{+} \approx \rho_{\mathrm{LUMO}}(\mathbf{r}) \tag{10}
\]

describe reactivity toward electrophilic and nucleophilic attack, respectively. The Fukui indices are somewhat related to PIP, in that both involve a perturbation expression which is meant to describe the spatial distribution of radical reactivity. For PIP the molecular surface is encoded with energy-weighted orbital densities whereas for F there is a selectable denominator term which places the reactivity index on a cationic, radical, or anionic scale.
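The three frontier-orbital approximations to the Fukui indices can be written down directly. The sketch below assumes the HOMO and LUMO densities are already available on a common set of points; the function name and dictionary keys are hypothetical.

```python
import numpy as np

def fukui_indices(rho_homo, rho_lumo):
    """Frontier-orbital approximations to the Fukui indices at each point."""
    rho_homo = np.asarray(rho_homo, dtype=float)
    rho_lumo = np.asarray(rho_lumo, dtype=float)
    return {
        "minus": rho_homo,                        # reactivity toward electrophilic attack
        "plus": rho_lumo,                         # reactivity toward nucleophilic attack
        "radical": 0.5 * (rho_lumo + rho_homo),   # radical reactivity
    }

# Toy HOMO and LUMO densities at two points
fukui = fukui_indices([0.10, 0.02], [0.04, 0.08])
```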

Other electron-density-derived descriptors used in the literature [13, 14] include the local electron affinity:

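\[
\mathrm{EA}(\mathbf{r}) = -\,\frac{\sum_{i=\mathrm{LUMO}}^{N} \rho_{i}(\mathbf{r})\,\varepsilon_{i}}{\sum_{i=\mathrm{LUMO}}^{N} \rho_{i}(\mathbf{r})} \tag{11}
\]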

the electronegativity:

\[
\chi(\mathbf{r}) = (\mathrm{PIP} + \mathrm{EA})/2 \tag{12}
\]

the local hardness:

\[
\eta(\mathbf{r}) = (\mathrm{PIP} - \mathrm{EA})/2 \tag{13}
\]

and the local polarizability:

\[
\alpha(\mathbf{r}) = \frac{\sum_{i=1}^{N} \rho'_{i}(\mathbf{r})\,q_{i}\,\alpha_{i}}{\sum_{i=1}^{N} \rho'_{i}(\mathbf{r})\,q_{i}} \tag{14}
\]

defined within the framework of semi-empirical MO theory. Ehresmann [13, 14] found that the local electron affinity, local hardness, and local polarizability had little correlation with other descriptors in common use and these descriptors effectively extend the variance of the descriptor set. Use of local electron-density-derived descriptors potentially leads to an increased likelihood of scaffold hopping


(i.e. switching from one structural type to another) in QSAR and virtual screening applications and to more robust and general QSPR models.
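The local electron affinity, electronegativity, and hardness can be combined as in the sketch below. Two assumptions are worth flagging: the virtual-orbital densities are taken as already tabulated at the surface points, and the local electron affinity is given a leading minus sign so that a negative LUMO energy yields a positive affinity (the sign convention is not certain from the text). All function names are hypothetical.

```python
import numpy as np

def local_electron_affinity(virtual_densities, virtual_energies):
    """Density-weighted average of virtual-orbital energies (LUMO upward), sign-flipped.
    The minus sign is an assumed convention."""
    rho_i = np.asarray(virtual_densities, dtype=float)   # shape (n_virtual, n_points)
    eps = np.asarray(virtual_energies, dtype=float)      # shape (n_virtual,)
    return -(rho_i * eps[:, None]).sum(axis=0) / rho_i.sum(axis=0)

def local_electronegativity(pip, ea):
    """Mean of local ionization potential and local electron affinity, as in Eq. (12)."""
    return 0.5 * (np.asarray(pip, dtype=float) + np.asarray(ea, dtype=float))

def local_hardness(pip, ea):
    """Half the difference of the same two quantities, as in Eq. (13)."""
    return 0.5 * (np.asarray(pip, dtype=float) - np.asarray(ea, dtype=float))

# Toy values at a single surface point (atomic units)
ea = local_electron_affinity([[0.1], [0.1]], [-0.3, -0.1])
chi = local_electronegativity([0.4], ea)
eta = local_hardness([0.4], ea)
```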

Surface property distributions can be encoded in several ways, and alternatives to histogram-based representations have often proved useful. One such scheme uses wavelet coefficients to capture TAE-encoded surface-property distributions [97, 98]. In recent years, wavelet encoding has gained popularity in diverse applications as an efficient means of data compression and pattern recognition [97]. The wavelet basis has an advantage over the Fourier basis: the trigonometric functions used in Fourier expansion are monochromatic in frequency but entirely delocalized in position, whereas the wavelet basis is well localized in both frequency and position. Wavelet encoding and decoding can be accomplished by use of a simple scaling and dilation algorithm. Wavelet encoding enables a more compact representation of molecular surface property distributions than histogram descriptors. Our implementation of TAE property modeling with wavelet coefficient descriptors (WCD) has been described by Sundling et al. [97, 98]. Molecular WCDs can be reconstructed additively from the constituent atomic WCDs.
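To illustrate why wavelet coefficients can be more compact than histogram bins, here is a minimal sketch using the orthonormal Haar wavelet. The choice of Haar is an assumption made for brevity; the wavelet family used for the published WCDs is not specified in this section, and all function names are hypothetical.

```python
import numpy as np

def haar_forward(signal):
    """Orthonormal Haar transform of a signal whose length is a power of two."""
    approx = np.asarray(signal, dtype=float).copy()
    n = approx.size
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # detail at this scale
        approx = (even + odd) / np.sqrt(2.0)          # smoothed signal for next scale
    # Layout: coarsest approximation, then details from coarse to fine
    return np.concatenate([approx] + details[::-1])

def haar_inverse(coeffs):
    """Invert haar_forward."""
    coeffs = np.asarray(coeffs, dtype=float)
    approx, pos = coeffs[:1], 1
    while pos < coeffs.size:
        detail = coeffs[pos:pos + approx.size]
        pos += approx.size
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def compress(coeffs, keep):
    """Zero all but the `keep` largest-magnitude coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    out = np.zeros_like(coeffs)
    idx = np.argsort(np.abs(coeffs))[coeffs.size - keep:]
    out[idx] = coeffs[idx]
    return out

# An 8-bin step-like property histogram is captured by just two Haar coefficients
signal = np.array([4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 1.0, 1.0])
coeffs = haar_forward(signal)
restored = haar_inverse(compress(coeffs, keep=2))
```

Because the transform is linear, coefficients of a sum of atomic distributions are the sum of the atomic coefficients, which is in line with the additive reconstruction of molecular WCDs mentioned above.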

18.5.2

RECON Autocorrelation Descriptors

RECON autocorrelation descriptors (RADs) are patterned after whole-surface autocorrelation descriptors [79] and are a computationally inexpensive way of including 3D shape information within the TAE RECON formalism. RADs use integrated TAE surface properties (P_x) to calculate property autocorrelations using Gasteiger’s formula [99]:

\[
A(R_{xy}) = \frac{1}{n} \sum_{x,y} P_{x} P_{y} \tag{15}
\]

The autocorrelation function is then binned by the distance (R_xy) between atoms x and y; n is the number of atomic pairs. The electron-density-derived TAE properties are integrated over the external atomic surfaces to compute RAD. Generation of RAD data involves a mere 3–5% CPU overhead over the computation of 2D TAE descriptors for drug-sized molecules. Where 3D structures are not available, the topological distances between pairs of atoms may be used for binning the autocorrelation function, to yield conformation-insensitive 2D RAD or TRAD (topological RECON autocorrelation descriptors).
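A distance-binned autocorrelation of this kind can be sketched as follows. The sketch assumes that n is the number of atom pairs falling in each distance bin (the text does not say whether n is counted per bin or over all pairs), that each unordered pair is counted once, and that the atomic properties are already integrated; the function name and bin edges are hypothetical.

```python
import numpy as np

def autocorrelation_descriptor(coords, props, bin_edges):
    """Mean product P_x * P_y over atom pairs, binned by interatomic distance."""
    coords = np.asarray(coords, dtype=float)   # shape (n_atoms, 3)
    props = np.asarray(props, dtype=float)     # shape (n_atoms,)
    x, y = np.triu_indices(len(props), k=1)    # unique pairs with x < y
    dist = np.linalg.norm(coords[x] - coords[y], axis=1)
    prod = props[x] * props[y]
    sums, _ = np.histogram(dist, bins=bin_edges, weights=prod)
    counts, _ = np.histogram(dist, bins=bin_edges)
    # Empty bins are reported as zero rather than dividing by zero
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

# Three collinear atoms, 1 unit apart, with toy integrated properties
rad = autocorrelation_descriptor(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [1.0, 2.0, 3.0],
    bin_edges=[0.5, 1.5, 2.5],
)
```

Replacing the Euclidean distances with bond-count distances would give the topological (TRAD) variant.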

18.5.3

PEST Shape–Property Hybrid Descriptors

TAE descriptors can be supplemented, with some increase in computation time, by hybrid shape–property descriptors that encode detailed information about molecular shape without requiring an alignment procedure. The supplemental information available from these descriptors is useful where the shape of the molecule plays a determining role in binding. Property-encoded surface translator (PEST) descriptors [79] may be computed using atomic fragment-based TAE RECON property-encoded surface reconstructions, as shown in Fig. 18.3 (or using ab-initio or semi-empirical electron-density surfaces and electronic properties). The Zauhar shape signature ray-tracing scheme [80, 81], upon which PEST descriptors are based, seeks to encode the shape of the molecular volume by using the distribution of ray lengths obtained by performing a ray-tracing procedure within the molecular van der Waals envelope, beginning from an arbitrary starting position. The converged ray-length distribution then represents a distinctive “shape signature” of the molecule. PEST records the ray lengths and TAE properties at each point where the rays encounter the molecular surface, to generate two-dimensional hybrid shape–property histogram descriptors, as shown in Fig. 18.1c. The algorithm for combining the densities of the TAE fragments, after translating them into the molecular framework and rotating them into the proper orientation by matching up BCP, is described by Whitehead et al. [65] and the ray-tracing and descriptor computation algorithms are described by Breneman et al. [79]. Inclusion of PEST hybrid shape–property descriptors with 2D topological descriptors increases the predictive capability of QSAR and QSPR models. The information content of PEST shape–property descriptors has been shown to be comparable with or greater than 3D field-based methods such as CoMFA [100], with the considerable advantage of not requiring an explicit alignment rule.

Fig. 18.3 (a) Politzer local average ionization potential PIP(r), (b) kinetic energy density K(r), and (c) Laplacian distribution ∇²ρ(r) of benzene reconstructed from atomic densities. The seams show the zero-flux surfaces ∇ρ·n = 0. Shown alongside are the corresponding PEST ray-trace descriptors [79].
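The final binning step of a PEST-style descriptor, pairing each ray length with the surface property at the point the ray strikes, can be sketched as a two-dimensional histogram. The ray tracing itself is not shown; the ray lengths and property values are assumed to be given, and the function name, bin counts, and ranges are hypothetical.

```python
import numpy as np

def shape_property_histogram(ray_lengths, surface_props, length_range, prop_range, bins=(2, 2)):
    """Normalized 2D histogram over (ray length, property at the ray's end point)."""
    hist, length_edges, prop_edges = np.histogram2d(
        np.asarray(ray_lengths, dtype=float),
        np.asarray(surface_props, dtype=float),
        bins=bins,
        range=[length_range, prop_range],
    )
    # Normalizing makes descriptors comparable across different ray counts
    return hist / hist.sum(), length_edges, prop_edges

# Three toy ray segments: two short rays ending on negative-property surface,
# one long ray ending on positive-property surface
pest, _, _ = shape_property_histogram(
    [1.0, 1.2, 3.0], [-0.1, -0.2, 0.3],
    length_range=(0.0, 4.0), prop_range=(-0.4, 0.4),
)
```

Because only ray lengths and surface values enter, the result does not depend on how the molecule is oriented, which is the point of avoiding an alignment step.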

18.5.4

Electron Density-based Molecular Similarity Analysis

Electron density functions have been used to develop the idea of molecular quantum similarity measures (MQSM) for rational drug design and have been extensively applied to pharmacological and toxicological problems [101–115]. MQSM encapsulates the principle that the more similar two molecules are, the more similar will their properties be [101, 116, 117]. The degree of similarity can be established on the basis of the electronic distribution, the topology of the BCP [111–115], or any electron-density-derived property. Vercauteren et al. [118] have developed a procedure for similarity searching of molecules on the basis of comparison of critical point representations of 3D electron-density maps. Pair-wise and multiple comparisons between the molecular critical point graphs are performed using a Monte Carlo/simulated annealing technique. The method has been used for similarity searching of pharmaceutical ligands at different levels of crystallographic resolution.

Girones et al. [119] have developed a kinetic energy density-based molecular quantum similarity measure (KE MQSM) for two quantum objects:

\[
Z_{AB} = \int K_{A}(\mathbf{r})\,K_{B}(\mathbf{r})\,d\mathbf{r} \tag{16}
\]

which can be used to construct a Carbo index of similarity:

\[
C_{AB} = Z_{AB}\,(Z_{AA} Z_{BB})^{-1/2} \tag{17}
\]
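On a numerical grid the overlap integral and the resulting similarity index reduce to weighted sums. The sketch below assumes both functions are sampled on the same grid with known quadrature weights; the one-dimensional Gaussians merely stand in for the kinetic energy densities, and the function names are hypothetical.

```python
import numpy as np

def overlap_measure(f_a, f_b, weights):
    """Quadrature estimate of the integral of f_a(r) * f_b(r) over the grid."""
    return float(np.sum(weights * f_a * f_b))

def carbo_index(f_a, f_b, weights):
    """Overlap Z_AB normalized by the geometric mean of the self-similarities."""
    z_ab = overlap_measure(f_a, f_b, weights)
    z_aa = overlap_measure(f_a, f_a, weights)
    z_bb = overlap_measure(f_b, f_b, weights)
    return z_ab / np.sqrt(z_aa * z_bb)

# Toy 1D example: two Gaussians displaced by one unit on a uniform grid
x = np.linspace(-6.0, 6.0, 241)
w = np.full_like(x, x[1] - x[0])
f_a = np.exp(-x ** 2)
f_b = np.exp(-(x - 1.0) ** 2)

c_self = carbo_index(f_a, f_a, w)
c_ab = carbo_index(f_a, f_b, w)
c_scaled = carbo_index(f_a, 2.0 * f_a, w)
```

The normalization makes the index insensitive to overall scaling of either function, so it measures similarity of shape rather than magnitude.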

As a means of interpreting and visualizing molecular structure, Ponec [109, 110] introduced the so-called domain-averaged Fermi hole:

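\[
g_{\Omega}^{A}(\mathbf{r}_{1}) = N_{\Omega}\,\rho_{A}(\mathbf{r}_{1}) - 2\int_{\Omega} \rho_{A}(\mathbf{r}_{1},\mathbf{r}_{2})\,d\mathbf{r}_{2} \tag{18}
\]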

where

 

\[
N_{\Omega} = \int_{\Omega} \rho_{A}(\mathbf{r})\,d\mathbf{r} \tag{19}
\]

is the mean number of electrons in the domain Ω and ρ_A(r1, r2) is the pair density. Girones and Ponec [120] used the domain-averaged Fermi hole density to define the fragment molecular quantum self-similarity measure for fragment A:

\[
Z_{AA}^{\Omega} = \int_{\Omega} g_{\Omega}^{A}(\mathbf{r})\,g_{\Omega}^{A}(\mathbf{r})\,d\mathbf{r} \tag{20}
\]

Similarity indices based on the density can give exaggerated weight to small mismatches in regions of space with high electron density, e.g. in the vicinity of nuclei, when comparing molecules with slightly different geometry. To provide a more balanced measure of similarity that reflects the reactivity of the molecule without being biased by small nuclear cusp mismatches, Matta [30] proposed the use of the integral of the Laplacian to define a “reactivity” similarity index for a pharmacophore:

[Eq. (21): definition of the reactivity similarity index R_ΩΩ′ in terms of volume integrals of ∇²ρ_Ω and ∇²ρ_Ω′; the expression is not recoverable from this copy.]

A recent application of TAE fragments for similarity searching consists of using the statistics of TAE atom type fragments, clustered in accordance with the priority scheme described in Section 18.3, to sample neighborhoods in molecular-property space and to assess the predictivity of models constructed using other descriptors. This assessment is then used to supplement a training set with more molecules in regions of molecular-property space that are poorly represented in the training set. Application to the design of novel selective displacers for protein chromatography has been discussed [121].

In collaboration with our group, Oloff et al. [122] have developed a novel structure-based cheminformatics approach (CoLiBRI) using TAE RECON descriptors to search for complementary ligands, based on representation of both receptor-binding sites and their respective ligands in a space of universal chemical descriptors. The binding site atoms involved in the interaction with ligands were identified by applying Delaunay tessellation to the X-ray structures of the ligand–receptor complexes. TAE RECON descriptors were calculated independently for each ligand and for its active-site atoms. Representing both ligands and active sites with the same set of chemical descriptors makes it possible to correlate chemical similarities between active sites and their respective ligands. A procedure for mapping patterns of nearest-neighbor active site vectors in a TAE RECON space on to those of their complementary ligands enables prediction of a virtual complementary ligand vector in the ligand chemical space from the position of a known active site vector; this is followed by estimation of chemical similarity of the virtual ligand vector and molecules in a chemical database, to identify real compounds most similar to the virtual ligand. Thus, knowledge of the structure of the receptor active site enables identification of its complementary ligands in large databases of chemical compounds by use of rapid chemical similarity searches. Conversely, starting from the chemical structure of the ligand one may identify possible complementary receptor cavities. Applied to a data set of 800 X-ray characterized ligand–receptor complexes, knowledge of the active site structure enabled identification of its complementary ligand among the top 1% of a large chemical database in over 90% of all test active sites when a binding site of the same protein family was present in the training set. When test receptors were highly dissimilar and not present among the receptor families in the training set, CoLiBRI was still able to quickly eliminate 75% of the chemical database as improbable ligands.
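The last step of such a search, ranking database compounds by similarity to a predicted virtual ligand vector, might look like the sketch below. Euclidean distance in descriptor space is an assumption made here for illustration, not necessarily the metric used in CoLiBRI, and the function name and toy vectors are hypothetical.

```python
import numpy as np

def rank_by_similarity(query_vector, database_vectors):
    """Return database row indices sorted from most to least similar to the query."""
    query = np.asarray(query_vector, dtype=float)          # shape (n_descriptors,)
    database = np.asarray(database_vectors, dtype=float)   # shape (n_compounds, n_descriptors)
    distances = np.linalg.norm(database - query, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances[order]

# Toy database of four compounds described by two descriptors each
virtual_ligand = [1.0, 0.0]
database = [[5.0, 5.0], [1.0, 0.1], [0.0, 0.0], [1.0, 2.0]]
order, sorted_distances = rank_by_similarity(virtual_ligand, database)

# Keep only the closest quarter of the database as candidate ligands
top_hits = order[: max(1, len(database) // 4)]
```

In practice the descriptors would usually be scaled to comparable ranges before distances are computed.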

18.6

Sample Applications

18.6.1

QSAR/QSPR with TAE Descriptors

Several illustrative applications of QSAR/QSPR modeling with TAE descriptors are shown in Figs 18.4–18.7. Figure 18.4 shows the results of modeling the acute toxicity of organic compounds in fish using RECON descriptors and KPLS models constructed using Analyze [123]. The results are averaged over 100 bootstraps. The dataset [124] comprises 375 molecules, 300 of which are used for training and 75 for testing the model predictions. Predictions on the test set have a Q² statistic of 0.81. In this case addition of 2D descriptors does not improve the models.

Fig. 18.4 Results from KPLS models for the acute toxicity of 375 organic compounds to fathead minnows [124], constructed with Analyze [123] and RECON descriptors, averaged over 100 bootstraps. R² = 0.86 for the training set (300 molecules); 0.81 for the test set (75 molecules); leave one out (LOO) Q² = 0.76.

Fig. 18.5 (a) Bagged SVM model for Caco-2 permeability with RECON, MOE, and PEST descriptors using fifteen features. (b) Star plot showing descriptor importance in 20 SVM bootstraps for Caco-2 permeability. The eight descriptors on the left are negatively weighted; the seven on the right are positively weighted; each ray represents a separate bootstrap; the radius of each ray represents the weight or importance of that descriptor in that bootstrap [133].

Fig. 18.6 Prediction of the glass transition temperatures of polymers [126] from the Bicerano data set [127] of 300 polymers, 173 in the training set and 127 in the test set. Kernel PLS modeling using TAE/RECON descriptors from repeat unit end-capped with two monomer units, five latent variables. Test set Q² = 0.928.

Figure 18.5 shows the results from a bagged SVM model for Caco-2 permeability with RECON, MOE, and PEST descriptors using fifteen features. Descriptor importance in 20 SVM bootstraps is depicted in the star plot in Fig. 18.5b, in which each ray represents a separate bootstrap and the radius of each ray represents the weight or importance of that descriptor in that bootstrap.

TAE RECON descriptors have also been used for the prediction of polymer properties [125, 126]. Figure 18.6 shows the results from KPLS modeling [126] of the glass transition temperatures of 300 polymers from the Bicerano data set [127], using TAE RECON descriptors of the repeat unit end-capped with two monomer units. One-hundred and seventy-three polymers were used in the training set and 127 in the test set. This model yielded excellent prediction results, as is apparent from the score of Q² = 0.928 for the test set.
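For readers unfamiliar with the Q² statistic quoted above, the sketch below shows one common way to compute it, as 1 − PRESS/SS, both for a held-out test set and by leave-one-out cross-validation. An ordinary least-squares model is used purely as a stand-in for the KPLS and SVM models of the text, and the function names and toy data are hypothetical.

```python
import numpy as np

def q2(y_true, y_pred):
    """Q^2 = 1 - sum of squared prediction errors / total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_true - y_pred) ** 2)
    ss = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - press / ss

def loo_q2_linear(descriptors, y):
    """Leave-one-out Q^2 for a least-squares linear model (stand-in for KPLS)."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(descriptors, dtype=float)])
    preds = np.empty(len(y))
    for k in range(len(y)):
        keep = np.arange(len(y)) != k                  # hold out sample k
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        preds[k] = design[k] @ coef                    # predict the held-out sample
    return q2(y, preds)

# Toy data: a noiseless linear property and a slightly noisy variant
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
q2_exact = loo_q2_linear(x, 2.0 * x + 1.0)
q2_noisy = loo_q2_linear(x, 2.0 * x + 1.0 + np.array([0.3, -0.2, 0.1, -0.3, 0.2, -0.1]))
```

Unlike R² on the training set, Q² is computed only from predictions for samples the model has not seen, so it can be much lower, or even negative, for an overfitted model.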

18.6.2

Protein Modeling with TAE Descriptors

Although a limited number of ab-initio computations on some small proteins have recently been reported, routine ab-initio computations on proteins within a reasonable time are not currently feasible. This is where the advantages of the TAE reconstruction method are evident. Two-dimensional TAE descriptors for proteins based on the primary structure, i.e. the amino acid sequence, can be computed very rapidly with RECON, using the amino acid residues as the TAE fragments. Such studies have been performed and used to model the retention times of
