
Matta, Boyd. The quantum theory of atoms in molecules


18 QTAIM in Drug Discovery and Protein Modeling

Nagamani Sukumar and Curt M. Breneman

18.1 QSAR and Drug Discovery

The introduction of a new drug to the market is often the culmination of a long and arduous process of laboratory experimentation, lead-compound discovery, animal testing, and preclinical and clinical trials – a process that can typically take as long as 10–15 years from hit to lead to marketable drug. On average, 9 out of 10 promising leads fail, often at an advanced stage in the drug discovery pipeline, because of adverse ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties. One of the most attractive strategies for streamlining and accelerating the process of drug discovery is virtual high-throughput screening (VHTS), employing quantitative structure–activity/property relationship (QSAR/QSPR) modeling. The goal of QSAR/QSPR is the development of correlations between molecular structure and pharmaceutical properties, thereby transforming the search for compounds with specific properties, by use of chemical intuition and experience, into a mathematically quantified and computerized form. When a correlation between structure and activity/property is found and validated, any number of compounds from large pharmaceutical databases, including those not yet synthesized, can be virtually screened on the computer to select structures with the desired properties. Virtual screening using ADMET filters can eliminate compounds likely to have adverse side-effects, identifying the “losers” early in the process, to achieve the desired objective of “fail early, fail cheaply”. The most promising compounds can then be chosen for laboratory synthesis and preclinical testing, thereby conserving resources and accelerating the process of drug discovery.
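The fit-then-screen workflow described above can be illustrated with a deliberately small sketch. It assumes a single hypothetical descriptor, a linear model fitted by ordinary least squares, and invented training values and compound names; real QSAR models use many descriptors and require rigorous validation.

```python
# Toy QSAR workflow: fit activity = a * descriptor + b on known compounds,
# then rank candidates from a virtual library by predicted activity.
# All numbers and compound names are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b for a single descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def screen(candidates, a, b, threshold):
    """Return (name, predicted activity) pairs at or above threshold, best first."""
    scored = [(name, a * x + b) for name, x in candidates.items()]
    hits = [(name, y) for name, y in scored if y >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Training set: descriptor values and measured activities (hypothetical).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.1, 7.9]
slope, intercept = fit_line(train_x, train_y)

# Virtual library: compounds that need not have been synthesized (hypothetical).
library = {"cand_A": 0.5, "cand_B": 3.5, "cand_C": 5.0}
hits = screen(library, slope, intercept, threshold=6.0)
```

Only the candidates that pass the threshold would go forward to synthesis; the rest are discarded at negligible cost, which is the sense of “fail early, fail cheaply”.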

QSAR and QSPR have proved highly effective within homologous sets of molecules, as is apparent from the extensive literature on the subject [1, 2]. Traditional QSAR methods have not, however, been as successful when applied to more structurally diverse sets of data. This difficulty is partly because of the type of molecular property descriptors used and partly because of the complexity of chemistry space. Descriptors representing simple molecular properties were often favored in early studies, because they seemed to provide intuitive insight into the physicochemical nature of the biological activity or chemical property under consideration. In recent years, descriptors that correlate with less clearly defined intermolecular interactions have often been found to lead to models with better predictive power [3–11]. Clark [12–14] has argued that descriptors based on local properties calculated at the molecular van der Waals surface, which do not encode the chemical constitution directly, are likely to provide more generalizable QSPR models that encourage scaffold hopping between diverse regions of chemistry space.

Quantum-chemical descriptors are derived from actual molecular electron density distributions and are readily accessible via ab-initio or semi-empirical calculations. QSAR/QSPR models employing electron-density-derived descriptors are thus applicable to a wide variety of molecules and have the required flexibility to compute physical, chemical and biological properties. The primary disadvantage of such descriptors is the intensive computational effort required to generate them via quantum-chemical calculations, precluding their routine use for large biological molecules or large pharmaceutical datasets. This drawback is circumvented in a QTAIM-based approach, described in Section 18.2. The atom typing scheme and generation of the transferable atom equivalent (TAE) library are outlined in Section 18.3, and TAE reconstruction and descriptor generation are dealt with in Section 18.4. Several families of QTAIM-based descriptors are presented in Section 18.5 and a few sample applications in Section 18.6.

18.2 Electron Density as the Basic Variable

In 1964, Hohenberg and Kohn [15] proved that the external potential $v(\mathbf{r})$ is determined, within a trivial additive constant, by the distribution of electron density $\rho(\mathbf{r})$. Because $\rho(\mathbf{r})$ determines the number of electrons:

$$N = \int \rho(\mathbf{r})\, d\mathbf{r} \tag{1}$$

it follows that $\rho(\mathbf{r})$ also uniquely determines the ground state wave function $\psi$, the ground state electronic energy, the molecular structure, and all the other electronic properties of the molecule. Thus Bader et al. [16–20] have shown that the topology of the gradient paths of the electron density, $\nabla\rho(\mathbf{r})$, completely specifies the molecular graph. In 1981 Riess and Münch [21] extended the Hohenberg and Kohn theorem to subdomains of a bounded quantum system and showed that the ground-state density of an arbitrary subdomain uniquely determines the ground-state properties of this or any other domain. In fact, any nonzero volume part of the nondegenerate ground-state electron density contains all information about the molecule. This has been termed the holographic electron density theorem [22, 23]. Further, all information about latent molecular properties not exhibited by a given molecular structure but exhibited by the same molecule in a different state or conformation is fully encoded in any nonzero volume of the nondegenerate ground state electron density. A latent property may be regarded as the response of a molecule to a specific interaction; this principle provides further theoretical justification for QSAR, because most biological activity depends not on the properties of isolated molecules in their equilibrium geometries but on the response of molecules to complex intermolecular interactions.
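As a small numerical check of the normalization in Eq. (1), the sketch below integrates the ground-state density of the hydrogen atom, $\rho(r) = e^{-2r}/\pi$ in atomic units, over all space. The choice of test density, the radial cut-off, and the grid spacing are assumptions made for illustration only.

```python
import math

def rho_1s(r):
    """Ground-state hydrogen-atom electron density in atomic units."""
    return math.exp(-2.0 * r) / math.pi

def electron_count(rho, r_max=30.0, steps=30000):
    """Integrate a spherically symmetric density over all space.

    Uses the trapezoidal rule on the radial integrand 4*pi*r^2*rho(r),
    truncated at r_max (an assumed cut-off where rho is negligible).
    """
    h = r_max / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 4.0 * math.pi * r * r * rho(r)
    return total * h

n_electrons = electron_count(rho_1s)
```

For this one-electron density the integral should come out very close to 1, the number of electrons.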

Although the electron density, in any finite region of space, encodes all molecular properties, in accordance with the Hohenberg–Kohn and Riess–Münch theorems, the density itself is a rather insensitive function of the atomic and molecular environment, because of the “near-sightedness” of electronic matter [24–26] – i.e. $\rho(\mathbf{r})$ depends significantly only on the potential $v(\mathbf{r}')$ at points $\mathbf{r}'$ near $\mathbf{r}$. The effect on $\rho(\mathbf{r})$ of changes of $v(\mathbf{r}')$ at distant points $\mathbf{r}'$ beyond a cut-off distance $R$ ($\forall\, \lvert\mathbf{r}'\rvert > R$) decays monotonically as a function of $R$. To within an accuracy of $\delta\rho$ the electron density $\rho(\mathbf{r})$ cannot “see” any perturbation beyond the distance $R$. This “near-sightedness” is, in fact, what makes the study of chemistry more than just an encyclopedic catalog of properties of individual molecules and enables the approximate transferability of atomic and functional group properties from one molecule to another in a similar environment. Understanding the physics and chemistry of large molecules would have been impossible if not for the transferability principle. Because of the Riess–Münch theorem, perfect transferability of an atom or functional group between molecules is an unachievable limit, although it can be approached very closely [27]. This approximate transferability of fragment properties lies at the heart of the chemical effectiveness of the QTAIM, which is exploited in the transferable atom equivalent reconstruction (TAE RECON) method. The locality principle is also behind other local computational methods, for example the divide and conquer scheme [28, 29]. The relationship between the concepts of transferability and similarity has been discussed by Matta [30] in terms of the short-range nature of the reduced first-order density matrix.

In contrast with other fragment-based electron density reconstruction techniques [31–33], TAE descriptors encode the distributions of electron density-based molecular properties, for example kinetic energy densities [18], local average ionization potentials [34], electrostatic potentials [35–38], Fukui functions [39–42], electron density gradients, and second derivatives or Laplacian distributions [18] (Section 18.3), rather than the density itself. The TAE RECON method is an attractive formulation for rapid molecular electron density reconstruction. QTAIM-based descriptors are capable of generating high-quality models when used with modern machine learning techniques, for example principal-component analysis, artificial neural networks [43–45], kernel partial least squares (KPLS), or support vector machine (SVM) regressions [46], with feature selection accomplished using genetic algorithms [47], sensitivity analysis [48], or a 1-norm linear support vector regression (SVR).

The TAE RECON method is based on the QTAIM [17, 18], wherein an atom in a molecule is defined as the union of an attractor (usually an atomic nucleus) and its basin (the electron density distribution $\rho(\mathbf{r})$), bounded by an atomic surface of zero flux in the gradient of the electron density:

$$\nabla\rho(\mathbf{r}) \cdot \mathbf{n}(\mathbf{r}) = 0 \quad \text{for all } \mathbf{r} \text{ belonging to the surface } S(\Omega) \tag{2}$$

where $\mathbf{n}(\mathbf{r})$ is a unit vector normal to the surface. This is the boundary condition necessary for application of Schwinger’s principle of stationary action to an open system [49]. Atoms defined in this way have been shown to satisfy the virial theorem. An extensive body of work [16–19, 49–63] has revealed that virial partitioning gives a natural and rigorous meaning to the intuitive concept of an atom in a molecule, and convergence of the electrostatic interaction based on topological atoms has been computationally demonstrated [64–66]. Atoms defined in this way have uniquely identifiable properties that are approximately additive and transferable from one molecule to another. This transferability feature is the basis of the TAE method [3, 65–70], because it enables transfer of atomic properties calculated using ab-initio methods in a small molecule to a much larger molecule containing the same (or very similar) type of atom. This naturally brings up the question of how to define a TAE atom type, which will be discussed in Section 18.3.
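On the internuclear axis of a diatomic, symmetry makes the surface normal point along the axis, so the zero-flux condition of Eq. (2) reduces there to a vanishing axial derivative of the density. The sketch below locates that point for an assumed model density, a sum of two exponentially decaying atomic densities, $\rho(z) = c_1 e^{-a_1 z} + c_2 e^{-a_2 (d - z)}$ with nuclei at $z = 0$ and $z = d$. The model and all parameter values are illustrative assumptions, not ab-initio densities.

```python
import math

def density_slope(z, a1, c1, a2, c2, bond_length):
    """d(rho)/dz on the axis between nucleus 1 (z = 0) and nucleus 2 (z = bond_length)
    for rho(z) = c1*exp(-a1*z) + c2*exp(-a2*(bond_length - z))."""
    return (-a1 * c1 * math.exp(-a1 * z)
            + a2 * c2 * math.exp(-a2 * (bond_length - z)))

def axial_critical_point(a1, c1, a2, c2, bond_length, tol=1e-12):
    """Bisect for the axial point where the density gradient vanishes."""
    lo, hi = 0.0, bond_length
    if (density_slope(lo, a1, c1, a2, c2, bond_length) >= 0.0
            or density_slope(hi, a1, c1, a2, c2, bond_length) <= 0.0):
        raise ValueError("no density minimum between the nuclei")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if density_slope(mid, a1, c1, a2, c2, bond_length) < 0.0:
            lo = mid   # density still falling: the minimum lies further along
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two identical model atoms: the critical point sits at the bond midpoint.
z_symmetric = axial_critical_point(2.0, 1.0, 2.0, 1.0, bond_length=1.4)
# Unlike model atoms: the point shifts toward one nucleus.
z_asymmetric = axial_critical_point(2.0, 1.0, 3.0, 2.0, bond_length=2.0)
```

For this model the root is also available in closed form, $z = [\ln(a_1 c_1 / a_2 c_2) + a_2 d]/(a_1 + a_2)$, which gives a direct check on the bisection.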

18.3 Atom Typing Scheme and Generation of the Transferable Atom Equivalent (TAE) Library

The quality of molecular TAE descriptors can only be as good as the atom-type representation in the TAE atom type library; it is, therefore, highly desirable to use a representation that reproduces the bonding environment of the atom being modeled as faithfully as possible. Any method that uses atom-based properties to construct and calculate molecular properties must have a consistent atom-typing scheme. For instance, Kier and Hall’s electrotopological state (E-state) atoms are selected on the basis of their element identity, valence state, and number of neighboring hydrogen atoms [71] whereas other methods select atom types based on their element type, valence, and connectivity [72]. The atom types in RECON are defined using several criteria (in descending order of priority):

1. element type or atomic number,

2. coordination number (number of other atoms connected to the atom in question),

3. atomic numbers and coordination numbers of the bonded neighbors,

4. size of the ring system, if any, containing the atom, and

5. next-nearest neighbors for mono-coordinate atoms.
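The five criteria above can be turned into a sortable type string. The sketch below is one possible encoding, written only to illustrate the priority ordering: the `Atom` class, the function `atom_type_key`, and the `Z…-C…-n…-R…-x…` field format are hypothetical and are not the actual RECON atom-type syntax.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    atomic_number: int
    neighbors: tuple    # indices of bonded atoms
    ring_size: int = 0  # 0 means the atom is not in a ring

def atom_type_key(index, atoms):
    """Build a hypothetical type string with fields in descending priority."""
    atom = atoms[index]
    # Criteria 1 and 2: element and coordination number.
    fields = [f"Z{atom.atomic_number}", f"C{len(atom.neighbors)}"]
    # Criterion 3: atomic number and coordination number of each bonded neighbor.
    bonded = sorted(
        ((atoms[j].atomic_number, len(atoms[j].neighbors)) for j in atom.neighbors),
        reverse=True,
    )
    fields += [f"n{z}.{c}" for z, c in bonded]
    # Criterion 4: ring size.
    fields.append(f"R{atom.ring_size}")
    # Criterion 5: next-nearest neighbors, only for mono-coordinate atoms.
    if len(atom.neighbors) == 1:
        partner = atom.neighbors[0]
        next_nearest = sorted(
            (atoms[k].atomic_number for k in atoms[partner].neighbors if k != index),
            reverse=True,
        )
        fields += [f"x{z}" for z in next_nearest]
    return "-".join(fields)

# Methanol, CH3OH: atom 0 is C, atoms 1-3 are methyl H, atom 4 is O, atom 5 is hydroxyl H.
methanol = [
    Atom(6, (1, 2, 3, 4)),
    Atom(1, (0,)), Atom(1, (0,)), Atom(1, (0,)),
    Atom(8, (0, 5)),
    Atom(1, (4,)),
]
keys = [atom_type_key(i, methanol) for i in range(len(methanol))]
```

With this encoding the methyl and hydroxyl hydrogens receive different keys, because their bonded neighbor and next-nearest neighbors differ even though their element and coordination number are the same.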

RECON employs a sequential fallback procedure which uses the best available representation for each atom (closest match in the TAE library). A requested atom type is compared with each atom type in the TAE library in succession – by string comparison to entries in a sorted TAE list file, as described in detail by Breneman and coworkers [3, 65, 66] – until the library atom type string with the best match is found. This atom type is then used to model the requested atom in the molecule. The TAE library in our present implementation of RECON [73] contains 915 atom types, but this number is capable of as much expansion as deemed necessary. In a virtual high-throughput application on a large pharmaceutical dataset, one would, in general, expect to find some atom types that are not as well represented in the library as others. The atom types in each molecule are categorized using the Atomtyper algorithm, which also identifies any new atom types encountered that are not satisfactorily represented in the TAE library and must be generated.
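A fallback of this kind can be sketched as a longest-leading-match search over type strings. The scoring rule below (count of identical leading fields, ties broken by sorted order) and the hyphen-separated key format are assumptions chosen for illustration; the actual comparison used by RECON is the one described in Refs. [3, 65, 66].

```python
def common_leading_fields(a, b):
    """Number of identical leading '-'-separated fields shared by two type strings."""
    count = 0
    for x, y in zip(a.split("-"), b.split("-")):
        if x != y:
            break
        count += 1
    return count

def best_match(requested, library):
    """Return the library type that agrees with `requested` on the most leading fields.

    Because fields are ordered by priority, a longer leading match means
    agreement on more of the higher-priority criteria. At least the first
    field (the element) must match.
    """
    best, best_score = None, 0
    for entry in sorted(library):
        score = common_leading_fields(requested, entry)
        if score > best_score:
            best, best_score = entry, score
    if best is None:
        raise LookupError(f"no library atom type for element field of {requested!r}")
    return best

# Hypothetical library entries in a hypothetical key format.
library = [
    "Z6-C4-n8.2-n1.1-n1.1-n1.1-R0",
    "Z8-C1-n6.3-R0",
    "Z8-C2-n6.4-n1.1-R0",
    "Z8-C2-n6.4-n6.4-R0",
]
exact = best_match("Z8-C2-n6.4-n1.1-R0", library)
fallback = best_match("Z8-C2-n6.3-n1.1-R0", library)  # no exact entry exists
```

Here the second request has no exact entry, so the search falls back to a two-coordinate oxygen type that agrees on element and coordination number but not on the first neighbor.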

The TAE library of topological atomic charge density fragments is constructed in a form that enables rapid retrieval of the fragments and molecular assembly. Associated with each atomic charge density fragment in the TAE library are the coordinates of the bond critical points (BCP) of the atomic charge density (used to translate and reorient the atomic charge density fragments to the molecular coordinate system for molecular electron density reconstruction and visualization, if desired) and additive atomic charge density-based descriptors that encode electronic and structural information relevant to the chemistry of intermolecular interactions. These descriptors are described in detail in Section 18.5.

The generation of the TAE library starts with identification of atom types (using Atomtyper); this is followed by computation of ab-initio molecular wave functions (Hartree–Fock using the 6-31+G(d) theoretical model) [74]. Determination of the topology of the electron density and location of BCP (points along the bond paths connecting pairs of atoms where the electron density reaches its minimum) is performed using the SADDLE program [75]. The TAE atomic surfaces comprise interatomic surfaces defined by the zero-flux condition (2) and the external van der Waals surface of the atom, defined by the $\rho(\mathbf{r}) = 0.002$ e bohr$^{-3}$ isosurface. Although the external surfaces of an atom in an isolated molecule in free space extend out to infinity, in a real interacting molecule a more meaningful boundary is the distance of nonbonded contacts or the van der Waals surface, which has been shown to correlate well with an electron density isosurface [76]. All TAE descriptors are computed on these $\rho(\mathbf{r}) = 0.002$ e bohr$^{-3}$ isosurfaces. The interatomic surfaces are determined, in the PROAIM program [75], by generating a set of steepest descent paths in electron density radiating outward from the BCP. These zero-flux surfaces and the external van der Waals surfaces together form boundaries for integration of the electron-density-derived properties of each atom within a molecule. Electron-density-derived properties (Section 18.5) of the atomic fragments are then computed for each atom in the TAE library, and the van der Waals surface distributions of these properties are encoded in the form of histograms (Fig. 18.1). Finally, an index file listing all atom types in the TAE library is constructed in a form that enables rapid retrieval and atom type matching (as described above) [65, 66].
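The histogram encoding of a surface property can be sketched as an area-weighted binning of property values sampled on surface elements, so that each bin holds the surface area over which the property falls in that range. The function name, the clamping of out-of-range samples into the end bins, and the sample values are assumptions for illustration.

```python
def surface_histogram(values, areas, lo, hi, n_bins):
    """Area-weighted histogram of a property sampled on surface elements.

    values[i] is the property on surface element i and areas[i] is the area
    of that element; each bin accumulates surface area, not sample counts.
    """
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    width = (hi - lo) / n_bins
    bins = [0.0] * n_bins
    for value, area in zip(values, areas):
        k = int((value - lo) / width)
        k = min(max(k, 0), n_bins - 1)  # clamp out-of-range samples into the end bins
        bins[k] += area
    return bins

# Five surface elements with invented property values and areas.
bins = surface_histogram(
    values=[0.1, 0.4, 0.6, 0.9, 1.0],
    areas=[1.0, 2.0, 3.0, 4.0, 5.0],
    lo=0.0, hi=1.0, n_bins=2,
)
```

Because the bins store area, their sum equals the total sampled surface area, and per-atom histograms built on a common set of bin edges can be added bin by bin.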


Fig. 18.1 (a) Politzer local average ionization potential PIP($\mathbf{r}$)-encoded van der Waals surface of pyrilamine, (b) its histogram distribution, and (c) its shape–property histogram distribution from PEST (adapted from Ref. [132]). The z-axis in (b) and (c) is proportional to the surface area of the respective histogram bins.

18.4 TAE Reconstruction and Descriptor Generation

Fig. 18.2 TAE reconstruction of thiophenol. (a) TAE electron distribution in its native position. In the first step, the first TAE electron-density fragment is translated to the molecular coordinates of the model atom, as shown in (b). The charge-density distribution is then rotated using a quaternion procedure; the result is illustrated in (c). These steps are repeated for all atoms in the molecule until the entire molecular charge distribution is reconstructed, as shown in (d–f) [65, 132].

The molecular geometry and/or connectivity information for each input molecule is read at run-time by the RECON algorithm. Atomic connectivity, if not specified in the input, can be determined using a distance criterion – a table of standard single-bond distances is used for this purpose and any pair of atoms with a separation less than 110% of the corresponding single-bond distance is considered bonded. The size of the ring (if any) that an atom in a molecule belongs to is determined by stepping along the connectivity tree. The current implementation of the RECON algorithm handles connectivity up to four, and rings of three, four, five, and six members are detected, as are bridgehead atoms, each of which is represented by a distinct atom type.
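The distance criterion and the ring search can be sketched as follows. The bond-length table holds approximate textbook single-bond distances in ångströms and is an illustrative assumption, as are the function names; the breadth-first search is one straightforward way to step along the connectivity tree, not necessarily the one RECON uses.

```python
import math
from collections import deque

# Approximate standard single-bond distances in angstroms (illustrative values).
STANDARD_SINGLE_BOND = {
    ("C", "C"): 1.54, ("C", "H"): 1.09, ("C", "O"): 1.43,
    ("H", "O"): 0.96, ("H", "H"): 0.74,
}

def detect_bonds(elements, coords, tolerance=1.10):
    """Bond two atoms if they are closer than tolerance * standard single-bond distance."""
    bonds = {i: set() for i in range(len(elements))}
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            key = tuple(sorted((elements[i], elements[j])))
            reference = STANDARD_SINGLE_BOND.get(key)
            if reference is None:
                continue  # element pair not in the table: no bond assigned
            if math.dist(coords[i], coords[j]) < tolerance * reference:
                bonds[i].add(j)
                bonds[j].add(i)
    return bonds

def smallest_ring(atom, bonds, max_size=6):
    """Size of the smallest ring through `atom`, or 0 if none up to max_size."""
    best = 0
    for first in bonds[atom]:
        # Breadth-first search from one neighbor back to `atom`, never stepping
        # through `atom` itself; dist counts bonds travelled from `atom`.
        dist = {first: 1}
        queue = deque([first])
        while queue:
            node = queue.popleft()
            if node != first and atom in bonds[node]:
                size = dist[node] + 1
                if best == 0 or size < best:
                    best = size
                break
            if dist[node] + 1 >= max_size:
                continue  # going further would exceed the largest ring considered
            for nxt in bonds[node]:
                if nxt != atom and nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return best

# Water-like geometry: O at the origin and two H atoms about 0.96 angstrom away.
water_bonds = detect_bonds(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
)
ring3 = smallest_ring(0, {0: {1, 2}, 1: {0, 2}, 2: {0, 1}})
ring6 = smallest_ring(0, {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)})
chain = smallest_ring(1, {0: {1}, 1: {0, 2}, 2: {1}})
```

The default `max_size=6` mirrors the ring sizes listed above; larger rings are reported as 0, i.e. treated as acyclic for typing purposes in this sketch.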

When atom types and environments have been determined, the closest match is assigned for each atom in the input molecule from the precomputed TAE library of atom types. The densities of the atomic fragments can be combined, if desired, after translating the atomic electron density distributions into the molecular framework and rotating them into the proper orientation by matching the BCP as shown in Fig. 18.2 and described in detail by Whitehead et al. [65].
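The translate-then-rotate step can be illustrated for the simplest case of a single reference direction. The sketch below builds the unit quaternion that rotates the fragment's nucleus-to-BCP direction onto the corresponding direction in the target molecule and applies it to the fragment's points. Matching only one BCP, the function names, and the coordinates are simplifying assumptions; the full procedure of Whitehead et al. [65] matches the BCPs of the atom.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def quaternion_between(u, v):
    """Unit quaternion (w, x, y, z) that rotates direction u onto direction v."""
    u, v = normalize(u), normalize(v)
    w = 1.0 + dot(u, v)
    if w < 1e-12:
        raise ValueError("antiparallel directions need an explicitly chosen axis")
    x, y, z = cross(u, v)
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    return (w / norm, x / norm, y / norm, z / norm)

def rotate(q, p):
    """Rotate vector p by unit quaternion q: p' = p + w*t + qv x t, with t = 2*(qv x p)."""
    w, qv = q[0], q[1:]
    t = tuple(2.0 * c for c in cross(qv, p))
    qv_cross_t = cross(qv, t)
    return tuple(p[i] + w * t[i] + qv_cross_t[i] for i in range(3))

def place_fragment(points, frag_nucleus, frag_bcp, target_nucleus, target_bcp):
    """Move fragment points so the nucleus and the nucleus-to-BCP direction match the target."""
    q = quaternion_between(sub(frag_bcp, frag_nucleus), sub(target_bcp, target_nucleus))
    placed = []
    for p in points:
        rotated = rotate(q, sub(p, frag_nucleus))
        placed.append(tuple(r + t for r, t in zip(rotated, target_nucleus)))
    return placed

placed = place_fragment(
    points=[(1.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
    frag_nucleus=(0.0, 0.0, 0.0), frag_bcp=(1.0, 0.0, 0.0),
    target_nucleus=(5.0, 5.0, 5.0), target_bcp=(5.0, 6.0, 5.0),
)
```

A single direction leaves the rotation about that direction undetermined, which is why atoms with more than one bond need additional BCPs to fix the orientation.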

The molecular TAE descriptors are usually constructed by appropriate arithmetic operations on the respective atomic descriptors stored in the data files that constitute the TAE library. Thus the only computational operations involved in the generation of molecular TAE RECON descriptors are atom-type assignment and matching, followed by combination of the matched atomic descriptors into molecular TAE descriptors. This makes the method highly suitable for VHTS applications, because it scales well with both molecular size and the size of the database – a database of 42,689 drug-like molecules from the NCI HIV database could be screened within 7.6 min on a 1.7 GHz Intel Pentium with 529 MB RAM under the Mandrake Linux operating system (Table 18.1) [77]. The TAE descriptors are described in detail in Section 18.5.
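The quoted screening time can be related to the CPU times reported in Table 18.1. The short calculation below assumes that the 7.6 min figure corresponds to the sum of the user and system CPU times for the NCI dataset on the Pentium machine (391.0 s and 67.5 s); that reading is an inference from the numbers, not a statement from Ref. [77].

```python
# CPU times for the NCI dataset on the 1.7 GHz Pentium, taken from Table 18.1.
user_cpu_s = 391.0
system_cpu_s = 67.5
n_molecules = 42689

total_s = user_cpu_s + system_cpu_s          # total CPU time in seconds
total_min = total_s / 60.0                   # the same time in minutes
molecules_per_second = n_molecules / total_s # average screening throughput
```

Under this assumption the total is 458.5 s, or about 7.6 min, which corresponds to roughly 93 molecules processed per CPU second.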


Table 18.1 CPU times in seconds for RECON running on an SGI Octane 300 MHz MIPS R12000 processor with FPU, 640 MB RAM, and IRIX64 release 6.5, and on a 1.7 GHz Intel Pentium with 529 MB RAM under the Mandrake Linux operating system [77].

| Test dataset | Number of molecules | File format | SGI Octane: user CPU (s) | SGI Octane: system CPU (s) | Intel Pentium: user CPU (s) | Intel Pentium: system CPU (s) |
|---|---|---|---|---|---|---|
| MAO inhibitors | 1650 | SDF | 102.7 | 44.5 | 15.3 | 3.5 |
| | 1641 | SMILES | 122.3 | 45.9 | 61.3 | 3.6 |
| Proteins | 25 | PDB | 186.8 | 194.5 | 65.1 | 17.6 |
| NCI AIDS | 42,689 | SDF | 2327.2 | 1131.0 | 391.0 | 67.5 |

18.5 QTAIM-based Descriptors

When all the atoms in a molecule have been typed, the molecular TAE descriptors are computed, usually by simple addition of the corresponding (precomputed) descriptors of the atomic fragments from the TAE library. Some electronic properties retrieved from the library after TAE reconstruction are listed in Table 18.2. These descriptors fall into four general classes:

• traditional descriptors, for example molecular volume, surface area, and dipole moments, computed from the TAE;

• topological descriptors, which depend only on the molecular connectivity, for example the molecular connectivity index ($\chi^{0}$) [78] and atom type counts;

• electron-density-derived TAE surface descriptors – extrema, surface integral averages, and histogram bins are generated for each of the properties in Table 18.2; and

• descriptors sensitive to the molecular coordinates and requiring a 3D structure for their evaluation, for example RECON autocorrelation descriptors (RAD), PEST shape–property hybrid descriptors [79], based on Zauhar’s shape signature ray-tracing scheme [80, 81], and an implementation of GETAWAY descriptors [8, 9] computed from the spatial coordinates of the atoms and based on a leverage matrix, the so-called molecular influence matrix.
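The simple addition of precomputed atomic descriptors mentioned at the start of this section can be sketched as a lookup followed by a sum. The library contents, descriptor names, and values below are invented placeholders, not entries from the actual TAE library.

```python
# Hypothetical per-atom-type descriptor records; all values are invented.
TOY_LIBRARY = {
    "C": {"volume": 10.0, "population": 6.0},
    "H": {"volume": 5.0, "population": 1.0},
}

def molecular_descriptors(atom_types, library):
    """Sum additive atomic descriptors over the typed atoms of one molecule."""
    totals = {}
    for atom_type in atom_types:
        for name, value in library[atom_type].items():
            totals[name] = totals.get(name, 0.0) + value
    return totals

# A methane-like molecule: one "C" type and four "H" types.
methane = molecular_descriptors(["C", "H", "H", "H", "H"], TOY_LIBRARY)
```

The cost grows linearly with the number of atoms and involves no quantum-chemical calculation at screening time. Non-additive quantities such as surface extrema would need a different combination rule, for example a maximum or minimum over the atomic values.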


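Table 18.2 Electron-density-derived properties after molecular reconstruction.

Integrated: energy; integrated electron population; volume; surface area.

Topological: molecular connectivity index ($\chi^{0}$); topological autocorrelations (TRAD).

Surface electronic properties – surface extrema, surface integral averages, histogram bins, wavelet coefficients derived from surface distributions, and autocorrelations based on atomic integral averages are available for each property:

| Symbol | Property | Definition |
|---|---|---|
| EP | Electrostatic potential | $\mathrm{EP}(\mathbf{r}) = \sum_\alpha \frac{Z_\alpha}{\lvert \mathbf{r} - \mathbf{R}_\alpha \rvert} - \int \frac{\rho(\mathbf{r}')\, d\mathbf{r}'}{\lvert \mathbf{r} - \mathbf{r}' \rvert}$ |
| DRN | Electron density gradient normal to the 0.002 e bohr$^{-3}$ electron density isosurface | $\nabla\rho \cdot \mathbf{n}$ |
| G | Electronic gradient kinetic energy density | $G(\mathbf{r}) = \tfrac{1}{2}\,(\nabla\psi^{*} \cdot \nabla\psi)$ |
| K | Electronic Schrödinger kinetic energy density | $K(\mathbf{r}) = -\tfrac{1}{2}\,(\psi^{*}\nabla^{2}\psi + \psi\nabla^{2}\psi^{*})$ |
| DKN | Gradient of the Schrödinger kinetic energy density normal to surface | $\nabla K \cdot \mathbf{n}$ |
| DGN | Gradient of the gradient kinetic energy density normal to surface | $\nabla G \cdot \mathbf{n}$ |
| F | Fukui F function scalar value | $F(\mathbf{r}) = \left[\frac{\partial \rho(\mathbf{r})}{\partial N}\right]_{v} \approx \rho_{\mathrm{HOMO}}(\mathbf{r})$ |
| L | Laplacian of the electron density | $L(\mathbf{r}) = -\tfrac{1}{4}\nabla^{2}\rho(\mathbf{r}) = K(\mathbf{r}) - G(\mathbf{r})$ |
| BNP | Bare nuclear potential | $\mathrm{BNP}(\mathbf{r}) = \sum_\alpha \frac{Z_\alpha}{\lvert \mathbf{r} - \mathbf{R}_\alpha \rvert}$ |
| PIP | Local average ionization potential | $\mathrm{PIP}(\mathbf{r}) = \sum_i \frac{\rho_i(\mathbf{r})\, \lvert \varepsilon_i \rvert}{\rho(\mathbf{r})}$ |

Descriptors requiring 3D coordinates:

| Symbol | Description |
|---|---|
| RAD | Recon autocorrelation descriptors for all TAE surface properties above |
| PEST | Shape–property hybrid descriptors for all TAE surface properties above |
| GETAWAY [8, 9] | Based on a leverage matrix – the molecular influence matrix |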
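Two of the tabulated surface properties have definitions simple enough to evaluate directly at a point: the bare nuclear potential, a sum of $Z_\alpha / \lvert \mathbf{r} - \mathbf{R}_\alpha \rvert$ over nuclei, and the local average ionization potential, a density-weighted average of the orbital energy magnitudes $\lvert \varepsilon_i \rvert$. The sketch below assumes atomic units, takes the total density as the sum of the supplied orbital densities, and uses invented input values.

```python
import math

def bare_nuclear_potential(point, nuclei):
    """BNP(r): sum of Z_alpha / |r - R_alpha| over all nuclei (atomic units)."""
    total = 0.0
    for charge, position in nuclei:
        distance = math.dist(point, position)
        if distance == 0.0:
            raise ValueError("point coincides with a nucleus")
        total += charge / distance
    return total

def local_average_ionization_potential(orbital_densities, orbital_energies):
    """PIP(r): sum_i rho_i(r) * |eps_i| divided by rho(r), with rho(r) = sum_i rho_i(r).

    orbital_densities holds the occupied-orbital densities at one point and
    orbital_energies the matching orbital energies.
    """
    total_density = sum(orbital_densities)
    weighted = sum(d * abs(e) for d, e in zip(orbital_densities, orbital_energies))
    return weighted / total_density

# Two unit charges 2 bohr apart, evaluated at the midpoint (invented geometry).
bnp_mid = bare_nuclear_potential(
    (0.0, 0.0, 1.0),
    [(1.0, (0.0, 0.0, 0.0)), (1.0, (0.0, 0.0, 2.0))],
)
# Two occupied orbitals with invented densities and energies at one surface point.
pip_point = local_average_ionization_potential([0.3, 0.1], [-0.5, -1.0])
```

In a TAE-style workflow such quantities would be evaluated on the points of the 0.002 e bohr$^{-3}$ isosurface and then summarized as extrema, averages, or histogram bins.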


The TAE volume and surface area descriptors have meanings similar to those of other volume and surface descriptors computed using most modern molecular modeling programs. Molecular volume is most often associated with hydrophobic effects and tends to be correlated with the energy required to “dig a hole” in the solvent medium for the molecule. This is the sum of the energies required to break existing noncovalent interactions between solvent molecules, and the desolvation energies of the binding site with which the molecule might interact. In the case of solution binding and molecular recognition the desolvation energy of the solute molecule is also related to its volume. Other descriptors computed in
