Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf
Скачиваний:
68
Добавлен:
15.08.2013
Размер:
5.59 Mб
Скачать

2

Atomistic Models and Force Fields

Alexander D. MacKerell, Jr.

University of Maryland, Baltimore, Maryland

I.INTRODUCTION

Central to the success of any computational approach to the study of chemical systems is the quality of the mathematical model used to calculate the energy of the system as a function of its structure. For smaller chemical systems studied in the gas phase, quantum mechanical (QM) approaches are appropriate. The success of these methods was emphasized by the selection of John A. Pople and Walter Kohn as winners of the 1998 Nobel prize in chemistry. These methods, however, are typically limited to systems of approximately 100 atoms or less, although approaches to treat large systems are under development [1]. Systems of biochemical or biophysical interest typically involve macromolecules that contain 1000–5000 or more atoms plus their condensed phase environment. This can lead to biochemical systems containing 20,000 atoms or more. In addition, the inherent dynamical nature of biochemicals and the mobility of their environments [2,3] require that large number of conformations, generated via various methods (see Chapters 3, 4, 6, and 10), be subjected to energy calculations. Thus, an energy function is required that allows for 106 or more energy calculations on systems containing on the order of 105 atoms.

Empirical energy functions can fulfill the demands required by computational studies of biochemical and biophysical systems. The mathematical equations in empirical energy functions include relatively simple terms to describe the physical interactions that dictate the structure and dynamic properties of biological molecules. In addition, empirical force fields use atomistic models, in which atoms are the smallest particles in the system rather than the electrons and nuclei used in quantum mechanics. These two simplifications allow for the computational speed required to perform the required number of energy calculations on biomolecules in their environments to be attained, and, more important, via the use of properly optimized parameters in the mathematical models the required chemical accuracy can be achieved. The use of empirical energy functions was initially applied to small organic molecules, where it was referred to as molecular mechanics [4], and more recently to biological systems [2,3].

7

8

MacKerell

II. POTENTIAL ENERGY FUNCTIONS

A.Potential Energy Functions for the Treatment of Biological Molecules

A potential energy function is a mathematical equation that allows for the potential energy, V, of a chemical system to be calculated as a function of its three-dimensional (3D) structure, R. The equation includes terms describing the various physical interactions that dictate the structure and properties of a chemical system. The total potential energy of a chemical system with a defined 3D structure, V(R)total, can be separated into terms for the internal, V(R)internal, and external, V(R)external, potential energy as described in the following equations.

V(R)total V(R)internal V(R)external

 

 

 

 

 

 

 

 

(1)

V(R)internal Kb(b b0)2

 

 

 

 

 

 

 

 

 

 

(2)

bonds

 

 

 

 

 

 

 

 

 

 

 

 

 

Kθ(θ θ0)2 Kχ[1 cos(nχ σ)]

 

angles

 

 

 

 

dihedrals

 

 

 

 

 

 

 

and

nonbonded

 

 

 

 

 

 

 

 

 

 

 

 

 

 

12

 

 

 

 

 

 

 

 

 

 

 

 

 

6

 

 

 

 

 

V(R)external

εij

Rmin,ij

 

 

Rmin,ij

 

 

qiqj

 

(3)

 

 

 

 

 

 

 

 

 

 

εDrij

 

 

rij

 

 

rij

 

 

 

atompairs

The internal terms are associated with covalently connected atoms, and the external terms represent the noncovalent or nonbonded interactions between atoms. The external terms are also referred to as interaction, nonbonded, or intermolecular terms.

Beyond the form of Eqs. (1)–(3), which is discussed below, it is important to emphasize the difference between the terms associated with the 3D structure, R, being subjected to the energy calculation and the parameters in the equations. The terms obtained from the 3D structure are the bond lengths, b; the valence angles, θ; the dihedral or torsion angles, χ; and the distances between the atoms, rij. A diagrammatic representation of two hypothetical molecules in Figure 1 allows for visualization of these terms. The values of these terms are typically obtained from experimental structures generated from X-ray crystallography or NMR experiments (see Chapter 13), from modeled structures (e.g., from homology modeling of a protein; see Chapters 14 and 15), or a structure generated during a molecular dynamics (MD) or Monte Carlo (MC) simulation. The remaining terms in Eqs. (2) and (3) are referred to as the parameters. These terms are associated with the particular type of atom and the types of atoms covalently bound to it. For example, the parameter q, the partial atomic charge, of a sodium cation is typically set to 1, while that of a chloride anion is set to 1. Another example is a CEC single bond versus a

 

 

˚

CCC double bond, where the former may have bond parameters of b0 1.53 A, Kb

˚ 2

˚

˚ 2

225 kcal/(mol A ) and the latter b0

1.33 A, Kb 500 kcal/(mol A ) Thus, different

parameters allow for different types of atoms and different molecular connectivities to be treated using the same form of Eqs. (2) and (3). Indeed, it is the quality of the parameters, as judged by their ability to reproduce experimentally, and quantum-mechanically determined target data (e.g., information on selected molecules that the parameters are adjusted to reproduce) that ultimately determines the accuracy of the results obtained from compu-

Atomistic Models and Force Fields

9

Figure 1 Hypothetical molecules to illustrate the energetic terms included in Eqs. (1)–(3). Molecule A comprises atoms 1–4, and molecule B comprises atom 5. Internal terms that occur in molecule A are the bonds, b, between atoms 1 and 2, 2 and 3, and 3 and 4; angles θ, involving atoms 1–2– 3 and atoms 2–3–4, and a dihedral or torsional angle, χ, described by atoms 1–2–3–4. Bonds can also be referred to as 1,2 atom pairs or 1,2 interactions; angles as 1,3 atom pairs or 1,3 interactions; and dihedrals as 1,4 atom pairs or 1,4 interactions. Molecule B is involved in external interactions with all four atoms in molecule A, where the different interatomic distances, rij , must be known. Note that external interactions (both van der Waals and Coulombic) can occur between the 1,2, 1,3, and 1,4 pairs in molecule A. However, external interactions involving 1,2 and 1,3 interactions are generally not included as part of the external energy (i.e., 1,2 and 1,3 exclusions), but 1,4 interactions are. Often the 1,4 external interaction energies are scaled (i.e., 1,4 scaling) to diminish the influence of these external interactions on geometries, vibrations, and conformational energetics. It should also be noted that additional atoms that could be present in molecule A would represent 1,5 interactions, 1,6 interactions, and so on, and would also interact with each other via the external terms.

tational studies of biological molecules. Details of the parameter optimization process are discussed below.

The mathematical form of Eqs. (2) and (3) represents a compromise between simplicity and chemical accuracy. Both the bond-stretching and angle-bending terms are treated harmonically, which effectively keeps the bonds and angles near their equilibrium values. Bond and angle parameters include b0 and θ0, the equilibrium bond length and equilibrium angle, respectively. Kb and Kθ are the force constants associated with the bond and angle terms, respectively. The use of harmonic terms is sufficient for the conditions under which biological computations are performed. Typically MD or MC simulations are performed in the vicinity of room temperature and in the absence of bond-breaking or bond-making events; because the bonds and angles stay close to their equilibrium values at room temperature, the harmonic energy surfaces accurately represent the local bond and angle distortions. It should be noted that the absence of bond breaking is essential for simulated annealing calculations performed at elevated temperatures (see Chapter 13). Dihedral or torsion angles represent the rotations that occur about a bond, leading to changes in the relative positions of atoms 1 and 4 as described in Figure 1. These terms are oscillatory in nature (e.g., rotation about the CEC bond in ethane changes the structure

10

MacKerell

from a low energy staggered conformation to a high energy eclipsed conformation, then back to a low energy staggered conformation, and so on), requiring the use of a sinusoidal function to accurately model them.

In Eq. (2), the dihedral term includes parameters for the force constant, Kχ; the periodicity or multiplicity, n; and the phase, δ. The magnitude of Kχ dictates the height of the barrier to rotation, such that Kχ associated with a double bond would be significantly larger that that for a single bond. The periodicity, n, indicates the number of cycles per 360° rotation about the dihedral. In the case of an sp3sp3 bond, as in ethane, n would equal 3, while the sp2sp2 CCC bond in ethylene would have n 2. The phase, δ, dictates the location of the maxima in the dihedral energy surface allowing for the location of the minima for a dihedral with n 2 to be shifted from 0° to 90° and so on. Typically, δ is equal to 0 or 180, although recent extensions allow any value from 0 to 360 to be assigned to δ [5]. Finally, each torsion angle in a molecule may be treated with a sum of dihedral terms that have different multiplicities, as well as force constants and phases [i.e., the peptide bond can be treated by a summation of 1-fold (n 1) and 2-fold (n 2) dihedral terms with the 2-fold term used to model the double-bonded character of the CEN bond and the 1-fold term used to model the energy difference between the cis and trans conformations]. The use of a summation of dihedral terms for a single torsion angle, a Fourier series, greatly enhances the flexibility of the dihedral term, allowing for more accurate reproduction of experimental and QM energetic target data.

Equation (3) describes the external or nonbond interaction terms. These terms may be considered the most important of the energy terms for computational studies of biological systems. This is because of the strong influence of the environment on the properties of macromolecules as well as the large number of nonbond interactions that occur in biological molecules themselves (e.g., hydrogen bonds between Watson–Crick base pairs in DNA, peptide bond–peptide bond hydrogen bonds involved in the secondary structures of proteins, and dispersion interactions between the aliphatic portions of lipids that occur in membranes). Interestingly, although the proper treatment of nonbond interactions is essential for successful biomolecular computations, it has been shown that the mathematical model required to treat these terms accurately can be relatively simple. Parameters associated with the external terms are the well depth, εij, between atoms i and j; the minimum interaction radius, Rmin,ij; and the partial atomic charge, qi. Also included is the dielectric constant, εD, which is generally treated as equal to 1, the permittivity of vacuum, although exceptions do exist (see below).

The term in square brackets in Eq. (3) is used to treat the van der Waals (VDW) interactions. The particular form in Eq. (3) is referred to as the Lennard-Jones (LJ) 6–12 term. The 1/r12 term represents the exchange repulsion between atoms associated with overlap of the electron clouds of the individual atoms (i.e., the Pauli exclusion principle). The strong distance dependence of the repulsion is indicated by the 12th power of this term. Representing London’s dispersion interactions or instantaneous dipole–induced dipole interactions is the 1/r6 term, which is negative, indicating its favorable nature. In the LJ 6-12 equation there are two parameters; The well depth, εij, indicates the magnitude of the favorable London’s dispersion interactions between two atoms i, j; and Rmin,ij is the distance between atoms i and j at which the minimum LJ interaction energy occurs and is related to the VDW radius of an atom. Typically, εij and Rmin,ij are not determined for every possible interaction pair, i, j; but rather εi and Rmin,i parameters are determined for the individual atom types (e.g., sp2 carbon versus sp3 carbon) and then combining rules are used to create the ij cross terms. These combining rules are generally quite

Atomistic Models and Force Fields

11

simple, being either the arithmetic mean [i.e., Rmin,ij (Rmin,i Rmin,j )/2] or the geometric mean [i.e., εij (εiεj )1/2]. The use of combining rules greatly simplifies the determination of the εi and Rmin,i parameters.

In special cases the use of combining rules can be supplemented by specific i,j LJ parameters, referred to as off-diagonal terms, to treat interactions between specific atom types that are poorly modeled by the use of combining rules. The final term contributing to the external interactions is the electrostatic or Coulombic term. This term involves the interaction between partial atomic charges, qi and qj, on atoms i and j divided by the distance, rij, between those atoms with the appropriate dielectric constant taken into account. The use of a charge representation for the individual atoms, or monopoles, effectively includes all higher order electronic interactions, such as those between dipoles and quadrupoles. Combined, the Lennard-Jones and Coulombic interactions have been shown to produce a very accurate representation of the interaction between molecules, including both the distance and angle dependencies of hydrogen bonds [6].

Once the 3D structure of a molecule and all the parameters required for the atomic and molecular connectivities are known, the energy of the system can be calculated via Eqs. (1)–(3). First derivatives of the energy with respect to position allow for determination of the forces acting on the atoms, information that is used in the energy minimization (see Chapter 4) or MD simulations (see Chapter 3). Second derivatives of the energy with respect to position can be used to calculate force constants acting on atoms, allowing the determination of vibrational spectra via normal mode analysis (see Chapter 8).

B.All-Atom Versus Extended-Atom Models

Always a limiting factor in computational studies of biological molecules is the ability to treat systems of adequate size for the required amount of simulation time or number of conformations to be sampled. One method to minimize the size of the system is to use extended-atom models versus all-atom models. In extended-atom models the hydrogens are not explicitly represented but rather are treated as part of the nonhydrogen atom to which they are covalently bound. For example, an all-atom model would treat a methyl group as four individual atoms (a carbon and three hydrogens), whereas in an extendedatom model the methyl group would be treated as a single atom, with the LJ parameters and charges adjusted to account for the omission of the hydrogens. Although this approach could be applied for all hydrogens it was typically used only for nonpolar (aliphatic and aromatic) hydrogens; polar hydrogens important for hydrogen bonding interactions were treated explicitly. Extended-atom models were most widely applied for the simulation of proteins in vacuum, where the large number of nonpolar hydrogens yields a significant decrease in the number of atoms compared to all-atom models. However, as more simulations were performed with explicit solvent representation, making the proportion of nonpolar hydrogens in the system much smaller, with ever-increasing computer resources the use of extended-atom models in simulations has decreased. Extended-atom models, however, are still useful for applications where a large sampling of conformational space is required [7].

C.Extensions of the Potential Energy Function

The potential energy function presented in Eqs. (2) and (3) represents the minimal mathematical model that can be used for computational studies of biological systems. Currently,