Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf
Скачиваний:
68
Добавлен:
15.08.2013
Размер:
5.59 Mб
Скачать

Conformational Analysis

73

to very low acceptance ratios, which significantly reduce the efficiency of the method. The reason for the low acceptance ratio, i.e., the ratio between accepted MC moves and total MC trial moves, is the compact character of most biomolecules. This means that many move attempts end up ‘‘bumping’’ into other parts of the molecule and are rejected because of the clash. This is true in particular of moves defined in Cartesian coordinates. To partially overcome this problem it is recommended that torsion move-sets be used. Advanced applications of MC methods for conformation sampling often involve various techniques for enhanced sampling [7,8].

A special comment must be added regarding the random number generator. Because the MC process is driven by random numbers, it is sensitive to the quality of the random number generator that is being used. The random number generator is supposed to generate uniformly distributed random numbers in the range [0, 1]. But in fact the computer does not generate random numbers at all. It uses a deterministic algorithm to generate a pseudorandom series of numbers with a finite periodicity. A high quality algorithm will have a long enough periodicity that the observed distribution does indeed appear random. However, there are many so-called random number generators on the market that are anything but random. It is good practice to check the random number generator prior to using it in an actual MC simulation. A simple histogram of 10,000 numbers or more will easily show whether or not the resulting distribution is uniform. Good random number generators are given in Ref. 9.

C. Genetic Algorithms

A genetic algorithm (GA) evolves a population of possible solutions through genetic operations, such as mutations and crossovers, to a final population of low energy conformations that minimize the energy function (the fitness function) [10,11]. For the purpose of conformational sampling, the translational, rotational, and internal degrees of freedom are encoded into ‘‘genes,’’ which are represented by the real number values of those degrees of freedom [12]. Each individual conformation, named a ‘‘chromosome,’’ consists of a collection of genes and is represented by the appropriate string of real numbers. A fitness value (energy) is assigned to each chromosome.

The GA evolutionary process iterates through the following two steps: (1) Generation of a children population from a parent population by means of genetic operators, and (2) a generation update step. The process starts with a random population of initial chromosomes. A new population is generated from the old one by the use of genetic operators. The two most fundamental operators are schematically depicted in Figure 2. The mutation operator (Fig. 2a) changes the value of a randomly selected gene by a random value, depending on the type of gene. The crossover operator (Fig. 2b) exchanges a set of genes between two parent chromosomes, creating two children with genes from both parents. Additional genetic operators, such as the multiple simultaneous mutation operator or a migration operator, which moves individual chromosomes from one subpopulation to another, may also be used.

Parents are selected for breeding based on their relative fitness and an external evolutionary ‘‘pressure,’’ which directs the process to favor breeding by parents with higher fitness. Following each breeding cycle the chromosome population is updated. A common update scheme is to replace all but the most fit ‘‘elite’’ chromosomes of the original set. For example, a ‘‘generation update with an elitism of 2’’ indicates that all but the two most fit parent chromosomes are replaced [11]. The iteration through the two steps of the

74

Becker

Figure 2 Genetic operators used to create a population of children chromosomes from a population of parent chromosomes. (a) Single-point mutation. A gene to be mutated is selected at random, and its value is modified. (b) One-point crossover. The crossover point is selected randomly, and the genes are exchanged between the two parents. Two children are created, each having genes from both parents.

evolutionary process continues until convergence or until the maximum number of steps is reached.

Application of GA to conformational searching typically uses the potential energy as the fitness function. The degrees of freedom encoded in the ‘‘genes’’ are usually only a subset of coordinates that are most significant for the conformational search. These may include backbone dihedral angles, side chain dihedral angles, or any other coordinate that may be considered useful. Application of GA to docking problems requires that the three center-of-mass coordinates and the three rotational Euler angles also be encoded in genes.

An advantage of GA for conformational analysis, besides its elegance, is that it is very easy to code and run. In addition, GA usually requires fewer iterations than either MD or MC to generate a large population of low energy conformations. On the other hand, because in every iteration whole populations are propagated, each iteration takes significantly more CPU time than in either of the other two methods. Furthermore, in many complex cases GA is known for its slow convergence.

A successful application of GA to conformation sampling is, for example, as a part of flexible docking [12–14]. It should be noted, however, that none of the three sampling methods discussed above, MD, MC, and GA, was shown to outperform the other two in any general way. In fact, a comparison of the three methods in the context of flexible docking showed similar efficiency for all three [12], although specific advantages are likely to exist for particular applications.

D. Other Search Methods

Many additional search methods have been devised in addition to the basic three just discussed. A few of them are outlined below. Note that whereas some methods, such as parallel tempering and J-walking, are improved or specialized versions of the basic tech-

Conformational Analysis

75

niques, other methods, such as systematic enumeration and distance geometry, follow different paths altogether.

1. Systematic Enumeration

Systematic sampling is, in principle, the most thorough method for searching molecular conformational space. The energy is sampled over the entire range of each degree of freedom (typically torsion angles) at regularly spaced intervals. Thus, the sampled conformations lie on an N-dimensional lattice (N being the number of degrees of freedom). Several problems make the systematic sampling procedure irrelevant to most biomolecular systems. A major problem associated with systematic searching is that the number of conformations generated rapidly becomes extremely large even for small molecules. For example, systematic search with six rotatable bonds sampled at 30° increments results in almost 3 million conformations. The number of samples can be reduced by limiting the range of rotation for a symmetrical substituent (e.g., 0–180° for a phenyl group) and/or increasing the rotation step size (all staggered conformations of a saturated CEC bond can be sampled at 120° increments). Another problem arises from the fact that a systematic search is likely to generate a large number of unphysical conformations, in which one part of the molecule crosses over another part (e.g., a protein backbone that crosses over itself). As a result of the these limitations, systematic search is applied only to small molecules such as small peptides [15] or ligands in the context of flexible docking.

‘‘Pruning’’ is a sophisticated way to reduce the computational requirements associated with systematic search. Pruning takes advantage of the fact that all conformations along a given branch of the search tree that are downstream from a high energy conformation will also have high energies and need not be calculated. For example, if an irrevocable clash such as the backbone folding over itself occurs after the fifth angle in an eight-angle systematic search, there is no need to continue sampling the three remaining angles. Pruning can be performed using geometrical constraints that must be satisfied for all conformations. An energy cutoff is a simplistic filter that can be applied at the end of the search to reduce the number of conformations stored. With this filter, conformations with energies above the cutoff are discarded. This filter, of course, does not reduce the number of calculations but does help in managing and analyzing the data.

2. Distance Geometry

Distance geometry is a general method for building conformational models of complex molecular systems based on a set of distance constraints. It is a purely geometric technique that generates structures that satisfy the given set of constraints without requiring a starting conformation or an energy function. The distance constraints, which are the input to the method, can be qualitative or approximate. As such they are defined by upper and lower bounds and often into a matrix form. The distance geometry approach converts, or ‘‘embeds,’’ the uncertain distance constraints into 3D Cartesian coordinates.

In the basic metric matrix implementation of the distance constraint technique [16] one starts by generating a distance bounds matrix. This is an N N square matrix (N the number of atoms) in which the upper bounds occupy the upper diagonal and the lower bounds are placed in the lower diagonal. The matrix is filled by information based on the bond structure, experimental data, or a hypothesis. After smoothing the distance bounds matrix, a new distance matrix is generated by random selection of distances between the bounds. The distance matrix is converted back into a 3D conformation after the distance matrix has been converted into a metric matrix and diagonalized. A new distance matrix

76

Becker

randomly generated within the same bounds will result in another possible conformation that satisfies the model, and so forth. The tighter the bounds, the more restricted the search. Naturally, distance geometry is very useful in situations in which many distance constraints are known, in particular for suggesting models that agree with NMR data [17] (see Chapter 13). This method is also useful for pharmacophore modeling based on known bioactive molecules [18]. However, for general-purpose conformational searching this method is less appropriate unless the search space is limited by known constraints (such as active site data in a docking context). Computationally, distance geometry is limited to moderately sized systems (up to several thousand atoms) because it requires computationally expensive matrix manipulation. For a detailed review of distance geometry and its application to molecular problems, see Ref. 19.

3. Parallel Tempering and J-Walking

Energy barriers that confine the search to limited regions of conformational space restrict both molecular dynamics and Monte Carlo simulations, preventing them from reaching ergodicity. As discussed in Sections II.A and II.B, this limitation is often overcome by raising the simulation temperature. The higher temperature allows the simulation to overcome high energy barriers and extend the search space. However, it also causes the simulation to sample regions of conformational space that are irrelevant and inaccessible to room temperature molecules. The methods of parallel tempering and J-walking address the problem of how to raise the simulation temperature without wasting computational effort on inaccessible conformations.

The idea behind both parallel tempering and J-walking is to incorporate information obtained by ergodic high temperature simulations into low temperature simulations. By periodically passing information from high temperature simulations to low temperature simulations, these methods allow the low temperature system to overcome the barriers between separate regions. In practice, both methods, which can be applied in both MC and MD simulations, require propagation of at least two (and often more) parallel simula- tions—one at the desired low temperature and another at a high temperature. For simplicity we shall refer to the Monte Carlo implementation of these techniques; the molecular dynamics implementation is similar in nature.

In J-walking [20] the periodic MC trial probability for a simulation at temperature T is taken to be a Boltzmann distribution at a high temperature, TJ (βJ 1/kTJ), The jumping temperature, TJ, is sufficiently high that the Metropolois walk can be assumed to be ergodic. This results in the acceptance probability,

p min[1,e (β βJ)U]

(2)

where β 1/kT, βJ 1/kTJ, and U is the change in the potential energy. In practice, at temperature T the trial moves are taken from the Metropolis distribution of Eq. (1) about 90% of the time, with jumps using Eq. (2) attempted about 10% of the time. The jumping conformations are generated with a Metropolis walk at temperature TJ. Several methods have been devised to overcome correlations between the low and high temperature walks.

In parallel tempering [21], the high and low temperature walkers exchange configurations, unlike in J-walking, in which only the high temperature walker feeds conformations to the low temperature walker. By exchanging conformations, parallel tempering satisfies detailed balance, assuming the two walks are long enough. Parallel tempering simulations can be performed for more than two temperatures [22]. It should be noted