Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.), Computational Biochemistry and Biophysics.

Conformational Analysis


that the gaps between adjacent temperatures must be chosen so that the exchanges are accepted with sufficient frequency. This means that the distribution characteristics of the two temperatures must overlap so that a conformation generated at one temperature will have a significant probability of being accepted at the other temperature. If the temperature gap is too large, the likelihood of any conformation being accepted by the other simulation is very small.
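The overlap requirement can be made concrete with the standard Metropolis criterion used to accept or reject a swap between two replicas; the sketch below is illustrative (function name and reduced units are not from the text):

```python
import math

def swap_probability(E_cold, E_hot, T_cold, T_hot, kB=1.0):
    """Metropolis acceptance probability for exchanging the conformations
    of two replicas simulated at T_cold < T_hot (reduced units)."""
    delta = (1.0 / (kB * T_cold) - 1.0 / (kB * T_hot)) * (E_cold - E_hot)
    return min(1.0, math.exp(delta))
```

When the temperatures are close, the energy distributions overlap and delta stays small, so swaps are accepted often; a wide temperature gap makes delta strongly negative and the swap probability tiny.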

III. CONFORMATION OPTIMIZATION

The structures generated by the various sampling methods often correspond to transient conformations. It is a desirable practice to bring these conformations to nearby local minima, which represent locally stable conformations, before further analysis is performed. The most widely used methods for this purpose are the various minimization techniques. These vary in accuracy and computational efficiency and are useful in many applications beyond conformational analysis. Minimization methods are used as a tool (often alongside other computational tools) in model building, preparation of the initial structure for molecular dynamics, preparation of structures for normal mode analysis, and more. Because of their importance we go into some detail in discussing the different minimization approaches. Another conformation optimization approach is simulated annealing. In the context of conformational analysis, simulated annealing is often used to optimize conformations generated by high temperature MD or MC simulations. It is also often used as an independent method for minimization and identification of the global minimum in a complex energy landscape.

A. Minimization

A drop of water that is placed on a hillside will roll down the slope, following the surface curvature, until it ends up in the valley at the bottom of the hill. This is a natural minimization process by which the drop minimizes its potential energy until it reaches a local minimum. Minimization algorithms are the analogous computational procedures that find minima for a given function. Because these procedures are ‘‘downhill’’ methods that are unable to cross energy barriers, they end up in local minima close to the point from which the minimization process started (Fig. 3a). It is very rare that a direct minimization method will find the global minimum of the function. This can be easily understood from our water drop analogy. The drop rolls down the slope until it reaches the smallest of hollows and stops there. It will not cross the nearest rise even if the valley on the other side of that rise is much deeper than the hollow it is in now.

Figure 3 A schematic representation of two optimization schemes. (a) Minimization, which leads to the closest local minimum; (b) simulated annealing, which can overcome intermediate energy barriers along its path.

In conformational analysis, minimization reduces the potential energy of a given conformation, typically by relieving local strains in the structure. However, the minimized structure will not stray far from the initial conformation. This is why the minimization process is considered an optimization technique and not a search method. Global optimization—namely, reaching the lowest point on the energy surface—cannot be addressed by straightforward minimization and is discussed in Chapter 17 (Protein Folding).

The goal of all minimization algorithms is to find a local minimum of a given function. They differ in how closely they try to mimic the way a drop of water or a small ball would roll down the slope, following the surface curvature, until it ends up at the ‘‘bottom.’’ Consider a Taylor expansion around a minimum point X0 of the general one-dimensional function F(X), which can be written as

F(X) ≈ F(X_0) + (X − X_0)F′(X_0) + ½(X − X_0)²F″(X_0)    (3)

where F′ and F″ are the first and second derivatives of the function. The extension to a multidimensional function requires replacement of the variable X by the vector X and replacement of the derivatives by the appropriate gradients. Minimization algorithms are classified according to the amount of information about the function that is being used, represented by the highest derivative used by the algorithm. Algorithms that use only the value of the function and no derivative are called ‘‘order 0’’ algorithms. Algorithms that use the first derivative—the slope—of the function are denoted as ‘‘order 1’’ algorithms. Finally, ‘‘order 2’’ algorithms are algorithms that take advantage of the second derivative—the curvature—of the function. As in most computational techniques, higher order methods that use more information about the function are generally more accurate at the price of being computationally more expensive, taking more time and more resources than the lower order methods. This section offers a brief discussion of the most common minimization algorithms. Additional information can be found in Ref. 23.
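The role of the slope and curvature terms in Eq. (3) is easy to check numerically; the sketch below (all names are illustrative) evaluates the second-order Taylor approximation and shows that its error vanishes as the cube of the displacement:

```python
def taylor2(F, dF, d2F, X0, X):
    """Second-order Taylor approximation of F about X0, as in Eq. (3)."""
    return F(X0) + (X - X0) * dF(X0) + 0.5 * (X - X0) ** 2 * d2F(X0)

# Example function with analytic first and second derivatives.
F = lambda x: x ** 4
dF = lambda x: 4 * x ** 3      # order 1 information: the slope
d2F = lambda x: 12 * x ** 2    # order 2 information: the curvature
```

Shrinking the displacement tenfold cuts the approximation error about a thousandfold, which is why curvature-aware (order 2) methods converge so quickly near a minimum.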

1. Order 0 Algorithms

Order 0 minimization methods do not take the slope or the curvature properties of the energy surface into account. As a result, such methods are crude and can be used only with very simple energy surfaces, i.e., surfaces with a small number of local minima and monotonic behavior away from the minima. These methods are rarely used for macromolecular systems.

Grid searching is an example of an order 0 minimization algorithm. In this method a regular grid (e.g., a cubic grid) is placed over the energy surface and the value of the function at each node is calculated. The grid point with the lowest energy is taken to represent the real minimum. The quality of a grid search and the computational effort associated with it depend, of course, on the density of the grid mesh. In some applications the convergence of the method is improved by gradually increasing the density of the grid in the vicinity of the best point of an earlier, coarser grid. This method is inefficient for large molecules and can easily converge to a false minimum. Grid searching as a minimization technique is rarely used in biomolecular studies.
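A one-dimensional sketch of grid searching with the coarse-to-fine refinement described above (function and parameter names are illustrative):

```python
def grid_search(F, lo, hi, n=101, refinements=3):
    """Order-0 minimization: evaluate F at every node of a regular grid,
    keep the best node, then zoom the grid in around it and repeat."""
    best = min((lo + i * (hi - lo) / (n - 1) for i in range(n)), key=F)
    for _ in range(refinements):
        h = (hi - lo) / (n - 1)       # mesh spacing of the previous grid
        lo, hi = best - h, best + h   # refine around the best node so far
        best = min((lo + i * (hi - lo) / (n - 1) for i in range(n)), key=F)
    return best
```

Note that each refinement only searches near the current best node, so a coarse initial mesh can lock the search onto a false minimum—the failure mode mentioned above.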


Another order 0 method is the downhill simplex method [9]. A simplex is a geometrical element consisting, in N dimensions, of N + 1 points (vertices) and all their interconnecting line segments, polygonal faces, etc. In two dimensions, a simplex is a triangle; in three dimensions it is a tetrahedron. The downhill simplex method starts with N + 1 points, defining an initial simplex. The energy of each point is evaluated, and the method moves the point of the simplex from where the function is largest through the opposite face of the simplex to a lower point. Usually such ‘‘reflections’’ conserve the volume of the simplex, but when doing so leads to a lower point the method will expand or contract the simplex to minimize the function. The minimization proceeds by taking the highest point in the new simplex and reflecting it through the opposite face and so forth. This process is schematically described in Figure 4. The simplex method is sometimes used for crude placement of starting conformations, for example in the context of docking. A detailed description of the simplex method can be found in Ref. 9.
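A minimal sketch of the reflection move alone (expansion and contraction are omitted, and all names are illustrative):

```python
def simplex_step(simplex, F):
    """One reflection move of the downhill simplex method: the worst
    vertex is reflected through the centroid of the opposite face."""
    simplex = sorted(simplex, key=F)     # order vertices best ... worst
    worst = simplex[-1]
    n = len(worst)
    face = simplex[:-1]                  # the face opposite the worst vertex
    centroid = [sum(v[i] for v in face) / len(face) for i in range(n)]
    reflected = tuple(2 * centroid[i] - worst[i] for i in range(n))
    if F(reflected) < F(worst):          # keep the reflection only if lower
        simplex[-1] = reflected
    return simplex
```

Repeating this step walks the simplex downhill; the full method of Ref. 9 adds the expansion and contraction moves that let the simplex adapt its size to the local landscape.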

2. Order 1 Algorithms

Order 1 algorithms, which represent a fair balance between accuracy and efficiency, are the most commonly used minimization methods in macromolecular simulations. As indicated by their name, these algorithms use the gradient of the function to direct the minimization process toward the nearest local minimum. They use information about the slope of the function but do not include information about its curvature. Thus, to compensate for the deficiency of curvature data, all order 1 minimization methods employ a stepwise iterative scheme. The iterations are used for recalculating the gradient and correcting the minimization approach pattern following changes in the direction of the slope.

In general, order 1 methods iterate over the following equation in order to perform the minimization until it converges or until it reaches a preset maximum number of steps:

 

 

r_k = r_{k−1} + λ_k S_k    (4)

Figure 4 A representative step in the downhill simplex method. The original simplex, a tetrahedron in this case, is drawn with solid lines. The point with the highest energy is reflected through the opposite triangular plane (shaded) to form a new simplex. The new vertex may represent a symmetrical reflection, an expansion, or a contraction along the same direction.


where r_k is the new position at step k, r_{k−1} is the position at the previous step k − 1, λ_k is the size of the step to be taken at step k, and S_k is the direction of that step. Various methods differ in the way they choose the step size and the step direction, i.e., in the way they try to compensate for the lack of curvature information. For example, the steepest descent method uses a variable step size for this purpose, whereas the conjugated gradients method takes advantage of its memory of previous steps. These two popular algorithms are described next. Information about other minimization methods can be found in Ref. 9.

(a) Steepest Descent. Steepest descent (SD), which is the simplest minimization algorithm, follows the direction of the instantaneous gradients calculated at each iteration. This means that once the gradient of the energy function, gk, at the current position is calculated, the minimization step is taken in the direction opposite to it (i.e., in the direction of the force),

 

 

S_k = −g_k = −∇U(r)    (5)

The step size, λ_k, is adjusted at each iteration to compensate for the lack of curvature information. If the energy of the new conformation is lower than that of the previous one, the algorithm assumes that the approach direction is correct and increases the step size λ_k by a small factor (often by the factor 1.2) to improve efficiency. However, if the energy of the new conformation turns out to be higher, indicating that the real path has curved away from the current minimization direction, the step size λ_k is decreased (often by the factor 0.5) to allow the algorithm to correct the direction more efficiently. Because finite step sizes are used, the method does not flow smoothly down to the minimum but rather jitters around it. Furthermore, SD’s imprecise approach to the minimum usually means that the method does not converge to one point and gets stuck in a limit cycle around the minimum. This means that SD often gets close to the minimum but rarely reaches it. Despite the relatively poor convergence of SD it is an efficient minimization procedure that is very useful for crude local optimization, such as relieving bad contacts in an initial structure before a dynamics simulation or as the first phase of a more complex minimization scheme.
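The adaptive step-size rule can be sketched as follows (names are illustrative; the 1.2 and 0.5 factors are the ones quoted above):

```python
def steepest_descent(grad, F, x, step=0.1, n_iter=200):
    """Steepest-descent sketch with the adaptive step-size rule described
    above: grow the step after a successful move, halve it otherwise."""
    E = F(x)
    for _ in range(n_iter):
        g = grad(x)
        trial = [xi - step * gi for xi, gi in zip(x, g)]  # S_k = -g_k
        E_trial = F(trial)
        if E_trial < E:        # downhill: accept and lengthen the step
            x, E = trial, E_trial
            step *= 1.2
        else:                  # overshot: reject and shorten the step
            step *= 0.5
    return x, E
```

On a smooth quadratic bowl this works well, but on a rugged surface the finite steps produce exactly the jitter around the minimum described above.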

(b) Conjugated Gradients. To compensate for the missing curvature information, the conjugated gradients (CG) method makes use of its ‘‘memory’’ of gradients calculated in previous steps. The first step is taken in the direction of the force,

 

 

S_1 = −g_1    (6)

but for all subsequent iterations, k > 1, the direction of the minimization step is determined as a weighted average of the current gradient and the direction taken in the previous iteration, i.e.,

 

 

 

S_k = −g_k + b_k S_{k−1}    (7)

The weight factor b_k is calculated as the ratio of the squares of the current and previous gradients,

b_k = |g_k|² / |g_{k−1}|²    (8)

For quadratic surfaces of dimension n, it can be shown that the method of conjugated gradients is very efficient, converging in at most n steps. Nonetheless, even for nonquadratic surfaces such as molecular energy surfaces, the CG method converges much better than SD. A numerical disadvantage of the gradient memory employed in CG is that it accumulates numerical errors along the way. This problem is usually overcome by restarting the minimization process every so often; that is, at given intervals the gradient memory is wiped out by setting b_k equal to zero.
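Equations (6)–(8) with periodic restarts can be sketched as follows (the crude step-halving line search and all names are illustrative simplifications, not the production algorithm):

```python
def conjugate_gradients(grad, F, x, n_iter=50, restart=10):
    """Conjugated-gradients sketch following Eqs. (6)-(8), with a crude
    step-halving line search and periodic restarts (b_k reset to zero)."""
    g = grad(x)
    s = [-gi for gi in g]                        # Eq. (6): S_1 = -g_1
    for k in range(1, n_iter + 1):
        lam, E = 1.0, F(x)
        # halve the step until the move lowers the energy (or give up)
        while F([xi + lam * si for xi, si in zip(x, s)]) >= E and lam > 1e-12:
            lam *= 0.5
        x = [xi + lam * si for xi, si in zip(x, s)]
        g_new = grad(x)
        denom = sum(gi * gi for gi in g)
        if k % restart == 0 or denom == 0.0:
            b = 0.0                              # wipe the gradient memory
        else:
            b = sum(gi * gi for gi in g_new) / denom    # Eq. (8)
        s = [-gi + b * si for gi, si in zip(g_new, s)]  # Eq. (7)
        g = g_new
    return x, F(x)
```

The restart branch is the error-control device described above: zeroing b_k discards the accumulated direction memory and momentarily reduces the method to steepest descent.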

3. Order 2 Algorithms

Order 2 minimization algorithms, which use the second derivative (curvature) as well as the first derivative (slope) of the potential function, exhibit in many cases an improved rate of convergence. For a molecule of N atoms these methods require calculating the 3N × 3N Hessian matrix of second derivatives (for the coordinate set at step k)

[H_k]_{ij} = ∂²U(r) / ∂r_i ∂r_j    (9)

in addition to the 3N vector of first derivatives g_k discussed above.

(a) Newton–Raphson. The Newton–Raphson minimization method is an order 2 method based on the assumption that, near the minimum, the energy can be approximated by a quadratic function. For a one-dimensional case, assuming that F(X) = a + bX + cX², the Newton–Raphson minimization can be formulated as

X* = X − F′(X)/F″(X)    (10)

where X* is the minimum, X is the current position, and F′ and F″ are the first and second derivatives at the current position. Namely, for a quadratic function, this algorithm finds the minimum in a single step. This conclusion can be generalized to the multidimensional case:

 

S_k = −H_k⁻¹ g_k    (11)

For nonquadratic but monotonic surfaces, the Newton–Raphson minimization method can be applied near a minimum in an iterative way [24].
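The one-dimensional iteration of Eq. (10) is easy to verify numerically (all names are illustrative):

```python
def newton_raphson(dF, d2F, x, n_iter=20):
    """Newton-Raphson iteration of Eq. (10): X* = X - F'(X)/F''(X)."""
    for _ in range(n_iter):
        x = x - dF(x) / d2F(x)
    return x
```

For the quadratic F(X) = a + bX + cX² a single step lands exactly on the minimum −b/(2c) from any starting point; for a nonquadratic function the same update, applied iteratively near a minimum, converges at a rate set by how well the quadratic approximation holds.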

There are several reasons that Newton–Raphson minimization is rarely used in macromolecular studies. First, the highly nonquadratic macromolecular energy surface, which is characterized by a multitude of local minima, is unsuitable for the Newton–Raphson method. In such cases it is inefficient and at times even pathological in its behavior. It is, however, sometimes used to complete the minimization of a structure that was already minimized by another method. In such cases it is assumed that the starting point is close enough to the real minimum to justify the quadratic approximation. Second, the need to recalculate the Hessian matrix at every iteration makes this algorithm computationally expensive. Third, it is necessary to invert the second derivative matrix at every step, a difficult task for large systems.

(b) Adopted Basis Newton–Raphson. A derivative method that is suited for large systems such as proteins is the adopted basis Newton–Raphson (ABNR) algorithm [25]. Instead of calculating the full multidimensional curvature (i.e., the full Hessian matrix) at each minimization step, the ABNR method limits its calculation to a small subspace of dimension s in which the system has made the most progress in past moves. The idea is to add curvature information only in those directions where it is likely to contribute most. This way the system moves in the best direction in the restricted subspace. Because the dimensionality s of the subspace is taken to be between 4 and 10, ABNR is an efficient minimization method comparable in CPU time to the order 1 methods such as the conjugated gradient. ABNR is also comparable to CG in terms of convergence. Since the method