Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf
Скачиваний:
68
Добавлен:
15.08.2013
Размер:
5.59 Mб
Скачать

8

Normal Mode Analysis of

Biological Molecules

Steven Hayward

University of East Anglia, Norwich, England

I.INTRODUCTION

Normal mode analysis exists as one of the two main simulation techniques used to probe the large-scale internal dynamics of biological molecules. It has a direct connection to the experimental techniques of infrared and Raman spectroscopy, and the process of comparing these experimental results with the results of normal mode analysis continues. However, these experimental techniques are not yet able to access directly the lowest frequency modes of motion that are thought to relate to the functional motions in proteins or other large biological molecules. It is these modes, with frequencies of the order of 1 cm 1, that mainly concern this chapter.

Normal mode analysis was first applied to proteins in the early 1980s [1–3]. Much of the literature on normal mode analysis of biological molecules concerns the prediction of functionally relevant motions. In these studies it is always assumed that the soft normal modes, i.e., those with the lowest frequencies and largest fluctuations, are the ones that are functionally relevant. The ultimate justification for this assumption must come from comparisons to experimental data. Several studies have been made in which the predictions of a normal mode analysis have been compared to functional transitions derived from two X-ray conformers [4–7]. These studies do indeed suggest that the low frequency normal modes are functionally relevant, but in no case has it been found that the lowest frequency normal mode corresponds exactly to a functional mode. Indeed, one would not expect this to be the case.

Normal mode analysis is a harmonic analysis that assumes that, over the range of thermal fluctuations, the conformational energy surface can be characterized by the parabolic approximation to a single energy minimum. However, there exists abundant evidence, both experimental [8] and computational [9], that the harmonic approximation breaks down spectacularly for proteins at physiological temperatures, where, far from performing harmonic motion in a single energy minimum, the state point visits multiple minima, crossing energy barriers of various heights. Even if the motion within a single energy minimum is representative of the motion within all energy minima, as appears to

153

154

Hayward

be the case [10,11], barrier-crossing events would be expected to have an even greater influence on the overall motion of the molecule, with no obvious relation to the motion within individual minima. Given the level of the approximation, then, the relative success of the normal mode analysis is surprising and intriguing.

In the following, the method itself is introduced, as are the various techniques used to perform normal mode analysis on large molecules. The method of normal mode refinement is described, as is the place of normal mode analysis in efforts to characterize the nature of a protein’s conformational energy surface.

II. NORMAL MODE ANALYSIS IN CARTESIAN COORDINATE SPACE

This section describes the basic methodology of normal mode analysis. Owing to its long history it has been described in detail in the context of many different fields. However, to aid in understanding subsequent sections of this chapter, it is described here in some detail.

To do a normal mode analysis one needs a set of coordinates, a force field describing the interactions between constituent atoms, and software to perform the required calculations. A normal mode analysis requires three main calculations. The first is the minimization of the conformational potential energy as a function of the atomic Cartesian coordinates. The various energy minimization techniques have been described previously in this volume. To find a true minimum, line search algorithms are often used in the later stages of minimization. At a minimum, the potential energy can be expanded in a Taylor series

in terms of mass-weighted coordinates, qi mj xj , qi 1 mj yj , qi 2 mj zj , where j labels the N atoms, and i the 3N Cartesian coordinates, to give

 

1

3N

2

0

 

 

 

 

 

 

V

 

V

qi qj

(1)

2

qi qj

 

 

i, j 1

 

 

 

 

The first term in the expansion, the value of the energy at the minimum, has been set to zero. The linear terms are also zero, as the first derivatives of the energy (the force) at a minimum are also zero. In normal mode analysis the higher order terms are neglected, and the second derivatives calculated at the minimum are assumed to characterize the energy surface over fluctuations that are far from the minimum. These second derivatives are the elements of the symmetric matrix, F, that is often called the Hessian. The calculation of the Hessian is the second major calculation in a normal mode analysis. The Langrangian, the kinetic energy minus the potential energy, can be written as

 

1

3N

 

1

3N

2

0

 

 

L

i2

 

 

V

qi qj

(2a)

2

2

qi qj

 

 

i 1

 

 

i, j 1

 

 

 

 

or in vector-matrix form as

L

1

t

1

qt Fq

(2b)

 

 

2

2

 

where the superscript t denotes the transpose. This determines the dynamics of the system. As F is a symmetric matrix, there exists an orthogonal transformation that diago-

nalizes F:

Normal Mode Analysis of Biological Molecules

155

Wt FW

(3)

where is the diagonal matrix, W is a matrix of order 3N, and Wt W I or WWt I, where I is the identity matrix. Simple rearrangement of Eq. (3) shows that it can also be regarded as an eigenvalue equation, where each element of is an eigenvalue associated with a specific column of W, the eigenvector. Using W, a new coordinate set can be defined:

Q Wt q

(4)

or

 

3N

 

Qi Wki qk

(5)

k 1

where i 1, . . . , 3N. Each Qi is a collective variable termed a normal mode coordinate. Substitution of Eq. (4) into Eq. (2b) leads to 3N independent equations of motion for each normal mode coordinate, each with a solution Qi Ai cos (ωi t εi), showing that each normal mode coordinate oscillates sinusoidally with an angular frequency ωi. Each normal mode coordinate specifies a set of atomic displacements through Eq. (4):

qk Wki Ai cos (ωi t εi)

(6)

The pattern of motion, or normal mode, is fully specified by the Wki’s, i.e., the eigenvector associated with ωi. Therefore, normal modes and their frequencies of oscillation are determined by the eigenvectors and eigenvalues of F. The diagonalization of F, then, is the final major computational challenge in performing a normal mode analysis. Six normal modes describing the rigid-body motion of the whole molecule have frequencies of zero and are usually eliminated from any further analysis. The general solution to the equation of motion involves a sum of the terms of Eq. (6) (over the normal mode index i), each with a different amplitude and phase. The precise amplitude and phase of each normal mode is determined by the initial conditions.

Application of the equipartition law shows that for a molecule in thermal equilib-

rium,

Qi2

kB T

 

 

 

 

 

(7a)

ωi2

 

 

 

 

 

 

 

 

 

 

or for the mass-weighted atomic coordinates,

 

 

 

3N 6

 

 

 

2

 

qi2 kB T

 

Wik

 

 

(7b)

ωk

 

 

 

 

k 1

 

 

where kB is Boltzmann’s constant and T the absolute temperature. Equation (7b) is often used to calculate the mean-square fluctuations (MSFs) of atoms for comparison to those derived from the atomic B-factors in X-ray crystallography. From Eq. (7b) it is easy to show that

3N 6

3N 6

3N 6

 

 

 

1

 

qi2 qi2 kB T

 

(8)

ωk2

i 1

i 1

k 1

 

156

Hayward

Equation (8) shows that it is the fluctuations of the lowest frequency modes that contribute most to the overall fluctuation of the molecule. For example, in the case of lysozyme, the lowest frequency normal mode (out of a total of 6057) accounts for 13% of the total massweighted MSF. It is for this reason that it is common to analyze just the lowest frequency modes for the large-scale functional motions.

Covariances between the Cartesian coordinates can also be calculated using

3N 6

1

 

 

qi qj kB T Wik Wjk

 

(9a)

ωk2

 

k 1

 

 

 

or, in matrix form,

 

 

 

U qqt kB TW1 Wt

(9b)

where the zero frequency eigenvalues and eigenvectors are eliminated. Equation (9b) shows how one can perform a quasi-harmonic analysis by diagonalizing the variance– covariance matrix of atomic fluctuations determined from a molecular dynamics simulation to attain a set of quasi-harmonic modes and effective frequencies. Note that a quasiharmonic analysis necessarily includes any anharmonic effects in the molecular dynamics simulation.

Many thermodynamic quantities can be calculated from the set of normal mode frequencies. In calculating these quantities, one must always be aware that the harmonic approximation may not provide an adequate physical model of a biological molecule under physiological conditions.

III.NORMAL MODE ANALYSIS OF LARGE BIOLOGICAL MOLECULES

On the face of it, a normal mode analysis of many biological molecules is a daunting task. A normal mode analysis of the protein citrate synthase, a homodimer of 874 residues, would involve the diagonalization of a matrix whose order is in the tens of thousands. Out of the proteins whose structures have been solved, this is by no means outstandingly large, but a full-scale diagonalization of the Hessian of citrate synthase is still not feasible. However, methods exist that can reduce the size of the calculation considerably without sacrificing accuracy. Indeed, the diagonalization of a large symmetric matrix is a ubiquitous task for which many different numerical techniques have been developed [12]. As an alternative, reduced basis sets can be used [13] that, if chosen carefully, can considerably reduce the size of the Hessian without compromising accuracy. By far the most popular among these is the dihedral angle space normal mode analysis [1].

Further reductions can be achieved by taking symmetry into account, an approach that holds promise for the analysis of large oligomeric proteins such as virus capsids [14].

Combining all these techniques suggests that, provided minimization has been achieved, a number of the lowest frequency normal modes of any protein can be accurately determined.

A.Determination of the Eigenvalues and Eigenvectors of a Large Hessian

The determination of some of the eigenvalues and eigenvectors of a large real symmetric matrix has a long history in numerical science. Of particular interest in the normal mode

Normal Mode Analysis of Biological Molecules

157

analysis of large biomolecules, where only some of the lowest frequency normal modes are required, are iterative techniques such as the Lanczos algorithm [12], which is an efficient algorithm suited to the computation of a few outer eigenvalues and eigenvectors. In its simplest form the Lanczos algorithm starts with an initial vector, which could be a guess of the lowest frequency normal mode, and iteratively increases the dimension of the space one dimension at a time by application of an operator that could be the Hessian itself. It can be shown that the subsequent projection of the Hessian into this space and diagonalization of the resulting reduced matrix will produce better and better approximations to the outer eigenvalues and eigenvectors as the dimension of the space increases. Sophisticated techniques are available that make the algorithm very efficient. This approach has been used with an operator that should lead to faster convergence and is particularly suited to a diagonally dominant matrix [13,15]. Although the Hessian is not a naturally sparse matrix, due primarily to the long range Coulomb interaction, cutoff methods have been used to create a sparse matrix that can then be transformed to a diagonally dominant form.

A block Lanczos algorithm (where one starts with more than one vector) has been used to calculate the first 120 normal modes of citrate synthase [4]. In this calculation no apparent use was made of symmetry, but it appears that to save memory a short cutoff

˚

of 7.5 A was used to create a sparse matrix. The results suggested some overlap between the low frequency normal modes and functional modes determined from the two X-ray conformers.

Although the Lanczos is a fast efficient algorithm, it does not necessarily give savings in memory. To save memory a number of techniques divide the molecule into smaller parts that correspond to subspaces within which the Hessian can be expressed as a matrix of much lower order. These smaller matrices are then diagonalized. The methods described below show how one then proceeds to achieve good approximations to the true low frequency modes by combining results from subspaces of lower dimension.

Mouawad and Perahia [16] developed a technique, which they call ‘‘diagonalization in a mixed basis,’’ whereby the Hessian is projected into a subspace spanned by a union of the space defined by combining low frequency eigenvectors of the parts, with a space defined by a selected set of Cartesian coordinate vectors. An iterative process has been devised whereby diagonalization in this subspace and other subspaces, created by selecting new sets of Cartesian coordinate vectors, produces new low frequency eigenvectors with which one can repeat the whole process. It converges to yield good approximations to the true low frequency modes. This technique has been applied to some very large proteins such as hemoglobin [7] (600 residues), for which the first 203 lowest frequency modes were estimated, and aspartate transcarbamylase [6], a dodecamer of some 2760 residues for which the first 53 modes were calculated.

A related method is the component synthesis method [17], which uses a so-called static condition to model the interactions between parts of a molecule whose corresponding diagonal blocks in the Hessian are first diagonalized. It has been combined with a residue clustering algorithm that provides a hierarchy of parts, which at the lowest level provides small enough matrices for efficient diagonalization [18]. It has been applied to doublehelical DNA [17] and the protein crambin [18].

In another promising method, based on the effective Hamiltonian theory used in quantum chemistry [19], the protein is divided into ‘‘blocks’’ that comprise one or more residues. The Hessian is then projected into the subspace defined by the rigid-body motions of these blocks. The resulting low frequency modes are then perturbed by the higher