Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf

Protein Folding: Computational Approaches


Of specific interest are the unfolding simulation studies that highlight the role of the solvent in the folding and unfolding process, an insight that is very difficult to obtain experimentally. For example, simulations of the early stages of barnase unfolding at high temperature [47] showed that solvent plays a key role in the denaturation process. An important element of the helix-unfolding transition was found to be the replacement of an α-helical hydrogen bond (i to i + 4, where i is an amino acid residue) by water hydrogen bonds through an intermediate involving a 3₁₀ (i to i + 3), or reverse turn, hydrogen bond. Denaturation of a β-sheet was also observed to start with the distortion of the β-sheet hydrogen bonds, followed by the insertion of hydrogen-bonding water molecules between the strands. Finally, significant solvent participation was found even in the denaturation of the central stabilizing element of globular proteins, the hydrophobic core. This happens as some water molecules form “cage structures” around hydrophobic groups, often involving hydrogen bonds to water molecules outside the core.

There are, however, concerns as to whether the observed water behavior corresponds to the actual denaturation process. The reason is that high temperature unfolding simulations are done either with a room temperature water density [47] or with low water density followed by rapid water penetration when the temperature is set equal to room temperature [48,49]. These procedures create an artificially high pressure, which may force water into protein cavities. Nonetheless, comparisons of unfolding simulation results at different temperatures seem to indicate that this effect is not very great [17].

B. Mapping Atomistic Energy Landscapes

An alternative approach to the study of protein folding on an atomic level is to base the study on conformation sampling rather than on direct simulation of the folding process. Sampling of folded and unfolded conformations allows for reconstructing the underlying energy landscape and for deducing the folding pathway (or pathways) from it.

In principle, energy landscapes are characterized by their local minima, which correspond to locally stable conformations, and by the transition regions (barriers) that connect the minima. In small systems, which have only a few minima, it is possible to use a direct approach to identify all the local minima and thus to describe the entire potential energy surface. Such is the case for small reactive systems [9] and for the alanine dipeptide, which has only two significant degrees of freedom [50,51]. The direct approach becomes impractical, however, for larger systems with many degrees of freedom that are characterized by a multitude of local minima.

A useful procedure for characterizing the multiminimum energy landscape of large systems was introduced by Stillinger and Weber [52]. These researchers investigated the energy landscape of water by quenching (i.e., minimizing) configurations from a molecular dynamics trajectory down to their nearest local minima. Using this procedure, a sample of the local minima accessible at a given temperature was obtained, providing a “map” of the underlying landscape. Following the original work, this procedure was applied to a variety of systems, including water [52], rare gas clusters [53], and proteins such as myoglobin [54] and bovine pancreatic trypsin inhibitor (BPTI) [55]. The protein studies showed that there is a very large number of local minima in the vicinity of the native state of the protein. Furthermore, the local minima are kinetically clustered into subsets, within which they tend to be connected by low barriers.
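The quenching idea can be illustrated on a toy problem. The sketch below is not the Stillinger and Weber calculation: it assumes a hypothetical one-dimensional potential (`potential`), a plain steepest-descent minimizer (`quench`), and random starting points standing in for snapshots from a molecular dynamics trajectory. It only shows how minimizing each snapshot and merging identical end points yields a list of the accessible local minima.

```python
import math
import random

def potential(x):
    # Hypothetical rugged 1D potential: a shallow quadratic well with a cosine ripple.
    return 0.1 * x * x + math.cos(3.0 * x)

def gradient(x):
    # Analytical derivative of `potential`.
    return 0.2 * x - 3.0 * math.sin(3.0 * x)

def quench(x, step=0.01, tol=1e-10, max_iter=100_000):
    # Steepest descent: follow the negative gradient down to the nearest local minimum.
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

# Stand-in for trajectory snapshots (an assumption): random configurations in [-4, 4].
rng = random.Random(0)
snapshots = [rng.uniform(-4.0, 4.0) for _ in range(200)]

# Quench every snapshot; rounding merges snapshots that reach the same minimum.
minima = sorted({round(quench(x), 4) for x in snapshots})
for m in minima:
    print(f"minimum at x = {m:+.4f}, energy = {potential(m):+.4f}")
```

For a real molecular system the minimizer would act on all atomic coordinates, and deciding whether two quenched structures are the same minimum requires a structural comparison rather than simple rounding.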

Becker

Atomic level studies of complex peptide and protein energy landscapes have become more detailed as computers have become faster, allowing for longer sampling simulations and more complicated analysis. A problem faced by protein energy landscape cartographers is how to represent the resulting conformation sample in a meaningful way that allows visualization and analysis of the underlying landscape. As far as the folding process is concerned, good results have been obtained by using one (or a few) effective reaction coordinates such as similarity to the native state (Q) or radius of gyration (Rg) [70,71]. These, however, are not very useful in exploring the energy landscape near the native state of a large protein or of peptides. Instead, principal component analysis (PCA) (see Chapter 4) is increasingly used to reduce the dimensionality of the data and to allow easier analysis of the landscape. PCA projects the high-dimensional conformation sample onto a low-dimensional subspace that best represents it. The combination of PCA with long-time molecular dynamics has led to detailed studies of the energy landscape of proteins such as lysozyme [56], CRP:(cAMP)₂ [57], cytochrome c [58], and crambin [59]. All of these systems exhibit complex landscapes with multiple basins. The observed dynamics on these landscapes typically involve long periods of motion within a basin followed by a fast transition from one basin to another. These observations led Go and collaborators to suggest a “jumping among minima” (JAM) model to help analyze the simulation results [60].
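As a rough illustration of the PCA projection step, the following sketch projects a conformation sample onto its two leading principal coordinates using a singular value decomposition. The function name `pca_project` and the synthetic two-basin data are assumptions made for the example; in practice the rows would be coordinate sets from a simulation, normally superimposed beforehand to remove overall translation and rotation.

```python
import numpy as np

def pca_project(conformations, n_components=2):
    """Project an (n_samples, n_coords) array onto its leading principal components."""
    X = np.asarray(conformations, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance along each axis
    return projection, explained[:n_components]

# Synthetic stand-in for a conformation sample: two "basins" in 30 dimensions.
rng = np.random.default_rng(0)
basin_a = rng.normal(0.0, 0.3, size=(200, 30))
basin_b = rng.normal(0.0, 0.3, size=(200, 30))
basin_b[:, 0] += 5.0  # displace the second basin along one coordinate
sample = np.vstack([basin_a, basin_b])

projection, explained = pca_project(sample)
print("projected shape:", projection.shape)
print("variance fraction captured by Q1, Q2:", explained)
```

In this artificial case the first principal coordinate separates the two basins; for real proteins several coordinates may be needed before the basin structure becomes visible.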

Combining the PCA projection with an energy scale allows for three-dimensional (3D) visualization of the underlying landscape. It should be noted, however, that without specific information on the barriers such PCA representations of the landscapes will at most reflect their overall shape, limited by the quality of the projection, and not necessarily their details. Nonetheless, the lack of information on the barriers is somewhat compensated for by the presence of “empty spaces,” which correspond to poorly sampled regions associated with high energy [59,61].

A problem associated with generating 3D PCA views of protein energy landscapes is that the other principal coordinates, which are not included in this view, will manifest themselves as “noise” or “roughness” in the low-dimensional representation. This is because each point in the plane defined by the two main principal coordinates {Q1, Q2} is associated with many conformations of different energies, separated from each other in the other principal coordinates {Q3, Q4, . . .}. When the number of sampling points is small, this problem can be overcome by a simple smoothing procedure, such as that used in mapping the energy landscape of alanine tetrapeptide [62]. However, when many conformations are included in the conformation sample, the “minimum energy envelope” procedure can be used to reduce the roughness [61]. For each value (on a grid) of the two main principal coordinates {Q1, Q2}, this procedure chooses the lowest conformation energy among all conformations that project onto this 2D grid point. The resulting smooth landscape is equivalent to an adiabatic surface, that is, a surface that has been minimized in all coordinates other than Q1 and Q2.

The resulting 3D view offers a direct visualization of the main basins on the energy landscape. Figure 4 shows the energy landscape of the prion protein (PrP) (residues 124–226) in vacuum [63]. Two large basins are clearly seen. The first is a deep but narrow basin associated with the native PrPc conformation [7]. The second basin, which is shallower but wider, is associated with a second group of conformations of a partially unfolded protein. These basins offer a framework for studying the kinetics of protein folding.
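The minimum energy envelope procedure described above amounts to binning the projected conformations on a {Q1, Q2} grid and keeping the lowest energy in each cell. The sketch below assumes a hypothetical helper, `minimum_energy_envelope`, and synthetic data in which the “roughness” from the remaining coordinates is mimicked by a random positive energy offset; it is not the implementation used in Ref. 61.

```python
import numpy as np

def minimum_energy_envelope(q1, q2, energy, n_bins=20):
    """Lowest energy among the samples that fall in each (Q1, Q2) grid cell."""
    q1, q2, energy = (np.asarray(a, dtype=float) for a in (q1, q2, energy))
    edges1 = np.linspace(q1.min(), q1.max(), n_bins + 1)
    edges2 = np.linspace(q2.min(), q2.max(), n_bins + 1)
    # Cell index of each sample; clip so the largest value falls in the last cell.
    i = np.clip(np.digitize(q1, edges1) - 1, 0, n_bins - 1)
    j = np.clip(np.digitize(q2, edges2) - 1, 0, n_bins - 1)
    envelope = np.full((n_bins, n_bins), np.inf)
    np.minimum.at(envelope, (i, j), energy)  # running minimum per cell
    envelope[np.isinf(envelope)] = np.nan    # empty cells mark unsampled regions
    return envelope, edges1, edges2

# Synthetic sample: a smooth bowl in (Q1, Q2) plus positive "roughness"
# standing in for energy differences along the omitted coordinates.
rng = np.random.default_rng(1)
q1 = rng.uniform(-2.0, 2.0, 5000)
q2 = rng.uniform(-2.0, 2.0, 5000)
energy = q1 ** 2 + q2 ** 2 + rng.exponential(1.0, 5000)

envelope, edges1, edges2 = minimum_energy_envelope(q1, q2, energy, n_bins=10)
print("grid shape:", envelope.shape)
print("lowest envelope energy:", np.nanmin(envelope))
```

Cells left as NaN correspond to the “empty spaces” of the projection, so they carry information about poorly sampled, presumably high energy, regions.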

Figure 4 The energy landscape of the prion protein (PrP) (residues 124–226) in vacuum, obtained by principal coordinate analysis followed by the minimum energy envelope procedure. Two large basins are seen. One basin is associated with the native PrPc conformation; the other is associated with partially unfolded conformations.

Clearly, mapping energy landscapes based only on local minima gives only a partial description of the energy landscape, because the maps do not contain information about the energy barriers that govern the system’s kinetics. It is the knowledge of the transition states that allows a detailed exploration of kinetics through the use of the master equation approach [Eqs. (2)–(4)]. One of the first detailed studies of this sort was performed by Czerminski and Elber [64], who generated an almost complete map of the minima and barriers of an alanine tetrapeptide in vacuum. Using the master equation approach they were able to study aspects of this system’s kinetics, which involve the crossing of barriers of different heights.
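To make the master equation approach concrete, the sketch below propagates state populations for a hypothetical three-state system. It uses the generic form dP_i/dt = Σ_j (k_ij P_j − k_ji P_i), which may differ in notation from Eqs. (2)–(4); the rate constants are invented for illustration and are not taken from Ref. 64. In a real application each rate would be derived from the height of the barrier separating two minima.

```python
import numpy as np

# Hypothetical three-state system; the rates are made up for illustration.
# k[i, j] is the rate constant for the transition j -> i (arbitrary inverse-time units).
k = np.array([
    [0.0, 0.2, 0.01],
    [0.5, 0.0, 0.05],
    [1.0, 0.3, 0.0],
])

# Rate matrix W: off-diagonal terms are gains; each diagonal term is minus the
# total escape rate from that state, so every column sums to zero.
W = k - np.diag(k.sum(axis=0))

def propagate(p0, W, t):
    """Solve dP/dt = W P via the eigendecomposition of W."""
    evals, evecs = np.linalg.eig(W)
    coeffs = np.linalg.solve(evecs, p0)
    return np.real(evecs @ (coeffs * np.exp(evals * t)))

p0 = np.array([1.0, 0.0, 0.0])  # all population starts in state 0
for t in (0.0, 1.0, 10.0, 100.0):
    print(t, propagate(p0, W, t).round(4))
```

Because the columns of W sum to zero, total probability is conserved, and at long times the populations approach the stationary distribution of the rate matrix.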

Obtaining information regarding barriers, which accounts for state-to-state transition states, is a complicated computational task (see Chapter 10). However, even if such data are obtained, their complexity renders it difficult to introduce barrier information into the description of the atomistic energy landscape. In particular, one would like to extract from the raw data information regarding the overall connectivity of the landscape as well as information regarding the global basin-to-basin kinetic transitions. It is the transition from the ensemble of unfolded conformations (“unfolded basin”) to the ensemble of folded conformations (“folded basin”) that is of interest, rather than individual transitions between specific conformations. This type of “global” kinetics is in line with the type of observations available experimentally.

To address this problem the method of “topological mapping” was introduced by Becker and Karplus [65]. Based on barrier information this method partitions conformation space into its component energy basins, thus highlighting the overall basin-to-basin connectivity of the landscape. At any energy level E the molecular conformation space can be partitioned into disconnected regions consisting of local minima that are connected by barriers lower than E. The method of topological mapping follows the way these disconnected regions, or “basins,” connect and disconnect as a function of increasing and decreasing energy E. An elementary basin R(α) is defined as a connected set of molecular conformations that, when minimized, map to the same single local minimum. Topological mapping groups these elementary basins according to the barriers between them. At any energy level E (or temperature level T) the multidimensional landscape is thus partitioned into “superbasins,” R^E(α′), defined as the union of elementary basins R(α) connected by barriers lower than energy E (or T).

$$R^{E}(\alpha') = \bigcup_{\alpha} R(\alpha) \qquad (8)$$

Each such “superbasin” is then mapped to its lowest minimum α′ in a way that is analogous to simulated annealing (Fig. 5a). As a result, minima connected by barriers lower than E are grouped together and separated from other minima to which they are connected by higher barriers. A topological “disconnectivity” graph is obtained by following the way these superbasins break up as the system’s energy E decreases. Each node on this graph (Fig. 5b) reflects a conformational superbasin on the landscape, and the connecting edges reflect the basin connectivity. The node at the top of the tree graph corresponds to the ergodic limit, in which all states are connected. As the energy is decreased, the graph splits to indicate basins that become disconnected at that energy level. The topological mapping method resembles the Lid method independently developed by Sibani et al. [66] to study the energy landscape of crystals and glasses.
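The grouping of minima into superbasins at a given energy level can be sketched with a union-find structure. The example below is a minimal illustration under stated assumptions, not the authors’ implementation: the function `superbasins`, the four minima A to D, and their barrier heights are all hypothetical, and each barrier is assumed to lie above both minima it connects.

```python
def superbasins(min_energy, barriers, E):
    """Group minima connected by barriers lower than E.

    Returns a dict that maps the lowest minimum of each superbasin
    (alpha' in the text) to the sorted list of its member minima.
    """
    parent = {m: m for m in min_energy}

    def find(m):
        # Union-find root lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), height in barriers.items():
        if height < E:  # barrier is passable at this energy level
            parent[find(a)] = find(b)

    groups = {}
    for m, e in min_energy.items():
        if e < E:  # minima above E are not accessible at this level
            groups.setdefault(find(m), []).append(m)
    return {min(g, key=min_energy.get): sorted(g) for g in groups.values()}

# Hypothetical landscape: energies of four minima and of the barriers between them.
min_energy = {"A": 0.0, "B": 1.0, "C": 0.5, "D": 2.0}
barriers = {("A", "B"): 2.0, ("B", "C"): 4.0, ("C", "D"): 3.0}

# Scanning E downward shows how the superbasins break up.
for E in (5.0, 3.5, 2.5, 1.5):
    print(E, superbasins(min_energy, barriers, E))
```

Recording, for each energy level, which superbasin at the level above contains each group gives the parent-child edges of the disconnectivity tree graph.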

Figure 5 A schematic representation of a “topological mapping” of an energy landscape. (a) The energy landscape is studied at different energies E. Each region of connected conformations, denoted as a “superbasin” R^E(α′), is mapped to its lowest minimum α′. (b) The corresponding topological “disconnectivity” tree graph reflects the way superbasins become disconnected as the energy is decreased.

An advantage of topological mapping is that the resulting disconnectivity tree graph reflects, in a straightforward way, the overall topography of the energy landscape. For example, a tree graph reflecting “funnel” topography would be characterized by a single main branch with many small side branches that do not undergo additional splitting. On the other hand, a tree graph that corresponds to a landscape characterized by several large competing basins will exhibit several large branches, each displaying a complex branching pattern of its own. In the case of a completely rough landscape, no dominant branch can be detected in the disconnectivity graph. Application of this analysis method to the energy landscape of alanine tetrapeptide, based on the data of Czerminski and Elber [64], showed that this all-atom energy landscape is dominated by a “funnel” topography, although the presence of a large kinetic trap could also be detected [65]. The insight into the connectivity of this landscape was used to study the overall basin-to-basin kinetics of this tetrapeptide, employing the master equation approach [65]. A very clear funnel topography is also seen in the disconnectivity graph of linear alanine hexapeptide (Ala)₆ shown in Figure 6 [67]. The method of topological mapping was successfully employed to characterize the energy landscape of different types of atomic and molecular clusters [68].

A different approach for handling barrier information was suggested by Kunz and Berry [69]. In this method conformations are sampled along high temperature dynamical trajectories, with the connectivity, including saddle points, determined for successive coordinate sets along a given trajectory. The minima–barrier–minima triplets are then put together in a way that follows the descent from high energy conformations to low energy structures. This results in linear cross sections through the high-dimensional energy landscape. Applying this method to different types of clusters led to the distinction between “structure-seeking” clusters, such as the (KCl)₃₂ cluster, that exhibit a steep staircase-like

Figure 6 The topological disconnectivity graph of alanine hexapeptide. (Adapted from Ref. 67.)