From: Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.), Computational Biochemistry and Biophysics.

17

Protein Folding: Computational Approaches

Oren M. Becker

Tel Aviv University, Tel Aviv, Israel

I. INTRODUCTION

‘‘Protein folding’’ is the term used to describe the complex process in which polypeptide chains adopt their three-dimensional ‘‘native’’ conformation. To carry out their functions, proteins must fold rapidly and reliably. They must satisfy a kinetic requirement that folding be completed within a reasonable time and a thermodynamic requirement that the folded conformation be stable under physiological conditions. Although the folding process in a cell also involves catalytic and control mechanisms, for many, if not all, proteins the information for folding is contained primarily in the amino acid sequence. Because there are so many possible conformations for any given polypeptide chain, these requirements mean that protein folding must be many orders of magnitude faster than a random search through conformation space [1]. An estimate of this speedup can be obtained by conservatively assuming that each amino acid residue has three possible conformations. If a protein is made up of 100 amino acids, there are $3^{100} \approx 5 \times 10^{47}$ possible conformations for the entire polypeptide chain. Even if the time required to change from one conformation to another is as little as 10 ps (1 ps $= 10^{-12}$ s), a random search through all of conformation space would still require about $5 \times 10^{36}$ s, or about $10^{29}$ years. This estimate, often referred to as the ‘‘Levinthal paradox,’’ clearly indicates that protein folding is not a random search but instead follows a built-in bias toward the native state.
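The arithmetic behind this estimate can be checked in a few lines. This is a minimal sketch that uses only the assumptions stated in the text: three conformations per residue, 100 residues, and 10 ps per conformational change. The function name `levinthal_estimate` is hypothetical.

```python
def levinthal_estimate(n_residues=100, states_per_residue=3, step_time_s=10e-12):
    """Time needed to visit every conformation once, one conformational change per step."""
    n_conformations = states_per_residue ** n_residues  # exact integer, 3**100
    total_time_s = n_conformations * step_time_s
    seconds_per_year = 365.25 * 24 * 3600
    return n_conformations, total_time_s, total_time_s / seconds_per_year


n_conf, t_s, t_yr = levinthal_estimate()
print(f"conformations: {n_conf:.1e}")  # 5.2e+47
print(f"search time:   {t_s:.1e} s = {t_yr:.1e} years")
```

Python integers are exact, so `3 ** 100` is computed without rounding. The float conversion only happens when the count is multiplied by the step time.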

An important feature of protein folding is that the amino acid sequence of a protein uniquely determines its overall structure [2]. This structure combines secondary structure (the regions of α-helix and β-sheet) and tertiary structure (the overall folding pattern). Differences in sequence give rise to differences in secondary and tertiary structure. So far, the three-dimensional structures of approximately 6000 proteins have been determined by X-ray crystallography and NMR spectroscopy. The domains in these proteins can be grouped into approximately 350 fold families, each consisting of sequences that have similar structures [3]. It has been estimated that the total number of different folds is only on the order of 1000 [3–5]. This number is much smaller than the total number of different protein sequences in the human genome, which is on the order of 100,000. Some of these folds are observed in a large number of sequences, whereas others have been found, so far, in only a small number of instances. The frequency with which a fold occurs is probably related to the stability of the fold or to the speed of its folding process.

It should be noted that in almost all cases only one fold exists for any given sequence. The uniqueness of the native state arises from the fact that the interactions that stabilize the native structure significantly destabilize alternate folds of the same amino acid sequence. That is, evolution has selected sequences with a deep energy minimum for the native state, thus eliminating misfolded or partly unfolded structures at physiological temperatures.

The process of protein folding is one of the most fundamental biophysical processes. It is of interest also because of the important role it plays in the mechanisms and controls of a wide range of cellular processes. These include regulation of complex events during the cell cycle and translocation of proteins across membranes to their appropriate organelles [6]. Furthermore, it is known that the failure of proteins to fold correctly is associated with the malfunction of biological systems, leading to a broad range of diseases. Some of these diseases, such as Alzheimer’s and Creutzfeldt-Jakob diseases, are associated with the conversion of normal soluble proteins into insoluble aggregated amyloid plaques and fibrils [7]. Others, for example cystic fibrosis, result from mutations that hinder the normal folding and secretion of specific proteins [8].

As with any other chemical reaction, understanding protein folding requires knowledge of the interactions that dominate it as well as insight into the kinetics and dynamics of the process. Substantial progress has been made in recent decades toward such an understanding for simple chemical reactions [9]; however, our knowledge of the more complex protein folding reaction is less advanced. Nonetheless, in the last decade or so, major strides toward a comprehensive understanding of the folding process have been made through a combination of theoretical and experimental studies. It is important to note that protein folding reactions have very different characteristics from reactions of small molecules. For example, experiments have shown that although the Arrhenius equation can often be applied to protein folding, the preexponential factor has a strong temperature dependence [10]. Furthermore, under physiological conditions the free energy of the native state of a protein is only slightly lower than that of the unfolded state. This is due to a near cancellation of large energetic and entropic contributions. The energetics of protein folding are dominated by the nonbonded van der Waals and electrostatic terms in the potential energy function (see Chapter 2). These include both intramolecular interactions between the atoms of the protein and intermolecular interactions between the protein atoms and the solvent [11]. In particular, nonpolar (hydrophobic) groups have been found to strongly favor the folded state. They do so owing to the attractive van der Waals interactions in the native structure and to the hydrophobic effect, which favors the burial of nonpolar groups. By contrast, polar groups (the peptide groups and the polar and charged side chains) contribute much less to the stability of the native state, because their interactions in the protein interior are largely balanced by their interactions with the solvent.
For example, in lysozyme [12], calculations show that at 25°C the nonpolar groups contribute 450 kcal/mol whereas the polar groups contribute only 87 kcal/mol to the free energy of denaturation. The overall stabilization of the native state due to these energy interactions (about 537 kcal/mol) is counterbalanced by a configuration entropy contribution of about 523 kcal/mol at 25°C. This yields a net free energy of unfolding of only 14 kcal/mol (on the order of 0.1 kcal/mol per residue), which is a typical value for globular proteins. In contrast, the energy or enthalpy difference between the native and unfolded states can be significantly larger; for lysozyme at 25°C, the unfolding enthalpy is 58 kcal/mol [12].
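The lysozyme bookkeeping can be reproduced directly from the numbers quoted above. The sketch only re-adds those values. The chain length of 129 residues is an assumption (hen egg-white lysozyme), because the text does not say which lysozyme was studied.

```python
# Lysozyme at 25 degrees C, values quoted in the text (kcal/mol).
NONPOLAR = 450.0      # nonpolar-group contribution to the free energy of denaturation
POLAR = 87.0          # polar-group contribution
CONF_ENTROPY = 523.0  # configuration-entropy term opposing the native state
N_RESIDUES = 129      # ASSUMPTION: hen egg-white lysozyme; not stated in the text

stabilization = NONPOLAR + POLAR              # energetic stabilization of the native state
net_unfolding = stabilization - CONF_ENTROPY  # net free energy of unfolding
per_residue = net_unfolding / N_RESIDUES

print(f"stabilization:   {stabilization:.0f} kcal/mol")  # 537
print(f"net free energy: {net_unfolding:.0f} kcal/mol")  # 14
print(f"per residue:     {per_residue:.2f} kcal/mol")    # 0.11
```

The net margin is under 3% of either large term. This illustrates the near cancellation of energetic and entropic contributions described above.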


Chemical reactions, including protein folding, are best understood from the vantage point of their underlying ‘‘energy landscapes,’’ which are theoretical manifestations of the interactions that contribute to the chemical processes. An energy landscape is a surface defined over conformation space indicating the potential energy of each and every possible conformation of the molecule. Similar to regular topographic landscapes, valleys in an energy landscape indicate stable low energy conformations and mountains indicate unstable high energy conformations. However, although reactions of small molecules can be characterized directly by the potential energy landscape, the high dimensionality of protein conformation spaces often makes a temperature-dependent effective energy landscape (or free energy landscape) the theoretical framework of choice. Such a surface corresponds to a Boltzmann weighted average of the accessible energies along an appropriately chosen reaction coordinate (or progress variable). The latter, which describes the approach to the native state, is obtained by averaging over many nonessential degrees of freedom. Such a reaction coordinate describes the progress of the reaction from the initial to the final state but includes the possibility of many different paths on the original high dimensional energy landscape.
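To show what a Boltzmann-weighted average along a progress variable means in practice, the sketch below builds a free energy profile $F(Q) = -k_BT \ln \sum_{s:\,Q(s)=Q} e^{-E_s/k_BT}$ from a list of microstates, each labeled by a discretized value of Q. Summing over all microstates that share a value of Q is what "averaging over nonessential degrees of freedom" amounts to here. The toy two-basin landscape is hypothetical and chosen only to show how energy and entropy trade off with temperature. It has 1000 degenerate unfolded states at E = 0 and one native state at E = −5 kcal/mol.

```python
import math
from collections import defaultdict

K_B = 0.0019872  # Boltzmann constant, kcal/(mol*K)


def free_energy_profile(states, temperature):
    """Return {Q: F(Q)} with F(Q) = -kT ln sum_{s with Q(s)=Q} exp(-E_s/kT).

    `states` is a sequence of (q, energy) pairs, energies in kcal/mol.
    The profile is shifted so that its minimum is zero.
    """
    states = list(states)
    kt = K_B * temperature
    e_min = min(e for _, e in states)  # shift energies to avoid overflow
    weights = defaultdict(float)
    for q, energy in states:
        # Sum Boltzmann factors over every microstate with the same Q,
        # integrating out all degrees of freedom except the progress variable.
        weights[q] += math.exp(-(energy - e_min) / kt)
    profile = {q: -kt * math.log(w) for q, w in weights.items()}
    f_min = min(profile.values())
    return {q: f - f_min for q, f in sorted(profile.items())}


# Hypothetical two-basin landscape: many unfolded states vs. one native state.
toy = [(0.0, 0.0)] * 1000 + [(1.0, -5.0)]
low = free_energy_profile(toy, 300.0)   # native basin (Q = 1) has the lowest F
high = free_energy_profile(toy, 400.0)  # unfolded basin (Q = 0) has the lowest F
print(low, high)
```

The same energies give opposite free-energy orderings at the two temperatures. At higher temperature, the entropy of the 1000 unfolded states outweighs the energetic advantage of the single native state.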

Because protein folding is determined primarily by the amino acid sequence, the difference between foldable and unfoldable sequences should be manifested in their underlying energy landscapes. For a foldable sequence, the energy of its conformations is expected to correlate with a reaction coordinate Q, decreasing on average as the conformation approaches the native state. Non-native contacts introduce some roughness on top of this trend. This correlation of energy and structure introduces a bias in favor of the native conformation, as well as a bias against conformations that are significantly different from the native structure. Such a correlation is responsible for the funnel shape of the landscape (Fig. 1b). A random sequence will not exhibit such a correlation between energy and conformation, and its energy landscape is expected to be rough (Fig. 1a). Because proteins are finite systems, if they have a single native state there is always a temperature below which the native state is stable. This temperature is called the folding temperature, Tf. On the other hand, due to the roughness of the landscape there is also a temperature below which the kinetics are controlled by non-native traps rather than by the bias toward the native state. This temperature is denoted Tg, in reference to a similar transition temperature in glasses. For a sequence to fold, the folding temperature must be higher than the glass temperature, Tf > Tg. That is, the competition between the energetic bias toward the native state and the roughness of the landscape plays a central role in the folding process, leading to a diversity of folding scenarios [13–15].
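The contrast between Fig. 1a and Fig. 1b can be made concrete with a toy calculation. The sketch draws conformations with a random similarity Q in [0, 1], where 1 is native-like, and assigns each an energy $E = -bQ + \eta$. Here b is an energetic bias and η is Gaussian roughness. Both the functional form and the parameter values are hypothetical. A funnel-like landscape (large bias) shows a strong energy-structure correlation, whereas a random landscape (no bias) shows essentially none.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def toy_landscape(n_conf, bias, roughness, rng):
    """Sample (Q, E) pairs with E = -bias * Q + Gaussian roughness (hypothetical)."""
    qs = [rng.random() for _ in range(n_conf)]  # Q = 1 means native-like
    es = [-bias * q + rng.gauss(0.0, roughness) for q in qs]
    return qs, es


rng = random.Random(0)
funnel_r = pearson(*toy_landscape(2000, bias=10.0, roughness=1.0, rng=rng))
random_r = pearson(*toy_landscape(2000, bias=0.0, roughness=1.0, rng=rng))
print(f"funnel-like: r = {funnel_r:.2f}")  # strongly negative
print(f"random:      r = {random_r:.2f}")  # close to zero
```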

Thus, energy landscape theory offers a solution to many of the kinetic and thermodynamic perplexities of protein folding. The kinetic bias toward the native state is explained as an overall bias in the energy landscape itself, where a large depression or ‘‘funnel’’ around the native state biases the folding process toward this structure. An interplay between the bias toward the native conformation, the relative stability of that structure, and the roughness of the landscape gives rise to the non-Arrhenius temperature dependence of the folding process, highlighting the interplay of energy and entropy. Furthermore, the unique topography and the multidimensionality of the landscape allow for multiple folding pathways. These pathways pass through a multidimensional folding ‘‘seam’’ (rather than a single one-dimensional barrier) that can still be described by an average reaction coordinate.

Naturally, the pivotal role of protein folding in biophysics and biochemistry has yielded a very large body of research. In this chapter we focus primarily on the different theoretical and computational approaches that have contributed to the current understanding of the folding process. Discussion of other aspects of protein folding can be found in many excellent reviews, which address this topic from different points of view. Some of these reviews can be found in Refs. 11, 13, 14, and 16–21.

Figure 1 (a) Schematic energy landscape for a random unfoldable heteropolymer. The roughness is on the order of the energy bias, and the sequence is likely to be trapped in low energy states far from its native state. (b) A schematic energy landscape for a foldable proteinlike heteropolymer. The funnel-like topography is characterized by an energy bias toward the native state that is much larger than the roughness of the landscape.

II. SIMPLE MODELS

Significant theoretical progress in understanding protein folding has been achieved by examining the properties of simple models of energy landscapes. Such models often look at proteins as a special class of heteropolymers. Whereas proteinlike heteropolymers have a well-defined three-dimensional conformation, random heteropolymers with a tendency to collapse do not have such a conformation but rather a collection of different low energy structures. The ‘‘minimally frustrated random energy model’’ introduced by Bryngelson and Wolynes [22] is one of the more successful models for protein folding. The model is based on two assumptions: (1) The energies of non-native contacts may be taken as random variables, and (2) on average, the overall energy of the protein decreases as the protein comes to look more like the native state, regardless of the measure used to gauge its similarity. This second assumption implies that there is an overall energy bias toward the native state.

As a model of a heteropolymer, it attempts to capture three contributions to the overall energy E of each conformation: the self-energy $\varepsilon_i$ of each amino acid, a bond energy term $J_{i,i+1}$ between two neighboring residues, and a nonbonded interaction term $K_{i,j}$, primarily for hydrophobic interactions, that draws the amino acids close to each other:

$$
E = \sum_i \varepsilon_i + \sum_i J_{i,i+1} + \sum_{i,j} K_{i,j} \tag{1}
$$

Each of these terms depends on the specific state $\alpha_i$ of each of the $N$ residues and on its position $r_i$. That is, the energy E is a complex function of $\{\alpha_i\}$ and $\{r_i\}$. Because this problem is too complicated to be solved exactly by standard ensemble statistical mechanics, the authors replaced the complex Hamiltonian above with a stochastic one that shares the same statistical properties. The energy of the protein is thus taken to be a random variable drawn from a distribution that has the same characteristics as the full Hamiltonian, following a technique developed in the study of spin glasses. This yields the so-called random energy model. The bias toward the native state is introduced via the nonbonded interaction term K. In the random energy model, non-native interactions are randomly selected from a distribution of energies with mean value $K$ and standard deviation $\Delta K$. Only native nonbonded interactions are consistently assigned a fixed value $K_N$, where $K_N < K$, so that native contacts are on average more favorable than non-native ones. This proteinlike model is then subjected to an in-depth analysis of its thermodynamics and kinetics, using a single order parameter to describe the distance from the native state. The kinetics of this model were studied for two variants that differ in the kinetic connectivity between different states [23]. In one variant the landscape was ‘‘locally connected,’’ meaning that only states with similar values of the order parameter (which describes similarity to the native state) were kinetically accessible to one another. In the other variant, ‘‘global connectivity’’ between the energy states was assumed.
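To make the random-energy assignment concrete, the following sketch samples energies for a hypothetical compact conformation with a fixed number of contacts, of which n are native. Native contacts receive the fixed value $K_N$. Each non-native contact is drawn from a Gaussian with mean $K$ and standard deviation $\Delta K$. The contact count and energy values are illustrative assumptions. The sketch only samples energies; it does not reproduce the analytical treatment of Bryngelson and Wolynes.

```python
import random

# Hypothetical parameters (arbitrary energy units); K_N < K_MEAN gives the native bias.
N_CONTACTS = 20
K_N = -2.0     # energy of every native contact (fixed)
K_MEAN = -1.0  # mean energy K of a non-native contact
K_SD = 0.5     # standard deviation (Delta K) of non-native contact energies


def rem_energy(n_native, rng):
    """Energy of one conformation with n_native native contacts out of N_CONTACTS."""
    native = n_native * K_N
    non_native = sum(rng.gauss(K_MEAN, K_SD) for _ in range(N_CONTACTS - n_native))
    return native + non_native


def mean_energy(n_native, samples, rng):
    """Average energy over many random realizations of the non-native contacts."""
    return sum(rem_energy(n_native, rng) for _ in range(samples)) / samples


rng = random.Random(1)
for n in (0, 10, 20):
    # Expected mean: n*K_N + (N_CONTACTS - n)*K_MEAN, which is -20 - n here.
    print(n, round(mean_energy(n, 2000, rng), 2))
```

The average energy falls linearly as the number of native contacts grows. This is the overall energy bias toward the native state assumed by the model. The spread contributed by the non-native contacts plays the role of the landscape's roughness.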

Kinetic studies such as these use the ‘‘master equation’’ to follow the flow of probability between the states of the model. This equation is a basic loss–gain equation that describes the time evolution of the probability $p_i(t)$ of finding the system in state $i$ [24]. The basic form of this equation is

$$
\frac{dp_i(t)}{dt} = \sum_j \left[ W_{ij}\, p_j(t) - W_{ji}\, p_i(t) \right] \tag{2}
$$

where $W_{ij}$ is the transition rate (transition probability per unit time) from state $j$ to state $i$. Equation (2) can be rewritten in matrix form by defining the transition matrix elements as

$$
\mathcal{W}_{ij} = W_{ij} - \delta_{ij} \sum_k W_{ki} \tag{3}
$$

The matrix $\mathcal{W}$ has the properties that $\mathcal{W}_{ij} \ge 0$ for $i \ne j$ and that the sum over each column is zero; i.e., $\sum_i \mathcal{W}_{ij} = 0$ for all $j$. This last property is required for a closed system, so that the probability flux out of any given state remains within the system (i.e., goes into the other states of the system) and the total probability is conserved. In matrix form, Eq. (2) becomes

$$
\dot{\mathbf{p}}(t) = \mathcal{W}\,\mathbf{p}(t) \tag{4}
$$

which has the formal solution $\mathbf{p}(t) = e^{t\mathcal{W}}\,\mathbf{p}(0)$, where $\mathbf{p}(t)$ is the probability vector at time $t$.
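The formal solution can be evaluated numerically. The dependency-free sketch below builds the transition matrix from a table of rates $W_{ij}$ via the definition of the transition matrix elements, checks that its columns sum to zero, and computes $e^{t\mathcal{W}}\mathbf{p}(0)$. The matrix exponential uses a scaling-and-squaring Taylor series, a simple stand-in for a library routine such as SciPy's `scipy.linalg.expm`. The two-state folding example and its rate constants are hypothetical. They were chosen because the two-state master equation has a known analytical solution to compare against.

```python
import math


def rate_matrix(rates):
    """calW_ij = W_ij - delta_ij * sum_k W_ki, with rates[i][j] = W_ij (rate j -> i)."""
    n = len(rates)
    calw = [[rates[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        # Diagonal entry: minus the total rate of leaving state j,
        # which makes every column sum to zero (probability is conserved).
        calw[j][j] = -sum(rates[k][j] for k in range(n) if k != j)
    return calw


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(a, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)  # undo the scaling: exp(A) = exp(A/2^s)^(2^s)
    return result


def propagate(calw, p0, t):
    """Formal solution p(t) = exp(t * calW) p(0)."""
    u = expm([[t * x for x in row] for row in calw])
    return [sum(u[i][j] * p0[j] for j in range(len(p0))) for i in range(len(p0))]


# Hypothetical two-state system: state 0 = unfolded, state 1 = native.
k_f, k_u = 2.0, 0.5                   # folding and unfolding rates
rates = [[0.0, k_u], [k_f, 0.0]]      # rates[1][0] = W_10 = k_f (state 0 -> 1)
calw = rate_matrix(rates)
p = propagate(calw, [1.0, 0.0], 1.0)  # start fully unfolded, evaluate at t = 1
exact_native = k_f / (k_f + k_u) * (1 - math.exp(-(k_f + k_u) * 1.0))
print(p, exact_native)
```

For realistic models with many states, a library matrix exponential or an eigendecomposition of the transition matrix would replace the hand-written `expm`. The column-sum check still applies.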

Solving the master equation for the ‘‘minimally frustrated random energy model’’ showed that the kinetics depend on the connectivity [23]. For the ‘‘globally connected’’ model it was found that the resulting kinetics vary as a function of the energy gap between the folded and unfolded states and the roughness of the energy landscape. The model