Protein Folding: Computational Approaches |
383 |
Of specific interest are the unfolding simulation studies that highlight the role of the solvent in the folding and unfolding process, an insight that is very difficult to obtain experimentally. For example, simulations of the early stages of barnase unfolding at high temperature [47] showed that solvent plays a key role in the denaturation process. An important element of the helix-unfolding transition was found to be the replacement of an α-helical hydrogen bond (i to i + 4, where i is an amino acid residue) by water hydrogen bonds through an intermediate involving a 3₁₀ (i to i + 3), or reverse turn, hydrogen bond. Denaturation of a β-sheet was also observed to start with the distortion of the β-sheet hydrogen bonds, followed by the insertion of hydrogen-bonding water molecules between the strands. Finally, significant solvent participation was found even in the denaturation of the central stabilizing element of globular proteins, the hydrophobic core. This happens as some water molecules form “cage structures” around hydrophobic groups, often involving hydrogen bonds to water molecules outside the core. There are, however, concerns as to whether the observed water behavior corresponds to the actual denaturation process. The reason is that high temperature unfolding simulations are done either with a room temperature water density [47] or with low water density followed by rapid water penetration when the temperature is set equal to room temperature [48,49]. These procedures create an artificially high pressure, which may force water into protein cavities. Nonetheless, comparisons of unfolding simulation results at different temperatures seem to indicate that this effect is not very great [17].
B. Mapping Atomistic Energy Landscapes
An alternative approach to the study of protein folding on an atomic level is to base the study on conformation sampling rather than on direct simulation of the folding process. Sampling of folded and unfolded conformations allows for reconstructing the underlying energy landscape and for deducing the folding pathway (or pathways) from it.
In principle, energy landscapes are characterized by their local minima, which correspond to locally stable conformations, and by the transition regions (barriers) that connect the minima. In small systems, which have only a few minima, it is possible to use a direct approach to identify all the local minima and thus to describe the entire potential energy surface. Such is the case for small reactive systems [9] and for the alanine dipeptide, which has only two significant degrees of freedom [50,51]. The direct approach becomes impractical, however, for larger systems with many degrees of freedom that are characterized by a multitude of local minima.
A useful procedure for characterizing the multiminimum energy landscape of large systems was introduced by Stillinger and Weber [52]. These researchers investigated the energy landscape of water by quenching (i.e., minimizing) configurations from a molecular dynamics trajectory down to their nearest local minima. Using this procedure, a sample of the local minima accessible at a given temperature was obtained, providing a “map” of the underlying landscape. Following the original work, this procedure was applied to a variety of systems, including water [52], rare gas clusters [53], and proteins such as myoglobin [54] and bovine pancreatic trypsin inhibitor (BPTI) [55]. The protein studies showed that there are a very large number of local minima in the vicinity of the native state of the protein. Furthermore, the local minima are kinetically clustered into subsets, within which they tend to be connected by low barriers.
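The quenching idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the one-dimensional toy potential `energy`, the plain steepest-descent minimizer `quench`, and the random numbers that stand in for snapshots from a molecular dynamics trajectory. It is not the procedure of Ref. 52, only the same logic on a trivial system: minimize each snapshot and record which local minimum it reaches.

```python
import math
import random

def energy(x):
    # Toy 1D "landscape" with several local minima (hypothetical, for illustration).
    return math.cos(3.0 * x) + 0.1 * x * x

def gradient(x):
    # Analytic derivative of the toy potential above.
    return -3.0 * math.sin(3.0 * x) + 0.2 * x

def quench(x, step=0.01, tol=1e-8, max_iter=100_000):
    """Steepest-descent minimization from x down to the nearest local minimum."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

def map_minima(snapshots, decimals=3):
    """Quench every snapshot and count how often each minimum is reached."""
    counts = {}
    for x in snapshots:
        key = round(quench(x), decimals)
        counts[key] = counts.get(key, 0) + 1
    return counts

# Random points stand in for configurations saved along a trajectory.
random.seed(0)
snapshots = [random.uniform(-3.0, 3.0) for _ in range(200)]
minima = map_minima(snapshots)
for x_min in sorted(minima):
    print(f"x = {x_min:+.3f}  E = {energy(x_min):+.3f}  hits = {minima[x_min]}")
```

The hit counts give a rough picture of how much of the sampled region drains into each minimum, which is the kind of “map” the text describes.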
Atomic level studies of complex peptide and protein energy landscapes have become more detailed as computers have become faster, allowing for longer sampling simulations
Becker |
and more complicated analysis. A problem faced by protein energy landscape cartographers is how to represent the resulting conformation sample in a meaningful way that allows visualization and analysis of the underlying landscape. As far as the folding process is concerned, good results have been obtained by using one (or a few) effective reaction coordinates such as similarity to the native state (Q) or radius of gyration (Rg) [70,71]. These, however, are not very useful in exploring the energy landscape near the native state of a large protein or of peptides. Instead, it is becoming increasingly popular to use principal component analysis (PCA) (see Chapter 4) to reduce the dimensionality of the data and to allow easier analysis of the landscape. PCA projects the high-dimensional conformation sample onto the low-dimensional subspace that best represents it. The combination of PCA with long-time molecular dynamics has led to detailed studies of the energy landscape of proteins such as lysozyme [56], CRP:(cAMP)2 [57], cytochrome c [58], and crambin [59]. All of these systems exhibit complex landscapes with multiple basins. The observed dynamics on these landscapes typically involve long periods of motion within a basin followed by a fast transition from one basin to another. These observations led Go and collaborators to suggest a “jumping among minima” (JAM) model to help analyze the simulation results [60].
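As a rough illustration of the PCA projection step, the sketch below finds the two leading principal axes of a conformation sample and projects a conformation onto them. It is written under several assumptions: the "conformations" are synthetic four-dimensional vectors, the helper names `principal_axes` and `project` are hypothetical, and the eigenvectors of the covariance matrix are obtained by simple power iteration with deflation rather than by a library eigensolver.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def principal_axes(samples, n_axes=2, n_iter=200, seed=1):
    """Return the sample mean and the leading principal axes of the sample."""
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(dim)]
    centered = [[s[j] - mean[j] for j in range(dim)] for s in samples]
    # Covariance matrix of the centered coordinates.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    axes = []
    for _axis in range(n_axes):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _step in range(n_iter):
            # Power iteration: repeated multiplication converges to the top eigenvector.
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in cov])
        axes.append(v)
        # Deflation: remove the component just found before seeking the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return mean, axes

def project(sample, mean, axes):
    """Coordinates of one conformation along the principal axes."""
    shifted = [x - m for x, m in zip(sample, mean)]
    return [dot(shifted, axis) for axis in axes]

# Synthetic sample: large motion along u1, smaller along u2, plus weak noise.
rng = random.Random(0)
u1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0]
u2 = [0.0, 0.0, 1 / math.sqrt(2), -1 / math.sqrt(2)]
samples = []
for _ in range(300):
    a, b = rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.0)
    samples.append([a * u1[j] + b * u2[j] + rng.gauss(0.0, 0.05) for j in range(4)])

mean, axes = principal_axes(samples)
q1, q2 = project(samples[0], mean, axes)
print(f"Q1 = {q1:+.3f}, Q2 = {q2:+.3f}")
```

For real trajectories with thousands of coordinates, a dedicated linear algebra routine would replace the power iteration; the projection step stays the same.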
Combining the PCA projection with an energy scale allows for 3D visualization of the underlying landscape. It should be noted, however, that without specific information on the barriers such PCA representations of the landscapes will at most reflect their overall shape, limited by the quality of the projection, and not necessarily their details. Nonetheless, the lack of information on the barriers is somewhat compensated for by the presence of “empty spaces,” which correspond to poorly sampled regions associated with high energy [59,61]. A problem associated with generating three-dimensional PCA views of protein energy landscapes is that the other principal coordinates, which are not included in this view, will manifest themselves as “noise” or “roughness” in the low-dimensional representation. This is because each point in the plane defined by the two main principal coordinates {Q1, Q2} is associated with many conformations of different energies, separated from each other in the other principal coordinates {Q3, Q4, . . .}. When the number of sampling points is small, this problem can be overcome by a simple smoothing procedure, such as that used in mapping the energy landscape of alanine tetrapeptide [62]. However, when many conformations are included in the conformation sample, the “minimum energy envelope” procedure can be used to reduce the roughness [61]. For each value (on a grid) of the two main principal coordinates {Q1, Q2}, this procedure chooses the lowest conformation energy among all conformations that project onto this 2D grid point. The resulting smooth landscape is equivalent to an adiabatic surface, a surface that has been minimized in all coordinates other than Q1 and Q2. The resulting 3D view offers a direct visualization of the main basins on the energy landscape. Figure 4 shows the energy landscape of the prion protein (PrP) (residues 124–226) in vacuum [63]. Two large basins are clearly seen. The first is a deep but narrow basin associated with the native PrPc conformation [7]. The second basin, which is shallower but wider, is associated with a second group of conformations of a partially unfolded protein. These offer a framework for studying the kinetics of protein folding.
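The minimum energy envelope amounts to a binned minimum over the projected sample. The sketch below assumes each conformation has already been reduced to a triple (Q1, Q2, energy); the function name `minimum_energy_envelope`, the square grid, and the sample values are all hypothetical choices made for illustration.

```python
import math

def minimum_energy_envelope(points, bin_width):
    """Keep the lowest energy among all conformations falling in each grid cell.

    points: iterable of (q1, q2, energy) triples.
    Returns a dict mapping grid cell (i, j) to its lowest energy.
    """
    envelope = {}
    for q1, q2, e in points:
        cell = (math.floor(q1 / bin_width), math.floor(q2 / bin_width))
        if cell not in envelope or e < envelope[cell]:
            envelope[cell] = e
    return envelope

# Made-up projected conformations (Q1, Q2, energy).
sample = [
    (0.1, 0.2, -5.0), (0.3, 0.4, -7.5), (0.2, 0.1, -6.0),
    (1.2, 0.1, -3.0), (1.4, 0.3, -2.0), (-0.3, 0.6, -1.0),
]
envelope = minimum_energy_envelope(sample, bin_width=1.0)
for cell in sorted(envelope):
    print(cell, envelope[cell])
```

Grid cells that receive no conformation are simply absent from the result, which corresponds to the poorly sampled “empty spaces” mentioned above.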
Figure 4 The energy landscape of the prion protein (PrP) (residues 124–226) in vacuum, obtained by principal coordinate analysis followed by the minimum energy envelope procedure. Two large basins are seen. One basin is associated with the native PrPc conformation; the other is associated with partially unfolded conformations.
Clearly, mapping energy landscapes based only on local minima gives only a partial description of the energy landscape, because the maps do not contain information about the energy barriers that govern the system’s kinetics. It is the knowledge of the transition states that allows a detailed exploration of kinetics through the use of the master equation approach [Eqs. (2)–(4)]. One of the first detailed studies of this sort was performed by Czerminski and Elber [64], who generated an almost complete map of the minima and barriers of an alanine tetrapeptide in vacuum. Using the master equation approach they were able to study aspects of this system’s kinetics, which involve the crossing of barriers of different heights.
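To make the link between a map of minima and barriers and the resulting kinetics concrete, here is a minimal sketch of a master equation calculation. It rests on assumptions that are not taken from this section: the master equation is written in the generic form dP_i/dt = Σ_j [k_ij P_j − k_ji P_i], the rates follow a simple Arrhenius-like expression with a unit prefactor, and the three-state energies and barriers are invented. The functions `rate_matrix` and `propagate` are hypothetical names.

```python
import math

def rate_matrix(minima, barriers, kT=1.0):
    """k[i][j] is the assumed rate for the transition j -> i over the shared barrier."""
    n = len(minima)
    k = [[0.0] * n for _ in range(n)]
    for (i, j), e_barrier in barriers.items():
        k[i][j] = math.exp(-(e_barrier - minima[j]) / kT)  # leaving state j
        k[j][i] = math.exp(-(e_barrier - minima[i]) / kT)  # leaving state i
    return k

def propagate(p, k, dt, n_steps):
    """Integrate dP_i/dt = sum_j (k_ij P_j - k_ji P_i) with explicit Euler steps."""
    n = len(p)
    for _ in range(n_steps):
        dp = [sum(k[i][j] * p[j] - k[j][i] * p[i] for j in range(n))
              for i in range(n)]
        p = [p[i] + dt * dp[i] for i in range(n)]
    return p

# Invented three-state system: energies of the minima and of the barriers between them.
minima = [0.0, 1.0, 0.5]
barriers = {(0, 1): 3.0, (1, 2): 2.0}
k = rate_matrix(minima, barriers)

# Start with all population in state 1 and let it relax.
p_final = propagate([0.0, 1.0, 0.0], k, dt=0.01, n_steps=20_000)

# With these rates, detailed balance holds and the long-time limit is Boltzmann.
weights = [math.exp(-e) for e in minima]
boltzmann = [w / sum(weights) for w in weights]
print([round(x, 4) for x in p_final])
print([round(x, 4) for x in boltzmann])
```

The two printed distributions can be compared to check that the populations have relaxed; intermediate times show how barriers of different heights control the order in which states fill.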
Obtaining information regarding barriers, which accounts for state-to-state transition states, is a complicated computational task (see Chapter 10). However, even if such data are obtained, their complexity renders it difficult to introduce barrier information into the description of the atomistic energy landscape. In particular, one would like to extract from the raw data information regarding the overall connectivity of the landscape as well as information regarding the global basin-to-basin kinetic transitions. It is the transition from the ensemble of unfolded conformations (“unfolded basin”) to the ensemble of folded conformations (“folded basin”) that is of interest, rather than individual transitions between specific conformations. This type of “global” kinetics is in line with the type of observations available experimentally. To address this problem the method of “topological mapping” was introduced by Becker and Karplus [65]. Based on barrier information, this method partitions conformation space into its component energy basins, thus highlighting the overall basin-to-basin connectivity of the landscape. At any energy level E the molecular conformation space can be partitioned into disconnected regions consisting of local minima that are connected by barriers lower than E. The method of topological mapping follows the way these disconnected regions, or “basins,” connect and disconnect as a function of increasing and decreasing energy E. An elementary basin R(α) is defined as a connected set of molecular conformations that, when minimized, map to the same single local minimum. Topological mapping groups these elementary basins according to the barriers between them. At any energy level E (or temperature level T) the multidimensional landscape is thus partitioned into “superbasins,” R_E(α′), defined as the union of elementary basins R(α) connected by barriers lower than energy E (or T):
$$R_E(\alpha') = \bigcup_{\alpha} R(\alpha) \qquad (8)$$
Each such “superbasin” is then mapped to its lowest minimum α′ in a way that is analogous to simulated annealing (Fig. 5a). As a result, minima connected by barriers lower than E are grouped together and separated from other minima to which they are connected by higher barriers. A topological “disconnectivity” graph is obtained by following the way these superbasins break up as the system’s energy E decreases. Each node on this graph (Fig. 5b) reflects a conformational superbasin on the landscape, and the connecting edges reflect the basin connectivity. The node at the top of the tree graph corresponds to the ergodic limit, in which all states are connected. As the energy is decreased the graph splits to indicate basins that are becoming disconnected at that energy level. The topological mapping method resembles the Lid method independently developed by Sibani et al. [66] to study the energy landscape of crystals and glasses.
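The grouping of minima into superbasins at a given energy can be sketched with a union-find structure. This is an illustrative reading of the definition above, not the implementation of Ref. 65: the function `superbasins`, the four labeled minima, and the barrier heights are all hypothetical. Minima are merged whenever the barrier between them lies below E, and each group is labeled by its lowest minimum.

```python
def superbasins(minima, barriers, energy):
    """Group minima connected by barriers lower than `energy`.

    minima:   dict mapping label -> energy of that local minimum
    barriers: dict mapping (label_a, label_b) -> barrier energy between them
    Returns a dict mapping the lowest minimum of each superbasin to its members.
    Minima lying above `energy` are not accessible and are left out.
    """
    accessible = [m for m in minima if minima[m] < energy]
    parent = {m: m for m in accessible}

    def find(m):
        # Follow parent links to the group representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), e_barrier in barriers.items():
        if e_barrier < energy:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for m in accessible:
        groups.setdefault(find(m), []).append(m)
    return {min(g, key=minima.get): sorted(g) for g in groups.values()}

# Invented landscape: four minima and the barriers between neighboring pairs.
minima = {"A": 0.0, "B": 1.0, "C": 0.5, "D": 2.0}
barriers = {("A", "B"): 2.0, ("B", "C"): 4.0, ("C", "D"): 3.0}

# Scanning E downward shows how the superbasins split, level by level.
for level in (5.0, 3.5, 2.5, 1.5):
    print(level, superbasins(minima, barriers, level))
```

Recording the energy at which each group splits, and which group it splits from, gives the nodes and edges of a disconnectivity tree graph.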
Figure 5 A schematic representation of a “topological mapping” of an energy landscape. (a) The energy landscape is studied at different energies E. Each region of connected conformations, denoted as a “superbasin” R_E(α′), is mapped to its lowest minimum α′. (b) The corresponding topological “disconnectivity” tree graph reflects the way superbasins become disconnected as the energy is decreased.
An advantage of topological mapping is that the resulting disconnectivity tree graph reflects, in a straightforward way, the overall topography of the energy landscape. For example, a tree graph reflecting “funnel” topography would be characterized by a single main branch with many small side branches that do not undergo additional splitting. On the other hand, a tree graph that corresponds to a landscape characterized by several large competing basins will exhibit several large branches, each displaying a complex branching pattern of its own. In the case of a completely rough landscape, no dominant branch can be detected in the disconnectivity graph. Application of this analysis method to the energy landscape of alanine tetrapeptide, based on the data of Czerminski and Elber [64], showed that this all-atom energy landscape is dominated by a “funnel” topography, although the presence of a large kinetic trap could also be detected [65]. The insight into the connectivity of this landscape was used to study the overall basin-to-basin kinetics of this tetrapeptide, employing the master equation approach [65]. A very clear funnel topography is also seen in the disconnectivity graph of linear alanine hexapeptide (Ala)6 shown in Figure 6 [67]. The method of topological mapping was also successfully employed to characterize the energy landscape of different types of atomic and molecular clusters [68].
A different approach for handling barrier information was suggested by Kunz and Berry [69]. In this method conformations are sampled along high temperature dynamical trajectories, with the connectivity, including saddle points, determined for successive coordinate sets along a given trajectory. The minima–barrier–minima triplets are then put together in a way that follows the descent from high energy conformations to low energy structures. This results in linear cross sections through the high-dimensional energy landscape. Applying this method to different types of clusters led to the distinction between “structure-seeking” clusters, such as the (KCl)32 cluster, that exhibit a steep staircase-like
Figure 6 The topological disconnectivity graph of alanine hexapeptide. (Adapted from Ref. 67.)