6
Internal Coordinate Simulation Method
Alexey K. Mazur
Institut de Biologie Physico-Chimique, CNRS, Paris, France
I. INTRODUCTION
In this chapter I outline the general principles of modeling biomacromolecules with internal coordinates as independent variables. This approach was generally preferred in the early period of computer conformational analysis, when computer hardware resources were severely limited [1]. In the last two decades, mainly because of the growing interest in molecular dynamics (MD), Cartesian coordinate approaches gradually became predominant, as one readily sees just by looking through the index of this book. Nevertheless, internal coordinates continue to be employed, notably in conformational searches based on energy minimization and Monte Carlo (MC) [2] and in normal mode analysis [3]. My main objective is to give a consistent exposition of the basic algorithms of this methodology and its underlying philosophy, with special emphasis on recent advances in internal coordinate molecular dynamics (ICMD) techniques.
More traditional applications of internal coordinates, notably normal mode analysis and MC calculations, are considered elsewhere in this book. In the recent literature there are excellent discussions of specific applications of internal coordinates, notably in studies of protein folding [4] and energy minimization of nucleic acids [5].
II. INTERNAL AND CARTESIAN COORDINATES
The term ‘‘internal coordinates’’ usually refers to bond lengths, valence angles, and dihedrals. They completely define relative atomic positions, thus giving an alternative to the Cartesian coordinate description of molecular structures. Dihedrals corresponding to rotations around single bonds are the most important because all other internal coordinates are usually considered fixed at their standard values; the representation thus obtained is referred to as the standard geometry approximation [6]. For both proteins and nucleic acids the standard geometry approximation reduces the number of degrees of freedom from 3N to approximately 0.4N, where N is the total number of atoms. Freezing of ‘‘unimportant’’ variables accelerates minimization of the potential energy as well as equilibration in Monte Carlo calculations, simply because the space dimension is the principal parameter that determines the theoretical rate of convergence of iterative algorithms. It is important
also that higher order minimizers, which require much computer memory to store the Hessian matrix, remain affordable even for very large systems. It should be noted, however, that because of the nonlinear relationship between internal and Cartesian coordinates, the distinction between them cannot be reduced to the foregoing simple arithmetic. To begin with, let us consider the following instructive example.
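As a brief aside before that example, the three kinds of internal coordinates named above can be computed from Cartesian positions with a few vector operations. The following Python sketch is illustrative only: the function names and the sign convention chosen for the dihedral are assumptions made here, not part of the method described in this chapter.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def bond_length(p1, p2):
    """Distance between two bonded atoms."""
    return norm(sub(p2, p1))

def valence_angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 at the vertex."""
    u, v = sub(p1, p2), sub(p3, p2)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    # Clamp to [-1, 1] to guard against rounding error.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral(p1, p2, p3, p4):
    """Dihedral p1-p2-p3-p4 in degrees, in the range (-180, 180].

    Sign convention assumed here: positive when p4 is rotated
    clockwise from p1, looking along the central bond p2 -> p3.
    """
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    y = dot(b2, cross(n1, n2)) / norm(b2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Four atoms in a planar trans arrangement (coordinates are arbitrary).
atoms = [(0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)]
print(bond_length(atoms[1], atoms[2]))
print(valence_angle(*atoms[:3]))
print(abs(dihedral(*atoms)))
```

The arithmetic of the standard geometry approximation can be checked in the same spirit. For a hypothetical system of N = 1000 atoms, 3N = 3000 variables shrink to about 0.4N = 400, and a dense Hessian, whose size grows as the square of the number of variables, shrinks from 9,000,000 to 160,000 elements, a factor of about 56.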
Figure 1 compares the courses of energy minimization with different choices of coordinates. A standard geometry initial conformation was minimized in three modes: (1) with all degrees of freedom and Cartesian coordinates as variables, (2) with all degrees of freedom but internal coordinates as variables, and (3) with fixed standard geometry. All computations were made with the same program code employing a conjugate gradient minimizer with analytical gradients. Figure 1a demonstrates that, as expected, the minimum is most rapidly found with the standard geometry approximation. With all degrees of freedom the structure changes much more slowly, but we note that the rate of convergence is noticeably higher when internal rather than Cartesian coordinates are used, even though the space dimension is 3N in both cases. The internal coordinate minimization goes faster because internal coordinates better correspond to the local potential energy landscape. The energy gradient is an invariant vector that does not depend on the choice of coordinates, and so is the direction computed by the minimizer. Once it is chosen, however, the minimizer moves the structure along a straight line in the corresponding space. In Cartesian coordinates the profiles of the potential energy are very complex, and any straight path quickly goes to a wall. In contrast, curved atomic trajectories corresponding to straight lines in the internal coordinate space make possible much longer moves.

Figure 1 The course of energy minimization of a DNA duplex with different choices of coordinates. The rate of convergence is monitored by the decrease of the RMSD from the final local minimum structure, which was very similar in all three cases, with the number of gradient calls. The RMSD was normalized by its initial value. CC, IC, and SG stand for Cartesian coordinates, 3N internal coordinates, and standard geometry, respectively.
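The geometric point made in the text, that a straight line in internal coordinate space is a curved atomic trajectory whereas a straight Cartesian path ‘‘quickly goes to a wall,’’ can be illustrated with a single rotating atom. The sketch below is a toy model, not the DNA calculation of Figure 1: the bond length, valence angle, and harmonic force constant are illustrative or hypothetical values chosen here, and all function names are assumptions.

```python
import math

BOND = 1.53                    # bond length in angstroms (illustrative)
THETA = math.radians(111.0)    # valence angle (illustrative)
K_BOND = 600.0                 # hypothetical force constant, kcal/(mol A^2)

def atom_position(phi):
    """Position of the rotating atom for torsion phi (radians).

    Its bonded neighbor sits at the origin and the rotation axis
    (the preceding bond) lies along x.
    """
    along_axis = -BOND * math.cos(THETA)   # projection on the rotation axis
    radius = BOND * math.sin(THETA)        # distance from the axis
    return (along_axis, radius * math.cos(phi), radius * math.sin(phi))

def bond_energy(pos):
    """Harmonic bond stretching energy for the bond to the origin."""
    length = math.sqrt(sum(c * c for c in pos))
    return 0.5 * K_BOND * (length - BOND) ** 2

def internal_path(t):
    """Straight line in torsion space from 0 to 180 degrees, 0 <= t <= 1."""
    return atom_position(t * math.pi)

def cartesian_path(t):
    """Straight line in Cartesian space between the same two end points."""
    start, end = atom_position(0.0), atom_position(math.pi)
    return tuple((1.0 - t) * s + t * e for s, e in zip(start, end))

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:4.2f}  E(internal)={bond_energy(internal_path(t)):10.4f}"
          f"  E(Cartesian)={bond_energy(cartesian_path(t)):10.4f}")
```

Along the torsional path the bond length is preserved by construction, so the stretching energy stays at zero for the whole move. Along the Cartesian chord the atom cuts through the rotation axis, the bond is strongly compressed, and the stretching term rises steeply, which is the kind of wall that limits the step length of a Cartesian minimizer.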
A clear manifestation of the foregoing effect is exhibited in Figure 1b. This graph shows results of a similar minimization test but with additional harmonic restraints that pulled atomic Cartesian coordinates to the final minimum energy values. Now the potential energy landscape in Cartesian coordinate space is greatly simplified, giving a dramatic acceleration of convergence compared to internal coordinates. As a result, convergence appears even faster than with the standard geometry approximation in spite of the difference in the number of variables. In practice, regardless of the number of variables and the type of minimizer, internal coordinates are always preferable in unconstrained minimization. In contrast, for example, in crystallographic root-mean-square refinement with a high weight of experimental restraints Cartesian coordinates should give faster convergence and lower final R factors.
Local energy minimization is arguably the clearest domain in molecular modeling, but we see that even here the difference between the two coordinate sets is far from trivial. It becomes much more complicated, however, when the specific features of macromolecular systems are considered. One feature is the multiple minima problem often discussed in connection with protein folding [2]. It is usually tackled with hybrid MC and MD techniques such as simulated annealing or MC minimization. Common examples are protein folding by global minimization of some target function (not necessarily energy) and structure determination based on experimental data. In these calculations, called conformational searches, one looks for the structures that satisfy certain conditions and does not care how well the intermediate steps correspond to physical reality. The standard geometry approximation offers a whole list of specific advantages for such studies.
First, larger MC steps are possible due to the same effect as in the foregoing minimization example. Second, larger MD steps are possible because freezing of bond lengths and bond angles eliminates the fastest motions. Third, molecular models can tolerate strong stimulation, such as by elevated temperature and strong stochastic forces, and still maintain a correct geometry of chemical groups. In addition, freezing of bond lengths and bond angles removes the small-scale ‘‘roughness’’ from the energy landscape of a macromolecule, thus vastly reducing the density of insignificant local minima. Exact evaluation of such density is a difficult task, but nevertheless this intuitive suggestion agrees with many practical observations. For example, in terms of root-mean-square distance (RMSD) of atomic coordinates, the standard geometry approximation results in a significantly larger radius of convergence for energy minimization from random states [7]. A similar effect has been reported for simulated annealing of protein conformations in crystallographic refinement [8].
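The second advantage can be made concrete with a rough estimate of how the fastest vibration bounds the MD time step. The sketch below converts a vibrational wavenumber into a period through T = 1/(c ν̃). The wavenumbers used are illustrative round numbers, and the rule of about ten integration steps per fastest period is a common rule of thumb assumed here, not a value taken from this chapter.

```python
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s

def period_fs(wavenumber_cm1):
    """Vibrational period in femtoseconds for a wavenumber in cm^-1."""
    return 1.0e15 / (C_CM_PER_S * wavenumber_cm1)

def time_step_fs(fastest_wavenumber_cm1, steps_per_period=10):
    """Time step from the rule of thumb assumed here:
    a fixed number of integration steps per fastest period."""
    return period_fs(fastest_wavenumber_cm1) / steps_per_period

# Illustrative wavenumbers (assumptions): a stretch of a bond to hydrogen,
# a heavy-atom bond or angle vibration, and a softer mode that remains
# when bond lengths and bond angles are frozen.
for label, nu in (("stretch of a bond to hydrogen", 3000.0),
                  ("bond or angle vibration", 1000.0),
                  ("softer remaining mode", 300.0)):
    print(f"{label:30s} {nu:7.1f} cm^-1  period {period_fs(nu):6.1f} fs"
          f"  step {time_step_fs(nu):5.2f} fs")
```

Because the period is inversely proportional to the wavenumber, removing the highest frequency motions raises the admissible time step in the same proportion, whatever exact number of steps per period one prefers.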
At present, conformational searches provide the most important application of computer molecular modeling in biology. In contrast, in statistical physics, from which MC and MD methods were originally borrowed, they are primarily used for studying
physical phenomena connected with thermal molecular motions. In such investigations exhaustive sampling is indispensable. In simple words this means that if an event is considered, it must occur many times in MC or MD trajectories, and if a parameter is measured, every state that contributes a distinct individual value to the average must be visited many times. Unfortunately, with the presently available computer power, hardly any biologically important event and hardly any system can be both correctly and accurately modeled in such a sense. Nevertheless, this line of research has many long-term prospects in molecular biophysics, and in the remaining part of this section I will briefly comment on the problems connected with the application of internal coordinates in such studies.
In ‘‘true simulations’’ physical realism is the goal, and the question arises, What part of such realism is sacrificed with the elimination of ‘‘unimportant’’ degrees of freedom? This issue appears to be rather complicated. It has been debated many times in the literature, but no consensus seems to have been reached [6,9–16]. Without going into details, I briefly summarize here the two opposite lines of argumentation, denoting them (A) and (B).
(A1) Freezing of bonds and angles deforms the phase space of the molecule and perturbs the time averages. The MD results, therefore, require a complicated correction with the so-called metric tensor, which undermines any gain in efficiency due to elimination of variables [10,17–20].
(B1) The metric effect is very significant in special theoretical examples, like a freely jointed chain. In simulations of polymer solutions of alkanes, however, it only slightly affects the static ensemble properties even at high temperatures [21]. Its possible role in common biological applications of MD has not yet been studied. With the recently developed fast recursive algorithms for computing the metric tensor [22], such corrections became affordable, and comparative calculations will probably appear in the near future.
(B2) With their frequencies beyond 1000 cm⁻¹, the bond length and bond angle oscillations occupy the ground state at room temperature. The classical harmonic treatment makes them ‘‘too flexible.’’
(A2) In spite of the high individual frequencies, bond length and bond angle vibrations participate in quasi-classical low frequency collective normal modes. Bond angle bending is necessary for the flexibility of five-membered rings, which plays a key role in the polymorphism of nucleic acids.
(B2) Usually, the role of these vibrations is not crucial, and with bond lengths and bond angles fixed the corresponding collective modes are only modified, not eliminated. Significant variations of valence angles in strained structures, as in furanose rings of nucleic acids, can be treated with special algorithms.
(A3) Bond lengths and bond angles vary in protein crystal structures.
(B3) These variations are related to the refinement procedures much more than to the experimental data [23] and are generally larger than in high resolution structures of small molecules. In MD calculations with harmonic bond lengths and bond angles they are still higher.
(A4) Bond angle bending makes a nonnegligible contribution to conformational entropy and can affect computed equilibrium populations [11].
(B4) The corresponding estimates are valid only in harmonic approximation; therefore, they are inapplicable to normal temperature conditions. The harmonic