
Young - Computational chemistry


30 STRUCTURE-PROPERTY RELATIONSHIPS

TABLE 30.1 (Continued)

Quantum Chemical Descriptors

Sum of the squared atomic charge densities

Sum of the absolute values of charges

Absolute hardness

Statistical Mechanical Descriptors

Vibrational frequencies

Rotational enthalpy and entropy

Vibrational enthalpy and entropy

Translational enthalpy and entropy

The process described in the preceding paragraphs has seen widespread use. This is partly because it has been automated very well in the more sophisticated QSPR programs.

It is possible to use nonlinear curve fitting (i.e., exponents of best fit). Nonlinear fitting is done by using a steepest-descent algorithm to minimize the deviation between the fitted and correct values. The drawback is the possibility of falling into a local minimum, thus necessitating the use of global optimization algorithms. Automated algorithms for determining which descriptors to include in a nonlinear fit are possible, but there is not yet a consensus as to what technique is best. This approach can yield a closer fit to the data than multiple linear techniques. However, it is less often used due to the large amount of manual trial-and-error work necessary. Automated nonlinear fitting algorithms are expected to be included in future versions of QSPR software packages.
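As a rough illustration of the steepest-descent fitting described above, the following sketch fits a single-descriptor power law, property = a * x**b, by following the negative gradient of the summed squared deviation. The model form, the function names, the starting guess, and the backtracking step rule are all assumptions chosen for illustration; they are not taken from any particular QSPR package.

```python
import math


def sse(params, xs, ys):
    """Sum of squared deviations between data and the model a * x**b."""
    a, b = params
    return sum((y - a * x ** b) ** 2 for x, y in zip(xs, ys))


def gradient(params, xs, ys):
    """Analytic gradient of the squared deviation with respect to a and b."""
    a, b = params
    ga = gb = 0.0
    for x, y in zip(xs, ys):
        f = a * x ** b
        r = y - f
        ga += -2.0 * r * x ** b
        gb += -2.0 * r * f * math.log(x)
    return ga, gb


def fit_power_law(xs, ys, start=(1.0, 1.0), max_iter=20000, tol=1e-10):
    """Steepest descent with a backtracking (Armijo) step size.

    Requires positive descriptor values so that log(x) is defined.
    Only a local minimum near the starting guess is found.
    """
    a, b = start
    current = sse((a, b), xs, ys)
    for _ in range(max_iter):
        ga, gb = gradient((a, b), xs, ys)
        g2 = ga * ga + gb * gb
        if g2 < tol:
            break
        # Cap the first trial step so the parameters move by at most ~1 unit.
        step = 1.0 / (1.0 + math.sqrt(g2))
        while step > 1e-16:
            trial = (a - step * ga, b - step * gb)
            trial_sse = sse(trial, xs, ys)
            if trial_sse <= current - 0.5 * step * g2:
                break
            step *= 0.5
        else:
            break  # no acceptable step found; stop where we are
        a, b = trial
        current = trial_sse
    return a, b, current
```

Because the search only moves downhill from the starting guess, repeating the fit from several different starting points is a simple way to check whether a local minimum was found.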

The validation of the prediction equation is its performance in predicting properties of molecules that were not included in the parameterization set. Equations that do well on the parameterization set may perform poorly for other molecules for several different reasons. One mistake is using a limited selection of molecules in the parameterization set. For example, an equation parameterized with organic molecules may perform very poorly when predicting the properties of inorganic molecules. Another mistake is having nearly as many fitted parameters as molecules in the parameterization set, thus fitting to anomalies in the data rather than physical trends.
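The validation step can be sketched as follows, assuming a one-descriptor linear equation for brevity. The function names are hypothetical, and any data passed in would be the user's own; the point is only that the error is measured on molecules held out of the parameterization set.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(model, xs, ys):
    """Root-mean-square error of the fitted equation on a set of molecules."""
    slope, intercept = model
    squared = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(xs))


def validate(train_x, train_y, held_out_x, held_out_y):
    """Fit on the parameterization set, then report both errors."""
    model = fit_line(train_x, train_y)
    return rmse(model, train_x, train_y), rmse(model, held_out_x, held_out_y)
```

A held-out error much larger than the parameterization-set error suggests that the equation has been fitted to anomalies in the data or is being applied outside the range of molecules it was built on.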

The development of group additivity methods is very similar to the development of a QSPR method. Group additivity methods can be useful for properties that are additive by nature, such as the molecular volume. For most properties, QSPR is superior to group additivity techniques.

Other algorithms for predicting properties have been developed. Both neural network and genetic algorithm-based programs are available. Some arguments can be made for the use of each. However, none has yet seen widespread use. This may be partially due to the greater difficulty in interpreting the chemical information that can be gained in addition to numerical predictions. Neural networks are generally known to provide a good interpolation of data, but rather poor extrapolation.

30.2 QSAR

QSAR is also called traditional QSAR or Hansch QSAR to distinguish it from the 3D QSAR method described below. This is the application of the technique described above to biological activities, such as environmental toxicology or drug activity. The discussion above is applicable, but a number of other caveats apply, which are addressed in this section. The following discussion is oriented toward drug design, although the same points may be applicable to other areas of research as well.

In order to parameterize a QSAR equation, a quantified activity for a set of compounds must be known. These are called lead compounds, at least in the pharmaceutical industry. Typically, test results are available for only a small number of compounds. Because of this, it can be difficult to choose a number of descriptors that will give useful results without fitting to anomalies in the test set. Three to five lead compounds per descriptor in the QSAR equation are normally considered an adequate number. If two descriptors are nearly collinear with one another, then one should be omitted even though it may have a large correlation coefficient.
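The two rules of thumb in this paragraph can be expressed as a short sketch. The function names and the 0.95 collinearity threshold are assumptions made for illustration; the text itself gives only the range of three to five compounds per descriptor.

```python
import math


def max_descriptors(n_compounds, compounds_per_descriptor=5):
    """Largest descriptor count allowed by the compounds-per-descriptor rule."""
    return n_compounds // compounds_per_descriptor


def pearson(u, v):
    """Pearson correlation coefficient between two descriptor columns."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def collinear_pairs(descriptors, threshold=0.95):
    """Return descriptor pairs whose absolute correlation meets the threshold.

    `descriptors` maps a descriptor name to its values over the compound set.
    The threshold is an illustrative choice, not a value from the text.
    """
    names = sorted(descriptors)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(descriptors[first], descriptors[second])
            if abs(r) >= threshold:
                pairs.append((first, second))
    return pairs
```

For example, with 20 lead compounds, max_descriptors(20) allows four descriptors under the stricter five-per-descriptor rule, and one member of each pair returned by collinear_pairs would be dropped before fitting.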

In the case of drug design, it may be desirable to use parabolic functions in place of linear functions. The descriptor for an ideal drug candidate often has an optimum value. Drug activity will decrease when the value is either larger or smaller than optimum. This functional form is described by a parabola, not a linear relationship.
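A parabolic dependence can be fitted by ordinary least squares on the terms 1, d, and d**2, where d is the descriptor value. The sketch below is a minimal example under that assumption; the function names are hypothetical, and the optimum is taken as the vertex of the fitted parabola.

```python
def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def fit_parabola(descriptor, activity):
    """Least-squares coefficients (c0, c1, c2) of c0 + c1*d + c2*d**2."""
    powers = [[d ** k for k in range(3)] for d in descriptor]
    normal = [[sum(p[i] * p[j] for p in powers) for j in range(3)] for i in range(3)]
    rhs = [sum(p[i] * a for p, a in zip(powers, activity)) for i in range(3)]
    return solve_linear(normal, rhs)


def optimum_descriptor(coefficients):
    """Descriptor value of maximum activity, or None if there is no maximum."""
    _, c1, c2 = coefficients
    if c2 >= 0.0:
        return None
    return -c1 / (2.0 * c2)
```

A negative quadratic coefficient indicates an activity maximum; a value near zero suggests that the compounds tested do not span the optimum and a linear term would serve as well.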

The advantage of using QSAR over other modeling techniques is that it takes into account the full complexity of the biological system without requiring any information about the binding site. The disadvantage is that the method will not distinguish between the contribution of binding and transport properties in determining drug activity. QSAR is very useful for determining general criteria for activity, but it does not readily yield detailed structural predictions.

30.3 3D QSAR

For drug design purposes, it is desirable to construct a method that will predict the molecular structures of candidate compounds without requiring knowledge of the binding-site geometry. 3D QSAR has been fairly successful in fulfilling these criteria. It is similar to QSAR in that property descriptors, statistical analysis, and fitting techniques are used. Beyond that, the two computations are significantly different.

Like QSAR, molecular structures must be available for compounds that have known quantitatively defined activities. The first step is then to align the molecular structures. This alignment is based on the fact that all have a drug activity due to docking at a particular site. Alignment algorithms rotate and translate a molecule within the Cartesian coordinate space until it matches the location and rotation of another molecule as well as possible. This can be as simple as aligning the backbones of similar molecules or as complex as a sophisticated search and optimization scheme. For conformationally flexible compounds, both alignment and conformation must be addressed. Typically, the most rigid molecule in the set is the one to which the others are aligned. There are automated routines for finding the conformer of best alignment, or this can be done manually.

Once the molecules are aligned, a molecular field is computed on a grid of points in space around the molecule. This field must provide a description of how each molecule will tend to bind in the active site. Field descriptors typically consist of a sum of one or more spatial properties, such as steric factors, van der Waals parameters, or the electrostatic potential. The choice of grid points will also affect the quality of the final results.
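As an illustration of one such field, the sketch below evaluates a point-charge electrostatic potential on a cubic grid. It assumes atomic partial charges are already available, works in arbitrary consistent units, and caps the potential at a minimum distance (an arbitrary choice here) so that grid points close to a nucleus do not dominate. The function names are hypothetical.

```python
import math


def build_grid(lo, hi, spacing):
    """Cubic grid of points spanning [lo, hi] along each Cartesian axis."""
    count = int(round((hi - lo) / spacing)) + 1
    axis = [lo + i * spacing for i in range(count)]
    return [(x, y, z) for x in axis for y in axis for z in axis]


def electrostatic_potential(point, atoms, min_distance=0.5):
    """Sum of charge / distance over atoms given as (x, y, z, charge)."""
    total = 0.0
    for ax, ay, az, charge in atoms:
        r = math.dist(point, (ax, ay, az))
        # Cap very short distances so near-nucleus points stay finite.
        total += charge / max(r, min_distance)
    return total


def molecular_field(atoms, grid):
    """Field vector for one aligned molecule: one value per grid point."""
    return [electrostatic_potential(p, atoms) for p in grid]
```

Each aligned molecule evaluated on the same grid yields one row of the field matrix used in the fitting step, so a finer grid increases the number of columns rapidly.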

The field points must then be fitted to predict the activity. There are generally far more field points than known compound activities to be fitted. The least-squares algorithms used in QSAR studies do not function for such an underdetermined system. A partial least squares (PLS) algorithm is used for this type of fitting. This method starts with matrices of field data and activity data. These matrices are then used to derive two new matrices containing a description of the system and the residual noise in the data. Earlier studies used a similar technique, called principal component analysis (PCA). PLS is generally considered to be superior.
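To show why PLS remains usable when field points outnumber compounds, here is a minimal single-component PLS sketch for one activity column (often called PLS1). It is a simplified illustration under stated assumptions, not the algorithm of any particular 3D QSAR program: production codes extract several components and choose their number by cross-validation. All names are hypothetical.

```python
import math


def pls1_one_component(field_rows, activities):
    """Fit one PLS component; rows are compounds, columns are field points."""
    n = len(field_rows)
    m = len(field_rows[0])
    x_mean = [sum(row[j] for row in field_rows) / n for j in range(m)]
    y_mean = sum(activities) / n
    xc = [[row[j] - x_mean[j] for j in range(m)] for row in field_rows]
    yc = [value - y_mean for value in activities]
    # Weight vector: direction in field space with maximal covariance to activity.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(value * value for value in w))
    w = [value / norm for value in w]
    # Scores: projection of each compound onto the weight vector.
    t = [sum(xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    # Regress the activity on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(value * value for value in t)
    return x_mean, y_mean, w, q


def predict(model, field_row):
    """Predicted activity for a new compound on the same grid."""
    x_mean, y_mean, w, q = model
    score = sum((field_row[j] - x_mean[j]) * w[j] for j in range(len(w)))
    return y_mean + q * score
```

The weight vector has one entry per grid point, which is what allows the fitted model to be mapped back onto regions of space around the aligned molecules.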

The model obtained from the PLS algorithm gives two pieces of information on various regions of space. The first is how well the activity correlates to that region in space. The second is whether the functional group at that point should be electron-donating, electron-withdrawing, bulky, and so forth according to the choice of field parameters. This site description is called a pharmacophore in drug design work.

An examination of the plotted data reveals significant structural information, such as the fact that an electron-donating group should be a certain distance from a withdrawing group, and so on. Further examination of relative magnitudes can give an indication as to precisely which group might be best. Unknown compounds may then be run through the same analysis to obtain a quantitative prediction of their drug activities.

Ideally, the results should be validated somehow. One of the best methods for doing this is to make predictions for compounds known to be active that were not included in the training set. It is also desirable to eliminate compounds that are statistical outliers in the training set. Unfortunately, some studies, such as drug activity prediction, may not have enough known active compounds to make this step feasible. In this case, the estimated error in prediction should be increased accordingly.


30.4 COMPARATIVE QSAR

Comparative QSAR is a field currently under development by several groups. Large databases of known QSAR and 3D QSAR results have been compiled. Such a database can be used for more than simply obtaining literature citations. The analysis of multiple results for the same or similar systems can yield a general understanding of the related chemistry as well as providing a good comparison of techniques.

30.5 RECOMMENDATIONS

Floppy molecules present some additional difficulty in applying QSAR/QSPR. They are also much more difficult to work with in 3D QSAR. With QSAR/QSPR, this problem can be avoided by using only descriptors that do not depend on the conformation, but the accuracy of results may suffer. For more accurate QSPR, the lowest-energy conformation is usually what should be used. For QSAR or 3D QSAR, the conformation most closely matching a rigid molecule in the test set should be used. If all the molecules are floppy, finding the lowest-energy conformer for all and looking for some commonality in the majority might be the best option.

QSPR and QSAR are useful techniques for predicting properties that would be very difficult to predict by any other method. This is a somewhat empirical or indirect calculation that ultimately limits the accuracy and amount of information which can be obtained. When other means of computational prediction are not available, these techniques are recommended for use. There are a variety of algorithms in use that are not equivalent. An examination of published results and tests of several techniques are recommended.

BIBLIOGRAPHY

Introductory descriptions are in

A. K. Rappé, C. J. Casewit, Molecular Mechanics across Chemistry University Science Books, Sausalito (1997).

A. R. Leach, Molecular Modelling Principles and Applications Longman, Essex (1996).

G. H. Grant, W. G. Richards, Computational Chemistry Oxford, Oxford (1995).

Books about QSAR/QSPR are

L. B. Kier, L. H. Hall, Molecular Structure Description: The Electrotopological State Academic Press, San Diego (1999).

Topological Indices and Related Descriptors in QSAR and QSPR J. Devillers, A. T. Balaban, Eds., Gordon and Breach, Reading (1999).

3D QSAR in Drug Design H. Kubinyi, Y. C. Martin, G. Folker, Eds., Kluwer, Norwell MA (1998). (3 volumes)


J. Devillers, Neural Networks in QSAR and Drug Design Academic Press, San Diego (1996).

C. Hansch, A. Leo, Exploring QSAR American Chemical Society, Washington (1995).

L. B. Kier, L. H. Hall, Molecular Connectivity in Structure-Activity Analysis Research Studies Press, Chichester (1986).

L. B. Kier, L. H. Hall, Molecular Connectivity in Chemistry and Drug Research Academic Press, San Diego (1976).

Review articles are

D. Ivanciuc, Encycl. Comput. Chem. 1, 167 (1998).

V. Venkatasubramanian, A. Sundaram, Encycl. Comput. Chem. 2, 1115 (1998).

G. Jones, Encycl. Comput. Chem. 2, 1127 (1998).

D. Ivanciuc, A. T. Balaban, Encycl. Comput. Chem. 2, 1169 (1998).

J. Shorter, Encycl. Comput. Chem. 4, 1487 (1998).

P. C. Jurs, Encycl. Comput. Chem. 1, 2320 (1998).

M. Randic, Encycl. Comput. Chem. 5, 3018 (1998).

S. Profeta, Jr., Kirk-Othmer Encyclopedia of Chemical Technology Supplement J. I. Kroschwitz (Ed.) 315, John Wiley & Sons, New York (1998).

G. A. Arteca, Rev. Comput. Chem. 9, 191 (1996).

M. Karelson, V. S. Lobanov, A. R. Katritzky, Chem. Rev. 96, 1027 (1996).

A. R. Katritzky, V. S. Lobanov, M. Karelson, Chem. Soc. Rev. 24, 279 (1995).

B. W. Clare, Theor. Chim. Acta 87, 415 (1994).

L. H. Hall, L. B. Kier, Rev. Comput. Chem. 2, 367 (1991).

I. B. Bersuker, A. S. Dimoglo, Rev. Comput. Chem. 2, 423 (1991).

S. P. Gupta, Chem. Rev. 87, 1183 (1987).

3D QSAR reviews are

H. Kubinyi, Encycl. Comput. Chem. 1, 448 (1998).

T. I. Oprea, C. L. Waller, Rev. Comput. Chem. 11, 127 (1997).

G. Greco, E. Novellino, Y. C. Martin, Rev. Comput. Chem. 11, 183 (1997).

Comparative QSAR reviews are

H. Gao, J. A. Katzenellenbogen, R. Garg, C. Hansch, Chem. Rev. 99, 723 (1999).

C. Hansch, G. Gao, Chem. Rev. 97, 2995 (1997).

C. Hansch, D. Hoekmen, H. Gao, Chem. Rev. 96, 1045 (1996).

Many resources are listed at the web site of The QSAR and Modelling Society

http://www.pharma.ethz.ch/qsar

QSAR applications in various fields

J. Devillers, Encycl. Comput. Chem. 2, 930 (1998).

H. Kubinyi, Encycl. Comput. Chem. 4, 2309 (1998).


F. Leclerc, R. Cedergren, Encycl. Comput. Chem. 4, 2756 (1998).

QSAR in Environmental Toxicology-IV Elsevier, Amsterdam (1991).

Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology W. Karcher, J. Devillers, Eds., Kluwer, Dordrecht (1990).

QSAR in Environmental Toxicology K. L. E. Kaiser, Ed., D. Reidel Publishing, Dordrecht (1989).

QSAR in Environmental Toxicology-II D. Reidel Publishing, Dordrecht (1987).

QSAR in Drug Design and Toxicology D. Hadzi, B. Jerman-Blažič, Eds., Elsevier, Amsterdam (1987).

QSAR and Strategies in the Design of Bioactive Compounds J. K. Seydel, Ed., VCH, Weinheim (1985).

An article listing many descriptors is

M. Cocchi, M. C. Menziani, F. Fanelli, P. G. de Benedetti, J. Mol. Struct. (Theochem) 331, 79 (1995).

Computational Chemistry: A Practical Guide for Applying Techniques to Real-World Problems. David C. Young. Copyright © 2001 John Wiley & Sons, Inc.

ISBNs: 0-471-33368-9 (Hardback); 0-471-22065-5 (Electronic)

31 COMPUTING NMR CHEMICAL SHIFTS

Nuclear magnetic resonance (NMR) spectroscopy is a valuable technique for obtaining chemical information. This is because the spectra are very sensitive to changes in the molecular structure. This same sensitivity makes NMR a difficult case for molecular modeling.

Computationally predicting coupling constants is much easier than predicting chemical shifts. Because of this, the ability to predict coupling constants is sometimes incorporated into software packages that have little or no ability to predict chemical shifts. Computed coupling constants differ very little from one program to the next. This chapter will focus on the more difficult problem of computing NMR chemical shifts.

31.1 AB INITIO METHODS

NMR chemical shifts can be computed using ab initio methods, which actually compute the shielding tensor. Once the shielding tensors have been computed, the chemical shifts can be determined by subtracting the isotropic shielding values for the molecule of interest from the TMS values. Computing shielding tensors is difficult because of gauge problems (dependence on the coordinate system's origin). A number of techniques for correcting this are in use. It is extremely important that the shielding tensors be computed for equilibrium geometries with the same method and basis that were used to complete the geometry optimization.
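The subtraction described above can be written out as a short sketch. The isotropic shielding is taken as one third of the trace of the 3x3 shielding tensor, and the shift is the reference (TMS) isotropic shielding minus that of the nucleus of interest. The function names are hypothetical, and any numbers passed in must come from calculations at the same level of theory for both the molecule and the reference.

```python
def isotropic_shielding(tensor):
    """One third of the trace of a 3x3 shielding tensor (ppm)."""
    return (tensor[0][0] + tensor[1][1] + tensor[2][2]) / 3.0


def chemical_shift(sigma_nucleus, sigma_reference):
    """Chemical shift in ppm: reference shielding minus nucleus shielding."""
    return sigma_reference - sigma_nucleus


def shifts_from_tensors(tensors, reference_tensor):
    """Convert a list of shielding tensors to shifts against one reference."""
    sigma_ref = isotropic_shielding(reference_tensor)
    return [chemical_shift(isotropic_shielding(t), sigma_ref) for t in tensors]
```

A nucleus that is less shielded than the reference therefore has a positive shift, matching the usual experimental convention.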

It is also important that sufficiently large basis sets are used. The 6-31G(d) basis set should be considered the absolute minimum for reliable results. Some studies have used locally dense basis sets, which have a larger basis on the atom of interest and a smaller basis on the other atoms. In general, this results in only minimal improvement since the spectra are due to interaction between atoms, rather than the electron density around one atom.

One of the most popular techniques is called GIAO. This originally stood for gauge invariant atomic orbitals. More recent versions have included ways to relax this condition without loss of accuracy, and the acronym has subsequently been reinterpreted as gauge including atomic orbitals. The GIAO method is based on perturbation theory. It is a means for computing shielding tensors from HF or DFT wave functions.

The individual gauge for localized orbitals (IGLO) and localized orbital local origin (LORG) methods are similar. Both are based on identities and closure relations that are rigorously correct for complete basis sets. These are reasonable approximations for finite basis sets. The two methods are equivalent in the limit of a complete basis set.

The individual gauges for atoms in molecules (IGAIM) method is based on Bader's atoms in molecules analysis scheme. This method yields results of comparable accuracy to those of the other methods. However, this technique is seldom used due to large CPU time demands.

There have also been methods designed for use with perturbation theory and MCSCF calculations. Correlation effects are necessary for certain technically difficult molecules, such as CO, N2, HCN, F2, and N2O.

Density functional theory calculations have shown promise in recent studies. Gradient-corrected or hybrid functionals must be used. Usually, it is necessary to employ a moderately large basis set with polarization and diffuse functions along with these functionals.

The methods listed thus far can be used for the reliable prediction of NMR chemical shifts for small organic compounds in the gas phase, which are often reasonably close to the liquid-phase results. Heavy elements, such as transition metals and lanthanides, present a much more difficult problem. Mass defect and spin-coupling terms have been found to be significant for the description of the NMR shielding tensors for these elements. Since NMR is a nuclear effect, core potentials should not be used.

31.2 SEMIEMPIRICAL METHODS

There is one semiempirical program, called HyperNMR, that computes NMR chemical shifts. This program goes one step further than other semiempiricals by defining different parameters for the various hybridizations, such as sp2 carbon vs. sp3 carbon. This method is called the typed neglect of differential overlap method (TNDO/1 and TNDO/2). As with any semiempirical method, the results are better for species with functional groups similar to those in the set of molecules used to parameterize the method.

Another semiempirical method, incorporated in the VAMP program, combines a semiempirical calculation with a neural network for predicting the chemical shifts. Semiempirical calculations are useful for large molecules, but are not generally as accurate as ab initio calculations.

31.3 EMPIRICAL METHODS

The simplest empirical calculations use a group additivity method. These calculations can be performed very quickly on small desktop computers. They are most accurate for a small organic molecule with common functional groups. The prediction is only as good as the aspects of molecular structure being parameterized. For example, they often do not distinguish between cis and trans isomers. Due to the limited accuracy, this method is more often used as a tool to check for reasonable results, but not as a rigorous prediction method.
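A group additivity estimate amounts to a base value plus one increment per substituent. The sketch below uses placeholder group labels and placeholder increments that are not taken from any published table; real work would use tabulated values such as those in the group additivity reference listed in the bibliography.

```python
# Hypothetical base shift and increments (ppm), for illustration only.
BASE_SHIFT = 1.0
INCREMENTS = {"group_A": 0.5, "group_B": 0.25}


def additive_shift(substituents, base=BASE_SHIFT, increments=INCREMENTS):
    """Estimate a shift as the base value plus the sum of group increments.

    The substituents are treated as an unordered collection, so their
    spatial arrangement has no effect on the result.
    """
    return base + sum(increments[name] for name in substituents)
```

Because only the list of attached groups enters the sum, two isomers carrying the same substituents, such as a cis and trans pair, receive the same estimate, which is the limitation noted above.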

Another technique employs a database search. The calculation starts with a molecular structure and searches a database of known spectra to find those with the most similar molecular structure. The known spectra are then used to derive parameters for inclusion in a group additivity calculation. This can be a fairly sophisticated technique incorporating weight factors to account for how closely the known molecule conforms to typical values for the component functional groups. The use of a large database of compounds can make this a very accurate technique. It also ensures that liquid, rather than gas-phase, spectra are being predicted.

31.4 RECOMMENDATIONS

In general, the computation of absolute chemical shifts is a very difficult task. Computing shifts relative to a standard, such as TMS, can be done more accurately. With some of the more approximate methods, it is sometimes more reliable to compare the shifts relative to the other shifts in the compound, rather than relative to a standard compound. It is always advisable to verify at least one representative compound against the experimental spectra when choosing a method. The following rules of thumb can be drawn from a review of the literature:

1. Database techniques are very fast and very accurate for organic molecules with common functional groups.

2. Ab initio methods are accurate and can be reliably applied to unusual structures and inorganic compounds. In most cases, HF calculations are fairly good for organic molecules. Large basis sets should be used.

3. For large molecules, the choice between semiempirical calculations and empirical calculations should be based on a test case.

4. Correlated and relativistic quantum mechanical calculations give the highest possible accuracy and are necessary for heavy atoms or correlation-sensitive systems.

BIBLIOGRAPHY

Introductory descriptions are in

M. F. Schlecht, Molecular Modeling on the PC Wiley-VCH, New York (1998).

E. K. Wilson, Chem. & Eng. News Sept. 28 (1998).

P. W. Atkins, R. S. Friedman, Molecular Quantum Mechanics Third Edition Oxford, Oxford (1997).


M. Karplus, R. N. Porter, Atoms & Molecules: An Introduction For Students of Physical Chemistry W. A. Benjamin, Inc., Menlo Park (1970).

Books about NMR modeling are

B. Born, H. W. Spiess, Ab Initio Calculations of Conformational Effects on 13C NMR Spectra of Amorphous Polymers Springer-Verlag, New York (1997).

I. Ando, G. A. Webb, Theory of NMR Parameters Academic Press, London (1983).

The following book gives a tutorial and examples for using ab initio methods. Some printings have an error in the listed TMS values. An errata sheet is available from Gaussian, Inc.

J. B. Foresman, Æ. Frisch, Exploring Chemistry with Electronic Structure Methods Second Edition Gaussian, Pittsburgh (1996).

Review articles are

U. Fleischer, C. van Wüllen, W. Kutzelnigg, Encycl. Comput. Chem. 3, 1827 (1998).

M. Bühl, Encycl. Comput. Chem. 3, 1835 (1998).

M. Kaupp, V. G. Malkin, O. L. Malkina, Encycl. Comput. Chem. 3, 1857 (1998).

C. J. Jameson, Annu. Rev. Phys. Chem. 47, 135 (1996).

D. B. Chesnut, Rev. Comput. Chem. 8, 245 (1996).

J. R. Cheeseman, G. W. Trucks, T. A. Keith, M. J. Frisch, J. Chem. Phys. 104, 5497 (1996).

D. B. Chesnut, Annual Reports on NMR Spectroscopy 29, 71 (1994).

C. J. Jameson, Chem. Rev. 91, 1375 (1991).

D. B. Chesnut, Annual Reports on NMR Spectroscopy 21, 51 (1989).

C. Giessner-Prettre, B. Pullman, Quarterly Reviews of Biophysics 20, 113 (1987).

C. J. Jameson, H. J. Osten, Annual Reports on NMR Spectroscopy 17, 1 (1986).

The group additivity technique is presented in

E. Pretsch, J. Seibl, W. Simon, T. Clerc, Tabellen zur Strukturaufklärung Organischer Verbindungen mit Spektroskopischen Methoden Springer-Verlag, Berlin (1981).
