
Matta, Boyd. The quantum theory of atoms in molecules


292 11 Topological Analysis of Proteins as Derived from Medium and High-resolution Electron Density

Fig. 11.2 (Top) Dendrogram depicting the results of the hierarchical merging/clustering algorithm applied to the PASA ED distribution of TYR39 of the hAR structure. Results for different values of t (0.280, 0.420, and 0.560 Å²) are emphasized using vertical lines.

The corresponding ED peaks are symbolized using open circles. (Bottom) 2D molecular representation of the information contained in the top figure. Fragments corresponding to ED peaks are represented at t = 0.280 (plain lines) and 0.420 (dotted lines) Å².

11.2 Methodology and Technical Details 293

Fig. 11.3 (Top) Schematic view of the hAR structure with the adenine-binding-site amino acids selected for electrostatic potential calculations explicitly represented. (Bottom) Close view of the adenine binding site. It consists of the ten amino acids of the hAR structure directly surrounding the adenine moiety of the NADP⁺ cofactor.


All electrostatic computations described in the last part of this chapter will take into account this substructure selection only (including hydrogen atoms), thus neglecting the contribution from all other atoms of the protein model. In accordance with the formal charges content of the adenine binding site, global electroneutrality of the fragment will be ensured in all electrostatic potential computations.

11.3

Topological Properties of Multipolar Electron Density Database

As described above, an experimental database of pseudo atom ED is being completed in Nancy [13]. It currently includes all chemical types of atom involved in protein structures and some nucleic acids.

Topological analysis of the ED of the fragments built from our database has been performed as follows using NewProp [24, 36]. The ED of eleven peptides has been reconstructed at the crystallographic experimental geometry using database values, to provide some statistical insight into transferability. The QTAIM atomic charges Q and volumes V were then calculated for all 101 atom types in the database. When possible, the values were averaged over similar atoms occurring in the molecules stored in the database. For example, for the peptide moiety HN–HαCα–C=O, C atoms occur 22 times in the database and Cα atoms occur 15 times. Figure 11.4 gives a 2D representation of the atom types (Q versus V). One can see that these two atomic properties can be classified according to chemical function and/or atomic neighbors. H(X) atoms (where X stands for the bonded heavy atom) can be grouped into three clusters, [H(O), H(N)], H(C), and H(S), the largest charge of +0.8 e corresponding to the smallest volume (less than 1 Å³), depending on the electronegativity of the X atom (or H atom acidity).

The same conclusion can be drawn for the carbon atoms C(O) and C(N), and for the oxygen atoms O(C) and O(H), with charges as high as +1.25 e and −1.00 e for C and O, respectively.

The C(C) and C(H) atoms (named C in Fig. 11.4) may have a positive or negative charge, within ±0.5 e, associated with large volume differences (from 6 to 15 Å³), but no further classification shows up; this is also valid for the N atom, but with a less varying charge (−1.1 e to −1.4 e) associated with a large volume (from 10 to 17 Å³).

The clustering of the topological properties can also be observed by inspection of Table 11.1, which gives the topological characterization of some atom types stored in the database, after transfer to the following moieties:

the peptide plane HN–HαCα–C=O, calculated at the experimental geometry of the tyrosine–glycine peptide bond in the Leu-enkephalin compound [37, 41].

the aromatic group of the tyrosine amino acid [38].


Fig. 11.4 Relationship between net atomic QTAIM charges Q (e) and the atomic basin volume V (Å³) for all the atom types stored in the multipolar ED database.

The corresponding values theoretically obtained by Matta and Bader [39] are also provided for comparison. The authors used, after geometry optimization, an HF procedure (Gaussian94 software [40] with a 6-311++G** basis set) to calculate the ED of the 24 amino acids in their nonzwitterionic form H2N–CαHα(R)–COOH. When available, values described in Coppens' theoretical database [21] for nonhydrogen atoms of the peptide plane moiety are also reported.

The BCP positions in X–H covalent bonds depend on the nature of the X atom, which determines the electron population of the hydrogen atom: hence d2 (H–BCP distance) changes by 41% when going from H(C) to H(O) (0.34 to 0.20 Å), whereas d1 (BCP–X distance) changes by only 4% (0.74 to 0.77 Å); this is also in proportion to the X–H distance. This competition between X and H atoms does not show up when C–N and C–O bond topological properties are compared: even with large electronegativity differences, they both belong to the same cluster, but d1, d2, or d1/d2 do not suggest the same (Q, V) couple.

Comparison of experimental values with Matta's theoretical values reveals good agreement for ρb, but with an almost systematic trend: the theoretical ED at the BCP is approximately 10% larger than the experimental value. This behavior is less pronounced when the experimental database ρb values are compared with Coppens' theoretical values for the peptide plane moiety.
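The two percentages quoted for the H–BCP and BCP–X distances follow directly from the rounded distances given in the text. The short Python sketch below simply redoes that arithmetic; the function name is illustrative and not part of any software cited in the chapter.

```python
def percent_change(start, end):
    """Relative change between two distances, in percent of the starting value."""
    return 100.0 * abs(end - start) / start

# Distances (in Å) quoted in the text for H(C) -> H(O)
d2_change = percent_change(0.34, 0.20)  # H-BCP distance
d1_change = percent_change(0.74, 0.77)  # BCP-X distance

print(f"d2 (H-BCP) changes by {d2_change:.0f}%")
print(f"d1 (BCP-X) changes by {d1_change:.0f}%")
```

The hydrogen side of the bond critical point moves by roughly ten times more, in relative terms, than the heavy-atom side.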
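The trend in ρb can be checked against Table 11.1. The sketch below is a minimal Python illustration, assuming the table layout described in its footnote (first line: multipolar database; second line: Matta and Bader [39]); the ASCII bond labels and variable names are introduced here for illustration only.

```python
# rho_b at the BCP in e/Å^3: (database value, Matta and Bader value), from Table 11.1
RHO_B = {
    "Ca-C": (1.70, 1.75),
    "C-N": (1.89, 1.94),
    "C=O": (2.83, 2.93),
    "Ca-Ha": (1.67, 1.98),
    "N-H": (2.03, 2.39),
    "CHar-CHar": (2.13, 2.18),
    "CCar-CHar": (2.12, 2.17),
    "CHar-COar": (2.14, 2.20),
    "COar-OHTyr": (2.17, 1.94),
    "CHar-Har": (1.77, 1.97),
    "OHTyr-HOTyr": (2.05, 2.59),
}

# Ratio of theoretical to experimental density for each bond
ratios = {bond: theo / exp for bond, (exp, theo) in RHO_B.items()}
mean_ratio = sum(ratios.values()) / len(ratios)
n_larger = sum(r > 1.0 for r in ratios.values())

for bond, r in ratios.items():
    print(f"{bond:12s} theo/exp = {r:.3f}")
print(f"mean ratio = {mean_ratio:.3f}; theory larger for {n_larger} of {len(ratios)} bonds")
```

With these values the theoretical density is larger for 10 of the 11 bonds, with a mean ratio of about 1.07; the bonds to hydrogen show the largest excess, and COar–OHTyr is the one exception.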


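Table 11.1 Topological characterization of the electron density at the BCP.[a]

Peptide group HN–HαCα–C=O

| Bond | Source | d1 | d2 | ρb | ∇²ρ | λ1 | λ2 | λ3 |
|---|---|---|---|---|---|---|---|---|
| Cα–C | Database | 0.749 | 0.775 | 1.70 | −12.58 | −11.90 | −10.88 | 10.19 |
| C–C (saturated) | Matta and Bader [39] | 0.765 | 0.765 | 1.75 | −17.40 | | | |
| Cα–C | Coppens [21] | | | 1.77 | −12.5 | −12.9 | −10.9 | 11.3 |
| C–N | Database | 0.640 | 0.818 | 1.89 | −10.72 | −13.28 | −12.80 | 15.36 |
| C–N | Matta and Bader [39] | 0.519 | 0.913 | 1.94 | −20.36 | | | |
| C–N | Coppens [21] | | | 1.66 | −11.40 | −6.8 | −15.1 | 10.50 |
| C=O | Database | 0.486 | 0.754 | 2.83 | −28.50 | −26.19 | −23.33 | 21.02 |
| C=O | Matta and Bader [39] | 0.397 | 0.795 | 2.93 | 2.41 | | | |
| C=O | Coppens [21] | | | 2.78 | −24.10 | −26.90 | −19.3 | 22.10 |
| Cα–Hα | Database | 0.703 | 0.381 | 1.67 | −14.47 | −14.79 | −14.52 | 14.84 |
| C–H (saturated) | Matta and Bader [39] | 0.688 | 0.396 | 1.98 | −26.48 | | | |
| N–H | Database | 0.774 | 0.255 | 2.03 | −25.19 | −26.58 | −24.86 | 26.25 |
| N–H | Matta and Bader [39] | 0.745 | 0.256 | 2.39 | −45.11 | | | |

Tyrosine aromatic group –CCar–(CHar–Har)4–OHTyr–HOTyr

| Bond | Source | d1 | d2 | ρb | ∇²ρ | λ1 | λ2 | λ3 |
|---|---|---|---|---|---|---|---|---|
| CHar–CHar | Database | 0.696 | 0.697 | 2.13 | −19.58 | −16.08 | −13.55 | 10.06 |
| CHar–CHar | Matta and Bader [39] | 0.692 | 0.694 | 2.18 | −24.38 | | | |
| CCar–CHar | Database | 0.692 | 0.700 | 2.12 | −19.31 | −15.91 | −13.48 | 10.08 |
| CCar–CHar | Matta and Bader [39] | 0.686 | 0.706 | 2.17 | −24.19 | | | |
| CHar–COar | Database | 0.694 | 0.700 | 2.14 | −19.91 | −16.58 | −13.55 | 10.22 |
| CHar–COar | Matta and Bader [39] | 0.646 | 0.741 | 2.20 | −25.42 | | | |
| COar–OHTyr | Database | 0.564 | 0.796 | 2.17 | −15.09 | −16.94 | −16.47 | 18.33 |
| COar–OHTyr | Matta and Bader [39] | 0.435 | 0.920 | 1.94 | 0.24 | | | |
| CHar–Har | Database | 0.737 | 0.339 | 1.77 | −18.61 | −16.90 | −15.78 | 14.06 |
| CHar–Har | Matta and Bader [39] | 0.682 | 0.394 | 1.97 | −26.23 | | | |
| OHTyr–HOTyr | Database | 0.765 | 0.205 | 2.05 | −25.88 | −33.37 | −32.36 | 39.85 |
| OHTyr–HOTyr | Matta and Bader [39] | 0.174 | 0.773 | 2.59 | −68.80 | | | |

a ρb (e Å⁻³) and ∇²ρ (e Å⁻⁵) are the ED and its Laplacian at the BCP; λ1, λ2, λ3 (e Å⁻⁵) are the eigenvalues of the Hessian matrix of ρ; d1 and d2 (Å) are the distances from the BCP to the first and second atoms defining the bond. For each bond, the first line corresponds to the multipolar database. When available, the second line gives the results of Matta and Bader [39]. For the peptide group, when present, the third line gives values from Coppens' theoretical databank [21].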
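Because the Laplacian is the trace of the Hessian of ρ, the tabulated ∇²ρ should equal λ1 + λ2 + λ3 for every bond. The sketch below is a minimal Python consistency check on the multipolar database rows of Table 11.1, taking ∇²ρ, λ1, and λ2 as negative and λ3 as positive, as expected for covalent bonds; the ASCII bond labels and the 0.02 e Å⁻⁵ rounding tolerance are assumptions made for this illustration.

```python
# Database rows of Table 11.1: (laplacian, lambda1, lambda2, lambda3), all in e/Å^5
DATABASE_ROWS = {
    "Ca-C": (-12.58, -11.90, -10.88, 10.19),
    "C-N": (-10.72, -13.28, -12.80, 15.36),
    "C=O": (-28.50, -26.19, -23.33, 21.02),
    "Ca-Ha": (-14.47, -14.79, -14.52, 14.84),
    "N-H": (-25.19, -26.58, -24.86, 26.25),
    "CHar-CHar": (-19.58, -16.08, -13.55, 10.06),
    "CCar-CHar": (-19.31, -15.91, -13.48, 10.08),
    "CHar-COar": (-19.91, -16.58, -13.55, 10.22),
    "COar-OHTyr": (-15.09, -16.94, -16.47, 18.33),
    "CHar-Har": (-18.61, -16.90, -15.78, 14.06),
    "OHTyr-HOTyr": (-25.88, -33.37, -32.36, 39.85),
}

# |trace of the Hessian - tabulated Laplacian| for each bond
residuals = {
    bond: abs(l1 + l2 + l3 - lap)
    for bond, (lap, l1, l2, l3) in DATABASE_ROWS.items()
}
worst = max(residuals.values())

for bond, res in residuals.items():
    print(f"{bond:12s} |sum(lambda) - laplacian| = {res:.2f}")
print("all rows consistent within rounding:", worst < 0.02)
```

Every row closes to within the rounding of the last printed digit. The same trace relation also ties together the three eigenvalues and the Laplacian listed for Coppens' databank.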

The dependence of (Q, V) values on the nature of the peptide is shown in Table 11.2, which gives the topological properties of the C and O atoms of the peptide C=O group as determined over all peptide molecules stored in the database. Average values and standard deviations are also reported. The C and O atomic multipolar ED data are the same as extracted from the database, irrespective of the type of peptide, and the resulting (Q, V) values only depend on the nature of the side-chain or on its conformation. Because the local C=O geometry is


Table 11.2 Atomic QTAIM charge, Q (e), and basin volume, V (Å³), obtained from topological analysis of the multipolar ED database. The basin volumes have been defined by interatomic boundaries based on zero-flux surfaces. The results are only given for the carbonyl C=O atoms in several peptide crystals.

| Molecule[a] | Carbonyl carbon atom | Q | V | Carbonyl oxygen atom | Q | V |
|---|---|---|---|---|---|---|
| actr | C_1 | 1.118 | 6.645 | O_1 | −0.874 | 16.470 |
| actr | C_2 | 1.152 | 6.028 | O_2 | −1.024 | 18.147 |
| acdelt | C_1 | 1.155 | 6.136 | O_1 | −1.012 | 17.668 |
| acdelt | C_2 | 1.234 | 5.679 | O_2 | −1.005 | 17.950 |
| enk | C_1 | 1.157 | 5.647 | O_1 | −1.023 | 17.680 |
| enk | C_2 | 1.202 | 5.531 | O_2 | −1.044 | 17.731 |
| enk | C_3 | 1.196 | 5.783 | O_3 | −1.030 | 19.317 |
| enk | C_4 | 1.184 | 6.261 | O_4 | −1.047 | 17.051 |
| trig | C_1a | 1.215 | 5.687 | O_1a | −1.055 | 16.960 |
| trig | C_2a | 1.205 | 5.609 | O_2a | −1.025 | 15.959 |
| trig | C_1b | 1.214 | 5.378 | O_1b | −1.030 | 16.335 |
| trig | C_2b | 1.209 | 5.717 | O_2b | −1.034 | 17.232 |
| ygg | C_1 | 1.165 | 5.390 | O_1 | −1.109 | 18.443 |
| ygg | C_2 | 1.200 | 5.466 | O_2 | −1.062 | 16.586 |
| gd | C_1 | 1.184 | 6.082 | O_1 | −1.048 | 17.392 |
| actyr | C_1 | 1.083 | 6.445 | O_1 | −0.992 | 18.355 |
| gt | C_1 | 1.221 | 5.854 | O_1 | −1.043 | 17.376 |
| prohis | C_1 | 1.173 | 5.719 | O_1 | −1.045 | 16.832 |
| prohis | C_2 | 1.257 | 5.037 | O_2 | −1.072 | 19.927 |
| prohis | C_3 | 1.198 | 5.705 | O_3 | −1.059 | 18.919 |
| acgln | C_1 | 1.104 | 6.412 | O_1 | −1.065 | 17.962 |
| alamet | C_1 | 1.167 | 5.917 | O_1 | −1.040 | 18.235 |
| Average | | 1.182 | 5.824 | | −1.035 | 17.660 |
| RMSD[b] | | 0.041 | 0.380 | | 0.042 | 0.963 |
| SEM[b] | | 0.009 | 0.083 | | 0.009 | 0.205 |

a enk, Leu-enkephalin [37, 41]; ygg, Tyr–Gly–Gly [38]; gd, Gly–Asp [38]; actr, N-acetyl-L-tryptophan [42]; acdelt, N-acetyl-α,β-dehydrophenylalanine methylamide [43]; trig, triglycine [44]; actyr, N-acetyl-L-tyrosine ethyl ester monohydrate [45]; gt, glycyl-L-threonine dihydrate [46]; alamet, DL-alanylmethionine [47]; prohis, terbutyl-CO-proline–histidine–NHmethyl [48]; acgln, N-acetyl-L-glutamine [49].

b RMSD, root mean square deviation for the sample (N = 23); SEM = RMSD/√(N − 1), standard error of the mean.


identical for all carbonyl groups (bond lengths differ by 0.01 Å only), and the interatomic surface of the C atom is limited by N, O, and Cα, the variability of the C atomic basin can originate only from the region above and below the peptide plane, i.e. the side-chains. The O atom is bonded to C only, but its basin is also closed by the intermolecular interactions that occur in the crystal. This could explain the greater variability of V for the O atom than for the C atom, as shown in Table 11.2. In contrast, the QTAIM net charges Q show almost no fluctuations, with Q(C) = 1.181(9) e and Q(O) = −1.035(9) e. In conclusion, the only significant (but small) change is in the oxygen atomic basin, with V = 17.66 Å³, RMSD = 1 Å³, and SEM = 0.2 Å³, because of intermolecular interactions with different H···O hydrogen bond geometries. The QTAIM charges thus seem to be totally transferable and can be tested as simple point charges in electrostatic property calculations (Section 11.5).

The atomic charges on the peptide group and on the tyrosine aromatic ring for different models of the molecular ED are summarized in Table 11.3. Atomic charges presented here are:
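The average and dispersion quoted for the carbonyl carbon charge can be recomputed from the values listed in Table 11.2. The sketch below is a minimal Python check, assuming that RMSD is the population root mean square deviation and that SEM = RMSD/√(N − 1), with N taken here as the number of carbon charges listed in the table (22); the variable names are illustrative.

```python
import math
import statistics

# QTAIM charges Q (e) of the carbonyl C atoms, as listed in Table 11.2
Q_CARBON = [
    1.118, 1.152, 1.155, 1.234, 1.157, 1.202, 1.196, 1.184,
    1.215, 1.205, 1.214, 1.209, 1.165, 1.200, 1.184, 1.083,
    1.221, 1.173, 1.257, 1.198, 1.104, 1.167,
]

n = len(Q_CARBON)
mean_q = statistics.fmean(Q_CARBON)
rmsd_q = statistics.pstdev(Q_CARBON)   # root mean square deviation about the mean
sem_q = rmsd_q / math.sqrt(n - 1)      # standard error of the mean

print(f"N = {n}, mean = {mean_q:.3f} e, RMSD = {rmsd_q:.3f} e, SEM = {sem_q:.3f} e")
```

The dispersion is about 3.5% of the mean charge, which is the quantitative basis for describing the QTAIM charges as transferable.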

average atomic QTAIM charges, denoted QTAIMEXP, obtained by experimental multipolar database ED integration within atomic basins (line 1);

atomic QTAIM charges, denoted QTAIMTHEO, as reported by Matta et al. [39] (line 2); and

atomic charges, denoted QVAL, directly computed from average Pval values stored in the multipolar ED database by using QVAL = N − Pval, where N is the number of valence electrons of the neutral atom (line 3).

Comparison of the atomic charge values shows that QVAL charges are usually much smaller than QTAIMEXP and QTAIMTHEO charges, especially for nonhydrogen atoms, i.e., when atoms are associated with large atomic basins. One also observes that, even though the CPs of all tyrosine C–C covalent bonds are similar (Table 11.1), their QTAIM charges differ largely and enable very good discrimination of CHar, CCar, and COar atoms (the atom names arise from the multipolar database nomenclature and indicate aromatic carbon atoms linked to two carbons and one hydrogen, to a carbon, or to an oxygen atom, respectively). The basin volumes also enable differentiation of the three types of atom: the more negative the charge, the larger the volume.

11.4

Analysis of Local Maxima in Experimental and Promolecular Medium-resolution Electron Density Distributions

In this part of the chapter results from peak analysis of medium-resolution ED distributions are presented and discussed. Results obtained by use of the so-called promolecular XTAL model are compared with experimental data at the same resolution, i.e. using the observed Fobs. All calculated maps were built according to the hAR crystal structure, including hydrogen and solvent atoms with their


Table 11.3 Atomic net charges Q (e) and basin volumes V (Å³) for the atoms in the peptide HN–HαCα–C=O group and in the tyrosine aromatic ring. The values were obtained by averaging over the n atoms used to build the database.[a]

Peptide group

| Atom type | n | Q (QTAIMEXP) | V (QTAIMEXP) | Q (QTAIMTHEO) | V (QTAIMTHEO) | QVAL |
|---|---|---|---|---|---|---|
| C | 22 | 1.181(9) | 5.82(8) | 1.774(6) | 4.59(2) | 0.024(7) |
| O | 22 | −1.035(9) | 17.66(21) | −1.35(3) | 19.95(7) | −0.307(3) |
| N | 21 | −1.272(9) | 14.05(21) | −1.160(4) | 16.64(11) | −0.312(6) |
| H | 23 | 0.752(23) | 1.33(14) | 0.373(4) | 4.52(5) | 0.320(5) |
| Cα | 15 | 0.135(8) | 6.92(4) | 0.577(2) | 6.11(2) | 0.111(10) |
| Hα | 16 | 0.142(2) | 6.41(19) | 0.003(3) | 6.86(2) | 0.196(5) |
| CαGly | 12 | −0.028(6) | 9.40(18) | 0.617 | 7.305 | 0.224(9) |
| HαGly | 24 | 0.180(2) | 6.28(18) | 0.009(0) | 6.58(30) | 0.201(2) |

Tyrosine aromatic group

| Atom type | n | Q (QTAIMEXP) | V (QTAIMEXP) | Q (QTAIMTHEO) | V (QTAIMTHEO) | QVAL |
|---|---|---|---|---|---|---|
| CHar | 26 | −0.270(6) | 14.06(23) | 0.019(6) | 8.27(8) | 0.155(4) |
| Har | 26 | 0.244(4) | 6.43(15) | 0.007(4) | 7.27(5) | 0.170(2) |
| CCar | 5 | −0.109(11) | 9.57(24) | 0.005(10) | 10.32(6) | 0.040(34) |
| COar | 3 | 0.466(4) | 8.44(16) | 0.521 | 9.113 | 0.053(67) |
| OHTyr | 3 | −1.128(11) | 17.97(56) | −1.273 | 18.034 | −0.461(28) |
| HOTyr | 3 | 0.80(10) | 0.96(51) | 0.624 | 2.904 | 0.389(18) |

a QTAIMEXP charges and volumes are obtained from the multipolar database ED; QTAIMTHEO charges and volumes are from Matta and Bader [39]; QVAL charges are obtained from the multipolar ED database. The estimated standard deviation of the mean is given in parentheses.

refined occupancies. In addition, two experimental 2Fobs − Fcalc maps were considered, at 2.85 and 3.5 Å resolution.

11.4.1

Experimental and Promolecular Electron Density Distributions Calculated from Structure Factors

Both experimental and calculated maps were considered for each of the two crystallographic resolution values, 2.85 and 3.5 Å, as selected from Becue et al.


[3]. As described in Section 11.2.2, the promolecule maps were built with the software XTAL [25], using the experimental atomic positions and thermal data B, whereas the experimental 2Fobs − Fcalc maps were obtained directly by Fourier transformation using multipolar phases and structure-factor moduli. Three of the four maps under study were characterized by the grid intervals 0.581, 0.566, and 0.571 Å along the unit cell axes a, b, and c, respectively. The experimental map generated at 2.85 Å was calculated using grid intervals equal to 0.499, 0.510, and 0.489 Å.

The software ORCRIT [26] was then applied to these four maps to locate their ED maxima. To remove CPs originating from ripples in the ED distributions caused by series-termination errors, a cut-off value was selected to eliminate most of the unidentified low-density peaks. This cut-off mainly affects the number of peaks from bound water molecules, as shown in a study about the use of 2Fobs − Fcalc maps [50]. This lower limit value and the ED value of the highest peak found in each map are reported in Table 11.4.

In contrast with the hierarchical merging/clustering algorithm based on an analytical derivation of the ED peaks, there is no generation of fragments associated with the peaks. This prevents their identification on the basis of their atomic content. To assign a chemical identification to each protein peak in a given grid, a list of reference sites was therefore established. For each amino acid residue, n, two center-of-mass (c.o.m.) locations were calculated, one for the side-chain and the other for the backbone atoms (C=O)n–(N–Cα)n+1. Other selected reference sites were the solvent and heteroatoms of the complex. Each peak was then identified by determining its nearest protein, solvent, or heteroatom site.
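The assignment procedure described above amounts to a nearest-neighbor search over a list of reference sites. The following Python sketch illustrates the idea under stated assumptions: the coordinates, atomic masses, site labels, and function names are hypothetical and are not taken from the hAR structure or from the software used in the chapter.

```python
import math

# Approximate atomic masses (u) for the elements used in this toy example
MASS = {"C": 12.011, "N": 14.007, "O": 15.999}

def center_of_mass(atoms):
    """atoms: list of (element, (x, y, z)); returns the mass-weighted mean position."""
    total = sum(MASS[el] for el, _ in atoms)
    return tuple(
        sum(MASS[el] * xyz[i] for el, xyz in atoms) / total for i in range(3)
    )

def nearest_site(peak, sites):
    """Return (label, distance) of the reference site closest to an ED peak."""
    label = min(sites, key=lambda name: math.dist(peak, sites[name]))
    return label, math.dist(peak, sites[label])

# Hypothetical coordinates (Å): backbone group (C=O)n-(N-Ca)n+1 and a two-atom side-chain
backbone = [("C", (0.0, 0.0, 0.0)), ("O", (1.23, 0.0, 0.0)),
            ("N", (-0.6, 1.15, 0.0)), ("C", (-0.5, 2.6, 0.0))]
side_chain = [("C", (2.0, -1.0, 1.0)), ("C", (3.3, -1.2, 1.6))]

sites = {
    "backbone": center_of_mass(backbone),
    "side-chain": center_of_mass(side_chain),
    "water": (6.0, 0.0, 0.0),  # a solvent atom is used directly as a site
}

label, distance = nearest_site((0.3, 0.6, 0.2), sites)
print(f"peak assigned to {label} site at {distance:.2f} Å")
```

The distance returned for each peak is the quantity whose mean and spread are collected per site type in Table 11.4.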

Table 11.4 Number of peaks in experimental and promolecular XTAL ED maps of hAR at resolution values of 2.85 and 3.5 Å, and mean distances in Å (in parentheses) between the peaks and their nearest amino acid site (main chain or side-chain c.o.m.), or solvent atom site.

| No. of peaks | 2.85 Å, Experimental | 2.85 Å, Promolecular | 3.5 Å, Experimental | 3.5 Å, Promolecular |
|---|---|---|---|---|
| Main chain | 322 (0.599 ± 0.434) | 317 (0.548 ± 0.421) | 271 (0.997 ± 0.459) | 237 (0.995 ± 0.436) |
| Side-chain | 340 (0.893 ± 0.554) | 313 (0.870 ± 0.567) | 288 (0.868 ± 0.525) | 227 (0.770 ± 0.470) |
| Ligand | 7 | 8 | 6 | 5 |
| NADP⁺ | 11 | 10 | 9 | 8 |
| Citrate | 4 | 4 | 6 | 4 |
| Water | 329 (0.478 ± 0.331) | 315 (0.408 ± 0.289) | 234 (1.193 ± 0.529) | 206 (0.940 ± 0.380) |
| ED range (e Å⁻³) | 0.6–4.95 | 0.8–5.46 | 0.6–3.86 | 0.5–3.07 |


The results reported in Table 11.4 show that the total number of peaks depends, as expected, on the resolution. The number of side-chain peaks is close to the number of backbone peaks. This is especially true at 2.85 Å resolution, where each amino acid residue leads to a backbone and a side-chain peak, as already explained by Leherte et al. [1], Guo et al. [2], and Becue et al. [3]. The mean distance between the peaks and their nearest protein site is indeed shorter at 2.85 Å, except for the side-chains, for which the peaks can be located farther from the c.o.m. in long chains. A statistical analysis of the backbone peaks was carried out as a function of the amino acid residue type. It showed that most of the amino acid backbones are represented by one peak only. More precisely, this concerns 90.5% (266/295) and 92.9% (275/296) of the peaks observed in the experimental and promolecular ED maps generated at 2.85 Å, respectively, and 82.9% (189/228) and 95.1% (215/226) in the corresponding ED maps generated at a resolution value of 3.5 Å. Backbone groups are more often represented by two peaks in the experimental map.

In the same way as for the backbone, most side-chain groups lead to a single ED maximum, and most of the residue side-chains represented by two or more peaks can be regarded as medium or large groups. Short side-chain residues containing no heteroatoms (O or S), or only one, i.e. ALA, CYS, GLY, SER, and THR, are represented by one peak only at d = 2.85 Å. At d = 2.85 Å, all TRP side-chains, which contain two fused rings, lead to at least two peaks, a trend that is partly verified for TYR side-chains (one aromatic ring and one hydroxyl group).

Finally, there are more discrepancies between the experimental and promolecular XTAL maps at 3.5 Å resolution, e.g. for ARG, CYS, GLY, HIS, LEU, LYS, and PRO. In conclusion, such promolecular models are less predictive (in terms of topology, for instance) at low resolution; to confirm this, however, additional studies would be required.

Three-dimensional representations of the ED distributions of the adenine binding site are displayed in Fig. 11.5. A detailed analysis of the associated peaks is given in Table 11.5. The size and the atomic content of an amino acid residue affect the number of its peaks in a medium-resolution ED map. Table 11.5 shows that the density values at the peak locations are, in contrast, not clearly dependent on amino acid type. Let us also mention, however, that the residues CYS and MET, when present, are an exception because they contain sulfur atoms and lead to higher-density peaks [1, 3].

11.4.2

Promolecular Electron Density Distributions Calculated from Atoms (PASA Model)

The hierarchical merging/clustering algorithm described in Section 11.2.2.2 does not require any calculation of the ED maps. It is based solely on knowledge of the analytical expression of the promolecular ED function and its first derivative. The decomposition of the protein structure into fragments was achieved at t values ranging from 0 to 0.70 Å², i.e. B = 0 to 110.6 Å², with a step of 0.014 Å².
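The smoothing schedule quoted above can be written out explicitly. The Python sketch below builds the t grid from the stated range and step, and derives the B/t proportionality implied by the two quoted end points; that this factor is exactly 16π² is an assumption made here for comparison only, since the defining relation belongs to Section 11.2.2 and is not restated in this part of the chapter.

```python
import math

T_MAX = 0.70    # Å^2, upper limit of the smoothing parameter t
T_STEP = 0.014  # Å^2, increment between successive decompositions
B_MAX = 110.6   # Å^2, B value quoted for t = 0.70 Å^2

# Number of increments between t = 0 and t = T_MAX, then the grid itself
n_steps = round(T_MAX / T_STEP)
t_values = [round(i * T_STEP, 3) for i in range(n_steps + 1)]

# Proportionality factor implied by the quoted end points
ratio = B_MAX / T_MAX

print(f"{len(t_values)} values of t from {t_values[0]} to {t_values[-1]} Å^2")
print(f"implied B/t = {ratio:.1f}; for comparison, 16*pi^2 = {16 * math.pi ** 2:.1f}")
```

The grid therefore contains 51 smoothing levels, which is the number of levels at which peaks can merge in a dendrogram such as the one in Fig. 11.2.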
