Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf

Comparative Protein Structure Modeling (Fiser et al.)

that the model is correct [204,205]. Thus, distributions of many spatial features have been compiled from high resolution protein structures, and any large deviations from the most likely values have been interpreted as strong indicators of errors in the model. Such features include packing [206], formation of a hydrophobic core [207], residue and atomic solvent accessibilities [208–212], spatial distribution of charged groups [213], distribution of atom–atom distances [214], atomic volumes [215], and main chain hydrogen bonding [200].
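The idea of flagging large deviations from a reference distribution can be sketched as follows. This is only a minimal illustration: it uses the mean and standard deviation of the reference values as a stand-in for the "most likely value" (the cited methods use their own statistics), and the feature names, the function `flag_outliers`, and the threshold of 3 standard deviations are hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(reference, observed, threshold=3.0):
    """Return (feature, z) pairs whose z-score exceeds the threshold.

    reference: dict feature -> list of values compiled from known structures
    observed:  dict feature -> value measured in the model being tested
    threshold: assumed cutoff in standard deviations (not from the source)
    """
    flagged = []
    for name, value in observed.items():
        ref_values = reference[name]
        mu = mean(ref_values)
        sigma = stdev(ref_values)
        z = (value - mu) / sigma  # deviation in units of the reference spread
        if abs(z) > threshold:
            flagged.append((name, z))
    return flagged
```

A feature that is flagged this way is a candidate error site, not proof of one; in practice several such criteria would be combined.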

Another group of methods for testing 3D models, which implicitly takes into account many of the criteria listed above, involves 3D profiles and statistical potentials [87,216]. These methods evaluate the environment of each residue in a model with respect to the expected environment as found in high resolution X-ray structures. Programs implementing this approach include VERIFY3D [216], PROSA [217], HARMONY [218], and ANOLEA [120].

An additional role of the model evaluation methods is to help in the actual modeling procedure. In principle, the accuracy of a model can be improved by incorporating the quality criteria into the scoring function that is optimized to derive the model in the first place.

VI. APPLICATIONS OF COMPARATIVE MODELING

Comparative modeling is often an efficient way to obtain useful information about the proteins of interest. For example, comparative models can be helpful in designing mutants to test hypotheses about the protein’s function [89,219]; identifying active and binding sites [220]; searching for, designing, and improving ligands for a given binding site [221]; modeling substrate specificity [222]; predicting antigenic epitopes [223]; simulating protein–protein docking [224]; inferring function from calculated electrostatic potential around the protein [225]; facilitating molecular replacement in X-ray structure determination [226]; refining models based on NMR constraints [227]; testing and improving a sequence–structure alignment [228]; confirming a remote structural relationship [59]; and rationalizing known experimental observations. For an exhaustive review of comparative modeling applications, see Ref. 3.

Fortunately, a 3D model does not have to be absolutely perfect to be helpful in biology, as demonstrated by the applications listed above. However, the type of question that can be addressed with a particular model does depend on the model’s accuracy. At the low end of the accuracy spectrum, there are models that are based on less than 25% sequence identity and sometimes have less than 50% of their Cα atoms within 3.5 Å of their correct positions. However, such models still have the correct fold, and even knowing only the fold of a protein is frequently sufficient to predict its approximate biochemical function. More specifically, only nine out of 80 fold families known in 1994 contained proteins (domains) that were not in the same functional class, although 32% of all protein structures belonged to one of the nine superfolds [229]. Models in this low range of accuracy combined with model evaluation can be used for confirming or rejecting a match between remotely related proteins [9,58].

In the middle of the accuracy spectrum are the models based on approximately 35% sequence identity, corresponding to 85% of the Cα atoms modeled within 3.5 Å of their correct positions. Fortunately, the active and binding sites are frequently more conserved than the rest of the fold and are thus modeled more accurately [9]. In general, medium resolution models frequently allow a refinement of the functional prediction based on sequence alone, because ligand binding is most directly determined by the structure of the binding site rather than its sequence. It is frequently possible to predict correctly important features of the target protein that do not occur in the template structure. For example, the location of a binding site can be predicted from clusters of charged residues [225], and the size of a ligand may be predicted from the volume of the binding site cleft [222]. Medium resolution models can also be used to construct site-directed mutants with altered or destroyed binding capacity, which in turn could test hypotheses about the sequence–structure–function relationships. Other problems that can be addressed with medium resolution comparative models include designing proteins that have compact structures without long tails, loops, and exposed hydrophobic residues for better crystallization and designing proteins with added disulfide bonds for extra stability.

The high end of the accuracy spectrum corresponds to models based on 50% sequence identity or more. The average accuracy of these models approaches that of low resolution X-ray structures (3 Å resolution) or medium resolution NMR structures (10 distance restraints per residue) [58]. The alignments on which these models are based generally contain almost no errors. In addition to the already listed applications, high quality models can be used for docking of small ligands [221] or whole proteins onto a given protein [224,230].
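The three accuracy ranges described above can be summarized in a small lookup. This is a rough sketch only: the text gives representative values (below 25%, about 35%, 50% or more) rather than sharp boundaries, so the 30% cutoff between the low and medium tiers is an assumption made here, and the function name `accuracy_tier` is hypothetical.

```python
# Typical uses per tier, paraphrased from the discussion above.
TYPICAL_USES = {
    "low": "confirming or rejecting a match between remotely related proteins",
    "medium": "refining functional predictions, designing site-directed mutants",
    "high": "docking of small ligands or whole proteins",
}

def accuracy_tier(seq_identity_pct, low_medium_cutoff=30.0):
    """Map target-template sequence identity (percent) to a coarse tier.

    The 50% boundary for the high tier follows the text; the low/medium
    cutoff is an assumed value, since the text quotes only <25% and ~35%.
    """
    if seq_identity_pct >= 50.0:
        return "high"
    if seq_identity_pct >= low_medium_cutoff:
        return "medium"
    return "low"
```

For example, `TYPICAL_USES[accuracy_tier(35)]` returns the medium-tier description. Actual model accuracy also depends on alignment quality, so the tier is a first estimate at best.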

We now describe two applications of comparative modeling in more detail: (1) modeling of substrate specificity aided by a high accuracy model and (2) confirming a remote structural relationship based on a low accuracy model.

Figure 10 Models of complexes between BLBP and two different fatty acids. The fatty acid ligand is shown in the CPK representation. The small spheres in the ligand-binding cavity are water molecules. (a) Model of the BLBP–oleic acid complex, in which the cavity is not filled. (b) Model of the BLBP–docosahexaenoic acid complex, in which the cavity is filled. The figure was prepared using the program MOLSCRIPT [236].


A. Ligand Specificity of Brain Lipid-Binding Protein

Brain lipid-binding protein (BLBP) is a member of the family of fatty acid binding proteins that was isolated from brain [222]. The problem was to find out which one of the many fatty acids known to bind to fatty acid binding proteins in general is the likely physiological ligand of BLBP. To address this problem, comparative models of BLBP complexed with many fatty acids were calculated by relying on the structures of the adipocyte lipid-binding protein and muscle fatty acid binding protein, in complex with their ligands. The models were evaluated by binding and site-directed mutagenesis experiments [222]. The model of BLBP indicated that its binding cavity was just large enough to accommodate docosahexaenoic acid (DHA) (Fig. 10). Because DHA filled the BLBP binding cavity completely, it was unlikely that BLBP would bind a larger ligand. Thus, DHA was the ligand predicted to have the highest affinity for BLBP. The prediction was confirmed by the measurement of binding affinities for many fatty acids. It turned out that the BLBP–DHA interaction was the strongest fatty acid–protein interaction known to date. The binding affinities of the ligands correlated with the surface areas buried by the protein–ligand interactions, as calculated from the corresponding models, and explained why DHA had the highest affinity.

Figure 11 Confirming structural similarity between the E. coli δ′ subunit of DNA polymerase III and RuvB. (a) A sequence alignment between the δ′ subunit and RuvB. (b) ProsaII profiles for the X-ray structure of the δ′ subunit (thin continuous line), Z 11.0; a model of RuvB based on its alignment to the δ′ subunit (thick line), Z 7.3; and a test model based on an incorrect alignment (dashed line), Z 0.9. The RuvB model based on the correct alignment has a significant Z-score and only a few positive peaks in the profile. This indicates that the model is plausible and that RuvB is indeed related structurally to the E. coli δ′ subunit. (From Ref. 217.)

This case illustrates how a comparative model provides new information that cannot be deduced directly from the template structures despite their high (60%) sequence identity to BLBP. The two templates have smaller binding sites and consequently different patterns of binding affinities for the same set of ligands. The study also illustrated how new information is obtained relative to the target–template alignment even when the similarity between the target and the template sequences is high. The volumes and contact surfaces can be calculated only from a 3D model.

B. Finding Proteins Remotely Related to the E. coli δ′ Subunit

The structure of the δ′ subunit of the clamp–loader complex of E. coli DNA polymerase III was determined by X-ray crystallography [59]. Several biological considerations and extremely weak sequence patterns indicated that δ′ may be structurally related to the RuvB family of DNA helicases. However, the relationship could not be proved on the basis of the alignment of the corresponding sequences alone; the sequence identities ranged from only 9% to 21%. To substantiate the putative match, comparative models for several RuvB helicases were constructed using the crystal structure of the δ′ subunit as the template. The models were evaluated by calculating their PROSAII Z-scores and energy profiles [217] (Fig. 11). This evaluation indicated strongly that the model is plausible and that RuvB is indeed related structurally to the E. coli δ′ subunit.
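Sequence identities such as the 9% to 21% quoted above are computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over the aligned positions where neither sequence has a gap; other denominators are also in use, and the text does not say which one was applied, so the choice here is an assumption. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned, non-gap positions of two sequences.

    Both inputs must be rows of the same alignment (equal length, gaps
    written as `gap`). Positions with a gap in either row are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)
```

At identities this low, the number itself says little about structural relatedness, which is why the models had to be evaluated independently.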

VII. COMPARATIVE MODELING IN STRUCTURAL GENOMICS

In a few years, the genome projects will have provided us with the amino acid sequences of more than a million proteins: the catalysts, inhibitors, messengers, receptors, transporters, and building blocks of the living organisms. The full potential of the genome projects will be realized only when we assign and understand the function of these new proteins. This will be facilitated by structural information for all or almost all proteins. This aim will be achieved by structural genomics, a focused, large-scale determination of protein structures by X-ray crystallography and nuclear magnetic resonance spectroscopy, combined efficiently with accurate, automated, and large-scale comparative protein structure modeling techniques [231]. Given current modeling techniques, it seems reasonable to require models based on at least 30% sequence identity, corresponding to one experimentally determined structure per sequence family rather than fold family. Since there are 1000–5000 fold families and perhaps about five times as many sequence families [16], the experimental effort in structural genomics has to deliver at least 10,000 protein domain structures.
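The estimate in the last sentence follows from simple arithmetic, sketched below under the figures quoted in the text (1000–5000 fold families, about five sequence families per fold family). The function name is hypothetical, and the multiplier is only the rough ratio given above.

```python
def sequence_family_range(fold_families=(1000, 5000), seq_per_fold=5):
    """Return the (low, high) number of sequence families.

    One experimentally determined structure is assumed to be needed per
    sequence family, so this is also the range of structures required.
    """
    low, high = fold_families
    return low * seq_per_fold, high * seq_per_fold
```

With the default values this gives a range of 5000 to 25,000 structures, which is consistent with the "at least 10,000" figure stated in the text.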

To enable the large-scale comparative modeling needed for structural genomics, the steps of comparative modeling are being assembled into a completely automated pipeline. Because many computer programs for performing each of the operations in comparative modeling already exist, it may seem trivial to construct a pipeline that completely automates the whole process. In fact, it is not easy to do so in a robust manner. For a good reason, most of the tasks in modeling of individual proteins, including template selection, alignment, and model evaluation, are typically performed with significant human intervention. This allows the use of the best tool for a particular problem at hand and consideration of many different sources of information that are difficult to take into account entirely automatically. Because large-scale modeling can be performed only in a completely automated manner, the main challenge is to build an automated and robust pipeline that approaches the performance of a human expert as much as possible.

Figure 12 ModBase, a database of comparative protein structure models. Screenshots of the following ModBase panels are shown: A form for searching for the models of a given protein, summary of the search results, summary of the models of a given protein, details about a single model, alignment on which a given model was based, 3D model displayed by RASMOL [237], and a model evaluation by the ProsaII profile [217].

Two applications of comparative modeling to complete genomes have been described. For the sequences encoded in the E. coli genome, models were built for 10–15% of the proteins using the SWISS-MODEL web server [232,233]. Peitsch et al. have recently also modeled many proteins in SWISS-PROT and made the models available on their SWISS-MODEL web site (see Table 1). Another large-scale modeling study was our own modeling of five prokaryotic and eukaryotic genomes [9]. The calculation resulted in models for substantial segments of 17.2%, 18.1%, 19.2%, 20.4%, and 15.7% of all proteins in the genomes of Saccharomyces cerevisiae (6218 proteins in the genome), Escherichia coli (4290 proteins), Mycoplasma genitalium (468 proteins), Caenorhabditis elegans (7299 proteins, incomplete), and Methanococcus jannaschii (1735 proteins), respectively. An important feature of this study was an evaluation of all the models. This evaluation is important because most of the related protein pairs share less than 30% sequence identity, resulting in significant errors in the models. The models were assigned to the reliable or unreliable class by a procedure [9] that relies on the statistical potential function from PROSAII [217]. This allowed identification of those models that were likely to be based on correct templates and at least approximately correct alignments. As a result, 236 yeast proteins without any prior structural information were assigned to a particular fold family; 40 of these proteins did not have any prior functional annotation. The models were also evaluated more precisely by using a calibrated relationship between the model accuracy and the percentage sequence identity on which the model is based [9]. Almost half of the 1071 reliably modeled proteins in the yeast genome share more than approximately 35% sequence identity with their templates. All the alignments, models, and model evaluations are available in the ModBase database of comparative protein structure models (Fig. 12) [234]. Most recently, the combined use of PSI-BLAST [36] with the model building and a new model evaluation [9] allowed us to calculate reliable models for 50% of the proteins in the TrEMBL database (R. Sánchez, F. Melo, A. Šali, in preparation) [234].
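The genome coverage percentages above can be converted into approximate protein counts. The sketch below uses only the genome sizes and percentages quoted in the text; because the percentages are rounded to one decimal place, the resulting counts are estimates and can differ by a few proteins from the exact numbers. The names `GENOMES` and `estimated_modeled` are hypothetical.

```python
# (proteins in genome, percent with modeled segments), as quoted in the text
GENOMES = {
    "Saccharomyces cerevisiae": (6218, 17.2),
    "Escherichia coli": (4290, 18.1),
    "Mycoplasma genitalium": (468, 19.2),
    "Caenorhabditis elegans": (7299, 20.4),
    "Methanococcus jannaschii": (1735, 15.7),
}

def estimated_modeled(total_proteins, percent_modeled):
    """Approximate number of modeled proteins from a rounded percentage."""
    return round(total_proteins * percent_modeled / 100)

def summarize(genomes=GENOMES):
    """Return a dict organism -> estimated number of modeled proteins."""
    return {name: estimated_modeled(total, pct)
            for name, (total, pct) in genomes.items()}
```

For yeast this yields about 1069 proteins, close to the 1071 reliably modeled proteins reported in the text; the small difference comes from rounding of the percentage.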

Large-scale comparative modeling opens new opportunities for tackling existing problems by virtue of providing many protein models from many genomes. One example is the selection of a target protein for which a drug needs to be developed. A good choice is a protein that is likely to have high ligand specificity; specificity is important because specific drugs are less likely to be toxic. Large-scale modeling facilitates imposing the specificity filter in target selection by enabling a structural comparison of the ligand binding sites of many proteins, either human or from other organisms. Such comparisons may make it possible to select rationally a target whose binding site is structurally most different from the binding sites of all the other proteins that may potentially react with the same drug. For example, when a human pathogenic organism needs to be inhibited, a good target may be a protein whose binding site shape is different from related binding sites in all of the human proteins. Alternatively, when a human metabolic pathway needs to be regulated, the target identification could focus on that particular protein in the pathway that has the binding site most dissimilar from its human homologs.
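The specificity filter described above can be read as a max-min selection: for each candidate, find its closest human binding site, then prefer the candidate whose closest match is farthest away. The sketch below is one possible reading, not a method from the text. It assumes each binding site has already been reduced to a numeric descriptor vector (how to do that is not specified here), and it uses Euclidean distance as a placeholder dissimilarity measure; `select_target` and its arguments are hypothetical.

```python
import math

def select_target(candidates, human_sites):
    """Pick the candidate least similar to every human binding site.

    candidates:  dict name -> descriptor vector of a pathogen binding site
    human_sites: dict name -> descriptor vector of a human binding site
    Returns (best_name, score), where score is the distance from the
    chosen candidate to its nearest human site (larger is better).
    """
    best_name, best_score = None, -1.0
    for name, descriptor in candidates.items():
        # Worst case for specificity: the most similar human site.
        nearest = min(math.dist(descriptor, h) for h in human_sites.values())
        if nearest > best_score:
            best_name, best_score = name, nearest
    return best_name, best_score
```

Taking the minimum over human sites reflects the concern in the text: a single similar human binding site is enough to risk cross-reactivity, so the average distance would be too forgiving.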
