Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Genomics and Proteomics Engineering in Medicine and Biology - Metin Akay

.pdf
Скачиваний:
50
Добавлен:
10.08.2013
Размер:
9.65 Mб
Скачать

254 COMPUTATIONAL ANALYSIS OF PROTEINS

26.A. Bairoch and P. Bucher, “Prosite: Recent developments,” Nucleic Acids Res., 22: 3583–3589, 1994.

27.K. Hofmann, P. Bucher, L. Falquet, and A. Bairoch, “The PROSITE database, its status in 1999,” Nucleic Acids Res., 27(1): 215–219, 1999.

28.H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, “The Protein Data Bank,” Nucleic Acids Res., 28(1): 235–242, 2000.

29.F. M. Pearl, N. Martin, J. E. Bray, D. W. Buchan, A. P. Harrison, D. Lee, G. A. Reeves,

A.J. Shepherd, I. Sillitoe, A. E. Todd, J. M. Thornton, and C. A. Orengo, “A rapid classification protocol for the CATH Domain Database to support structural genomics,” Nucleic Acids Res., 29(1): 223–227, 2001.

30.A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia, “SCOP: A structural classification of proteins database for the investigation of sequences and structures,” J. Mol. Biol., 247: 536–540, 1995.

31.A. S. Siddiqui, U. Dengler, and G. J. Barton, “3Dee: A database of protein structural domains,” Bioinformatics, 17: 200–201, 2001.

32.T. K. Attwood and M. E. Beck, “Prints—a protein motif fingerprint database,” Protein Eng., 7: 841–848, 1994.

33.R. Hughey and A. Krogh, “Hidden Markov models for sequence analysis: Extension and analysis of the basic method,” Comput. Appl. Biosci., 12(2): 95–107, 1996.

34.C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald, and J. Wootton, “Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment,” Science, 262(5131): 208–214, 1993.

35.T. L. Bailey and C. Elkan, “Fitting a mixture model by expectation maximization to discover motifs in biopolymers,” in Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 1994, AAAI Press, Menlo Park, CA, pp. 28–36.

36.G. Bejerano and G. Yona, “Variations on probabilistic suffix trees: statistical modeling and prediction of protein families,” Bioinformatics, 17: 23–43, 2001.

37.K. Blekas, D. I. Fotiadis, and A. Likas, “Greedy mixture learning for multiple motif discovery in biological sequences,” Bioinformatics, 19(5): 607–617, 2003.

38.W. N. Grundy, T. L. Bailey, C. P. Elkan, and M. E. Baker, “Meta-MEME: Motif-based Hidden Markov Models of Protein Families,” Comput. Appl. Biosci., 13: 397–406, 1997.

39.S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” J. Mol. Biol., 48: 443– 453, 1970.

40.T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,”

J.Mol. Biol., 147(1): 195–197, 1981.

41.S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” J. Mol. Biol., 215(3): 403–410, 1990.

42.W. R. Pearson and D. J. Lipman, “Improved tools for biological sequence comparison,” Proc. Natl. Acad. Sci., 85(8): 2444–2448, 1988.

43.J. D. Thompson, D. G. Higgins, and T. J. Gibson, “CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Res., 22: 4673–4680, 1994.

REFERENCES 255

44.W. R. Pearson, “Effective protein sequence comparison,” Methods Enzymol., 266: 227–258, 1996.

45.S. Henikoff and J. G. Henikoff, “Amino acid substitution matrices from protein blocks,”

Proc. Natl. Acad. Sci. USA, 89: 10915–10919, 1992.

46.M. O. Dayhoff, R. M. Schwartz, and B. C. Orcutt, “A model of evolutionary change in proteins. Matrices for detecting distant relationships,” in M. O. Dayhoff (Ed.), Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington, DC, 1978, pp. S345–S358.

47.R. Arratia and M. S. Waterman, “A phase transition for the score in matching random sequences allowing deletions,” Ann. Appl. Probab., 4: 200–225, 1994.

48.W. R. Pearson and T. C. Wood, “Statistical significance in biological sequence comparison,” in D. J. Balding, M. Bishop, and C. Cannings (Eds.), Handbook of Statistical Genetics, Wiley, Chichester, 2001, pp. 39–65.

49.H.-D. Ho¨ltje, W. Sippl, D. Rognan, and G. Folkers, Molecular Modeling: Basic Principles and Applications, Wiley-VCH, Weinheim, 2003.

50.A. Hinchliffe, Molecular Modelling for Beginners, Wiley, New York, 2003.

51.S. Payne, “Classification of protein sequences into homogenous families,” MSc Thesis, University of Frankfurt, Germany, 2001.

52.J. U. Bowie, R. Luthy, and D. Eisenberg, “A method to identify protein sequence that fold into a known three-dimensional structure,” Science, 253: 164–170, 1991.

53.D. T. Jones, W. R. Taylor, and J. M. Thornton, “A new approach to protein fold recognition,” Nature, 358: 86–89, 1992.

54.H. Flockner, F. Domingues, and M. J. Sippl, “Proteins folds from pair interactions: A blind test in fold recognition,” Proteins: Struct. Funct. Genet., 1: 129–133, 1997.

55.C. Lampros, C. Papaloukas, Y. Goletsis, and D. I. Fotiadis, “Sequence based protein structure prediction using a reduced state-space Hidden Markov Model,” submitted for publication.

56.B. Rost and C. Sander, “Prediction of protein secondary structure at better than 70% accuracy,” J. Mol. Biol., 232: 584–599, 1993.

57.B. Rost, “Review: Protein secondary structure prediction continues to rise,” J. Struct. Biol., 134: 204–218, 2001.

58.G. B. Fogel and D. W. Corne, Evolutionary Computation in Bioinformatics, Morgan Kaufmann, San Francisco, 2002.

59.R. Konig and T. Dandekar, “Improving genetic algorithms for protein folding simulations by systematic crossover,” Biosystems, 50(1): 17–25, 1999.

60.J. Bino and A. Sali, “Comparative protein structure modelling by iterative alignment, model building and model assessment,” Nucleic Acids Res., 31(14): 3982–3992, 2003.

61.C. Notredame, “Using genetic algorithms for pairwise and multiple sequence alignments,” in G. B. Fogel and D. W. Corne (Eds.), Evolutionary Computation in Bioinformatics, Morgan Kaufmann, San Francisco, 2003.

62.A. J. Shepherd, D. Gorse, and J. M. Thornton, “A novel approach to the recognition of protein architecture from sequence using Fourier analysis and neural networks,”

Proteins: Struct., Func., Genet., 50: 290–302, 2003.

63.C. Ding and I. Dubchak, “Multi-class protein fold recognition using support vector machines and neural networks,” Bioinformatics, 17: 349–358, 2001.

256 COMPUTATIONAL ANALYSIS OF PROTEINS

64.J. T. Chang, H. Schutze, and R. B. Altman, “GAPSCORE: Finding gene and protein names one word at a time,” Bioinformatics, 22(2): 216–225, 2003.

65.M. Krauthammer, A. Rzhetsky, P. Morozov, and C. Friedman, “Using BLAST for identifying gene and protein names in journal articles,” Gene, 259: 245–252, 2000.

66.K. Fukuda, A. Tamura, T. Tsunoda, and T. Takagi, “Toward information extraction: Identifying protein names from biological papers,” Pacific Symp. Biocomput., 3: 707–718, 1998.

67.T. Rindflesch, L. Tanabe, J. Weinstein, and L. Hunter, “EDGAR: Extraction of drugs, genes and relations from the biomedical literature,” Pacific Symp. Biocomput., 5: 517–528, 2000.

68.W. Hole and S. Srinivasan, “Discovering missed synonyms in a large concept-oriented metathesaurus,” Proc. AMIA Symp., 354–358, 2000.

69.M. Yoshida, K. Fukuda, and T. Takagi, “PNAD-CSS: A workbench for constructing a protein name abbreviation dictionary,” Bioinformatics, 16: 169–175, 2000.

70.H. Yu, C. Friedman, and G. Hripcsak, “Mapping abbreviations to full forms in biomedical articles,” J. Am. Med. Inform. Assoc., 9: 262–272, 2002.

71.H. Yu and E. Agichtein, “Extracting synonymous gene and protein terms from biological literature,” Bioinformatics, 19(1): S340–S349, 2003.

72.J. T. Chang, S. Raychaudhuri, and R. B. Altman, “Including biological literature improves homology search,” Pacific Symp. Biocomput., 6: 374–383, 2001.

73.T. Sekimizu, H. Park, and J. Tsujii, “Identifying the interaction between genes and gene products based on frequently seen verbs in medline abstracts,” Genome Inform. Ser.: Proc. Workshop Genome Inform., 9: 62–71, 1998.

74.T. Ono, H. Hishigaki, A. Tanigami, and T. Tagaki, “Automated extraction of information on protein-protein interactions from the biological literature,” Bioinformatics, 17(2): 155–161, 2001.

75.J. Thomas, D. Milward, C. Ouzounis, S. Pulman, and M. Carroll, “Automatic extraction of protein interactions from scientific abstracts,” Proc. Pacific Symp. Biocomput., 5: 538–549, 2000.

76.C. Friedman, P. Kra, H. Yu, M. Krauthammer, and A. Rzhetsky, “GENIES: A naturallanguage processing system for the extraction of molecular pathways from journal articles,” Bioinformatics, 17: S74–S82, 2001.

77.J.-H. Chiang and H.-C. Yu, “MeKE: Discovering the functions of gene products from biomedical literature via sentence alignment,” Bioinformatics, 19(11): 1417–1422, 2003.

78.The Gene Ontology Consortium (http://www.genontology.org), “Gene Ontology: Tool for the unification of biology,” Nature Genet., 25: 25–29, 2000.

79.A. Koike, Y. Niwa, and T. Tagaki, “Automatic extraction of gene/protein biological functions from biomedical text,” Bioinformatics, 21(7): 1227–1236, 2005.

&CHAPTER 10

Computational Analysis of

Interactions Between Tumor

and Tumor Suppressor Proteins

E. PIROGOVA, M. AKAY, and I. COSIC

10.1. INTRODUCTION

Cancer cell development is attributed to different mutations and alterations in deoxyribonucleic acid (DNA). DNA is a large molecule structured from chains of repeating units of the sugar deoxyribose and phosphate linked to four different bases, abbreviated A, T, G, and C. DNA carries the genetic information of a cell and consists of thousands of genes. DNA controls all cell biological activities. Each gene serves as a recipe on how to build a protein molecule. Proteins perform important tasks for the cell functions or serve as building blocks. The flow of information from the genes determines the protein composition and thereby the functions of the cell. The DNA is situated in the nucleus, organized into chromosomes (Fig. 10.1). Every cell must contain the genetic information and the DNA is therefore duplicated before a cell divides (replication). When proteins are needed, the corresponding genes are transcribed into RNA (transcription). The RNA is first processed so that noncoding parts are removed ( processing) and is then transported out of the nucleus (transport). Outside the nucleus, the proteins are built based upon the code in the RNA (translation).

A significant role in current cancer research is attributed to bioengineering, which is focused on understanding and interpretation of this disease, in terms of gene identification, protein, and DNA modelling, aiming to better understand and evaluate the biochemical processes in cells/tissues, the grounds of disease, and development of progressive diagnostics and drugs using engineering instrumentation and computer science methodologies.

Normally, cells grow and divide to form new cells, as the body needs them. When cells grow old, they die, and new cells take their place. The cell cycle is an ordered

Genomics and Proteomics Engineering in Medicine and Biology. Edited by Metin Akay Copyright # 2007 the Institute of Electrical and Electronics Engineers, Inc.

257

258 TUMOUR AND TUMOUR SUPPRESSOR PROTEINS

FIGURE 10.1. Cell nucleus.

set of events, culminating in cell growth and division into two daughter cells. Nondividing cells are not considered to be in the cell cycle. The stages, pictured in Figure 10.2, are G1–S–G2–M. The G1 stage stands for gap 1. The S stage stands for synthesis. This is the stage when DNA replication occurs. The G2 stage stands for gap 2. The M stage stands for mitosis and refers to when nuclear (chromosomes separate) and cytoplasmic (cytokinesis) division occur.

How cell division and thus tissue growth are controlled is a very complex issue in molecular biology. Sometimes this orderly process goes wrong. New cells form

10.1. INTRODUCTION

259

FIGURE 10.2. Stages of the cell cycle.

when the body does not need them, and old cells do not die when they should. These extra cells can form a mass of tissue called a growth or tumor. In cancer abnormal cells divide without control, the regulation of the cell cycle goes awry, and normal cell growth and behavior are lost. As a consequence cancer cells can invade nearby tissues and then spread through the bloodstream and lymphatic system to other parts of the body. Apparently, the cell will become cancerous when the right combination of genes is altered.

However, there are some genes that help to prevent cell malignant behavior and therefore are referred to as tumor suppressor genes. Tumor suppressor genes have been detected in the human genome and are very difficult to isolate and analyse. The Rb tumor suppressor gene is located on chromosome 13 of humans. This gene suppresses the development of cancer as its dominant phenotype. Therefore both alleles must be mutant for the disease to develop. The Rb gene product interacts with a protein called E2F, the nuclear transcription factor involved in cellular replication functions during the S phase of the cell cycle. When the Rb gene product is mutated, a cell division at the S phase does not occur and normal cells become cancerous. Located on human chromosome 17, p53 is another gene with tumor suppressor activity. This protein contains 393 amino acids and a single amino acid substitution can lead to loss of function of the gene. Mutations at amino acids 175, 248, and 273 can lead to loss of function, and changes at 273 (13%) are the most common [1, 2]. These all act as recessive mutations. Dominant gain-of- function mutations have also been found that lead to uncontrolled cell division. Because these mutations can be expressed in heterozygous conditions, they are often associated with cancers. The genetic function of the p53 gene is to prevent a division of cells with damaged DNA. Damaged DNA could contain genetic changes that promote uncontrolled cell growth. If the damage is severe, this protein can cause apoptosis, forcing “bad” cells to commit suicide. The p53 levels are increased in damaged cells. In some way, p53 seems to evaluate the extent of

260 TUMOUR AND TUMOUR SUPPRESSOR PROTEINS

damage to DNA, at least for damage by radiation. At low levels of radiation, producing damage that can be repaired, p53 triggers arrest of the cell cycle until the damage is repaired. At high levels of radiation, producing hopelessly damaged DNA, p53 triggers apoptosis. If the damage is minor, p53 halts the cell cycle— and hence cell division—until the damage is repaired. About 50% of human cancers can be associated with a p53 mutation, including cancers of the bladder, breast, cervix, colon, lung, liver, prostate, and skin. These types of cancers are also more aggressive and have a higher degree of fatalities [1–4].

Other genes, known as proto-oncogenes, can promote cancer if they acquire new properties as a result of mutations, at which point they are called oncogenes. Most common cancers involve both inactivation of specific tumor suppressor genes and activation of certain proto-oncogenes. Proto-oncogene proteins are the products of proto-oncogenes. Normally they do not have oncogenic or transforming properties but are involved in the regulation or differentiation of cell growth. They often have protein kinase (protein phosphotransferase) activity.

There is another interesting group of proteins that play a significant “defending role” in the cell life cycle—heat shock proteins (HSPs). They are a group of proteins that are present in all living cells. The HSPs are induced when a cell is influenced by environmental stresses like heat, cold, and oxygen deprivation. Under perfectly normal conditions HSPs act like “chaperones,” helping new or distorted proteins fold into shapes essential for their function, shuttling proteins, and transporting old proteins to “garbage disposals” inside the cell. Also HSPs help the immune system recognize diseased cells [5–7]. Twenty years ago HSPs were identified as the key elements responsible for protecting animals from cancer, and studies toward antitumor vaccine development still continue today. Today HSP-based immunotherapy is believed to be one of the most promising areas of developing cancer treatment technology that is characterized by a unique approach to every tumor [5–7].

Recent findings in cancer research have established a connection between a T-antigen—common virus—and a brain tumor in children. The studies suggested the T-antigen, the viral component of a specific virus, called the JC virus, plays a significant role in the development of the most frequent type of malignant brain tumours by blocking the functionality of tumor suppressor proteins such as p53 and pRb. The JC virus (JCV) is a neurotropic polyoma virus infecting greater than 70% of the human population worldwide during early childhood [1–4]. The JC virus possesses an oncogenic potential and induces development of various neuroectodermal origin tumors, including medulloblastomas and glioblastomas. Medulloblastomas are the second most common type of brain tumor in children, making up about 20% of cases. They grow fast and can spread widely throughout the body, and nearly half of all children affected with them die. Radiation exposure and certain genetic diseases are known to put children at risk, but in most cases the cause of medulloblastomas is still a mystery [2].

The most important role in this process is attributed to T-antigen, which has the ability to associate with and functionally inactivate well-studied tumor suppressor proteins p53 and pRb. The immunohistochemical analyses revealed expression of

10.2. METHODOLOGY: RESONANT RECOGNITION MODEL

261

JCV T-antigen in the nuclei of tumor cells [1–4]. The findings of “in vitro” cancer research indicated that the sirnian virus gene SV40 induces neoplastic transformation by disabling several key cellular growth regulatory circuits. Among these are the Rb and p53 families of tumor suppressors. The multifunctional large T-antigen blocks the function of both Rb and p53. Large T-antigen uses multiple mechanisms to block p53 activity, and this action contributes to tumor genesis, in part, by blocking p53-mediated growth suppression and apoptosis. Since the p53 pathway is inactivated in most human tumors, T-antigen/p53 interactions offer a possible mechanism by which SV40 gene contributes to human cancer. Thus, analysis of the gene encoding p53 and pRb proteins could serve to evaluate the effectiveness of a cancer treatment. Mutations in this gene occur in half of all human cancers, and regulation of the protein is defective in a variety of others. Novel strategies that exploit knowledge of the function and regulation of p53 are being actively investigated [1–4]. Strategies directed at treating tumors that have p53 mutations include gene therapy, viruses that only replicate in p53-deficient cells, and the search for small molecules that reactivate mutant p53. Potentiating the function of p53 in a nongenotoxic way in tumors that express wild-type protein can be achieved by inhibiting the expression and function of viral oncoproteins [1–4].

Therefore an analysis of mutual relationships between viral proteins and two groups of “natural defenders”—tumor suppressors, p53 and pRb, and HSPs—is of great importance in the development of new methodology or drug design for cancer treatment.

10.2. METHODOLOGY: RESONANT RECOGNITION MODEL

Currently a huge amount of scientific effort is directed at solving the problem of finding a cure for cancer. New and advanced drugs and methodologies have been developed and applied with some grade of success; however, the battle with cancer is still continuing. There is an urgent need for theoretical approaches that are capable of analyzing protein and DNA structure–function relationships leading to the design of new drugs efficient to fight many diseases, including cancer.

The resonant recognition model (RRM) [8, 9] essentially presents a nontraditional computational approach designed for protein and DNA structure–function analysis and based on a nontraditional “engineering view” of biomolecules. This methodology significantly differs from other protein analysis approaches, commonly used classical letter-to-letter or block-to-block methods (homology search and sequence alignment), in terms of the view of the physical nature of molecule interactions within the living cells. The RRM concepts present a new perspective on life processes at the molecular level, bringing a number of practical advantages to the fields of molecular biology, biotechnology, medicine, and agriculture.

The physical nature of the biological function of a protein or DNA is based on the ability of the macromolecule to interact selectively with the particular targets (other proteins, DNA regulatory segments, or small molecules). The RRM is a physicomathematical model that interprets protein sequence linear information using

262 TUMOUR AND TUMOUR SUPPRESSOR PROTEINS

digital signal-processing methods (Fourier and wavelet transform) [8–13]. Initially the original protein primary structure, that is, the amino acid sequence, is transformed into the numerical sequence by assigning to each amino acid in the protein molecule a physical parameter value relevant to the protein’s biological activity. Accordingly to RRM main postulates, there is a significant correlation between spectra of the numerical presentation of the protein sequences and their biological activity [8, 9]. A number of amino acid indices (437 have been published up to now) have been found to correlate in some way with the biological activity of the whole protein. Previous investigations [14–19] have shown that the best correlation can be achieved with parameters related to the energy of delocalized free electrons of each amino acid. These findings can be explained by the fact that these electrons have the strongest impact on the electronic distribution of the whole protein. By assigning the electron–ion interaction potential (EIIP) [20] value to each amino acid, the protein sequence can be converted into a numerical sequence. These numerical series can then be analyzed by appropriate digital signal-processing methods (fast Fourier transform is generally used). The EIIP values for 20 amino acids as well as for 5 nucleotides (the whole procedure can be applied to DNA and RNA too) are shown in Table 10.1.

To determine the common frequency components in the spectra for a group of proteins, multiple cross-spectral functions are used. Peaks in such functions denote common frequency components for the sequences analyzed. Through an extensive study, the RRM has reached a fundamental conclusion: One characteristic frequency characterizes one particular biological function or interaction [8, 9]. This frequency is related to the biological function provided the following criteria are met:

.One peak only exists for a group of protein sequences sharing the same biological function.

. No significant peak exists for biologically unrelated protein sequences.

. Peak frequencies are different for different biological functions.

It has been found through extensive research that proteins with the same biological function have a common frequency in their numerical spectra. This frequency was found to be a characteristic feature for protein biological function or interaction. The results of our previous work are summarized in Table 10.2, where each functional group of proteins or DNA regulatory sequences is shown with its characteristic frequency and corresponding signal-to-noise ratio (S/N) within the multiple cross-spectral function [9].

It is assumed that the RRM characteristic frequency represents a crucial parameter of the recognition between the interacting biomolecules. In our previous work [8, 9, 14–19] we have shown in a number of examples of different protein families that proteins and their interacting targets (receptors, binding proteins, inhibitors) display the same characteristic frequency in their interactions. However, it is obvious that one protein can participate in more than one biological process, that is, revealing more than one biological function. Although a protein and its target

10.2. METHODOLOGY: RESONANT RECOGNITION MODEL

263

TABLE 10.1 Electron–Ion Interaction Potential (EIIP) Values for Nucleotides and Amino Acids

Nucleotide

EIIP (Ry)

A

0.1260

G

0.0806

T

0.1335

C

0.1340

U

0.0289

Amino Acid

EIIP (Ry)

Leu

0.0000

Ile

0.0000

Asn

0.0036

Gly

0.0050

Val

0.0057

Glu

0.0058

Pro

0.0198

His

0.0242

Lys

0.0371

Ala

0.0373

Tyr

0.0516

Trp

0.0548

Gln

0.0761

Met

0.0823

Ser

0.0829

Cys

0.0829

Thr

0.0941

Phe

0.0946

Arg

0.0959

Asp

0.1263

 

 

have different biological functions, they can participate in the same biological process, which is characterized by the same frequency. Therefore, we postulate that the RRM frequency characterizes a particular biological process of interaction between selected biomolecules. Moreover, further research in this direction has led to the conclusion that interacting molecules have the same characteristic frequency but opposite phases at that frequency. Once the characteristic frequency for the particular biological function/interaction is determined, it becomes possible to identify the individual “hot-spot” amino acids that contributed most to this specific characteristic frequency and thus to the observed protein’s biological behavior. Furthermore, it is possible then to design bioactive peptides having only the determined characteristic frequency and consequently the desired biological function [18, 19, 21, 22, 23].

Here we present an application of the RRM approach to analysis of the mutual relationships between brain-tumor-associated viral proteins T-antigen and agnoprotein, tumor suppressor proteins p53 and pRb, and HSPs. Also we present the results of the study of possible interactions between melatonin, interleukin-2