Computational Methods for Protein Structure Prediction & Modeling V1 - Xu Xu and Liang

Preface


structure–activity relationship. A number of software packages for structure-based design are compared.

Chapter 17 (Protein Structure Prediction as a Systems Problem) provides a novel systematic view on solving the complex problem of protein structure prediction. It introduces the consensus-based approach, the pipeline approach, and expert systems for predicting protein structure and for inferring protein functions. This chapter also discusses issues such as benchmark data and evaluation metrics. An example of protein structure prediction at genome-wide scale is also given.

Chapter 18 (Resources and Infrastructure for Structural Bioinformatics) describes tools, databases, and other resources of protein structure analysis and prediction available on the Internet. These include the PDB and related databases and servers, structural visualization tools, protein sequence and function databases, as well as resources for RNA structure modeling and prediction. It also gives information on major journals, professional societies, and conferences of the field.

Appendix 1 (Biological and Chemical Basics Related to Protein Structures) introduces the central dogma of molecular biology, macromolecules in the cell (DNA, RNA, protein), amino acid residues, the peptide chain, the primary, secondary, tertiary, and quaternary structure of proteins, and protein evolution.

Appendix 2 (Computer Science for Structural Informatics) discusses computer science concepts that are essential for effective computation for protein structure prediction. These include efficient data structures, computational complexity and NP-hardness, various algorithmic techniques, parallel computing, and programming.

Appendix 3 (Physical and Chemical Basis for Structural Bioinformatics) covers basic concepts of our physical world, including unit systems, coordinate systems, and energy surfaces. It also describes biochemical and biophysical concepts such as chemical reactions, peptide bonds, covalent bonds, hydrogen bonds, electrostatic interactions, van der Waals interactions, and hydrophobic interactions. In addition, this appendix discusses basic concepts from thermodynamics and statistical mechanics. Computational sampling techniques such as molecular dynamics and the Monte Carlo method are also discussed.

Appendix 4 (Mathematics and Statistics for Studying Protein Structures) covers various basic concepts in mathematics and statistics, often used in structural bioinformatics studies such as probability distributions (uniform, Gaussian, binomial and multinomial, Dirichlet and gamma, extreme value distribution), basics of information theory including entropy, relative entropy, and mutual information, Markovian process and hidden Markov model, hypothesis testing, statistical inference (maximum likelihood, expectation maximization, and Bayesian approach), and statistical sampling (rejection sampling, Gibbs sampling, and Metropolis–Hastings algorithm).

Ying Xu

Dong Xu

Jie Liang

John Wooley

April 2006

Acknowledgments

During the editing of this book, we, the editors, have received tremendous help from many friends, colleagues, and family members, to whom we would like to take this opportunity to express our deep gratitude and appreciation. First we would like to thank Dr. Eli Greenbaum of Oak Ridge National Laboratory, who encouraged us to start this book project and contacted the publisher at Springer on our behalf. We are very grateful to the following colleagues who have critically reviewed the drafts of the chapters of the book at various stages: Nick Alexandrov, Nir Ben-Tal, Natasja Brooijmans, Chris Bystroff, Pablo Chacon, Luonan Chen, Zhong Chen, Yong Duan, Roland Dunbrack, Daniel Fischer, Juntao Guo, Jaap Heringa, Xiche Hu, Ana Kitazono, Ioan Kosztin, Sandeep Kumar, Xiang Li, Guohui Lin, Zhijie Liu, Hui Lu, Alex MacKerell, Kunbin Qu, Robert C. Rizzo, Ilya Shindyalov, Ambuj Singh, Alex Tropsha, Iosif Vaisman, Ilya Vakser, Stella Veretnik, Björn Wallner, Jin Wang, Zhexin Xiang, Yang Dai, Xin Yuan, and Yaoqi Zhou. Their invaluable input on the scientific content, on the pedagogical style, and on the writing style helped to improve these book chapters significantly. We also want to thank Ms. Joan Yantko of the University of Georgia for her tireless help on numerous fronts in this book project, including taking care of a large number of email communications between the editors and the authors and chasing busy authors to get their revisions and other materials. Last but not least, we want to thank our families for their constant support and encouragement while we worked on this book project.


Contents

Contributors

1 A Historical Perspective and Overview of Protein Structure Prediction
John C. Wooley and Yuzhen Ye

2 Empirical Force Fields
Alexander D. MacKerell, Jr.

3 Knowledge-Based Energy Functions for Computational Studies of Proteins
Xiang Li and Jie Liang

4 Computational Methods for Domain Partitioning of Protein Structures
Stella Veretnik and Ilya Shindyalov

5 Protein Structure Comparison and Classification
Orhan Çamoğlu and Ambuj K. Singh

6 Computation of Protein Geometry and Its Applications: Packing and Function Prediction
Jie Liang

7 Local Structure Prediction of Proteins
Victor A. Simossis and Jaap Heringa

8 Protein Contact Map Prediction
Xin Yuan and Christopher Bystroff

9 Modeling Protein Aggregate Assembly and Structure
Jun-tao Guo, Carol K. Hall, Ying Xu, and Ronald B. Wetzel

10 Homology-Based Modeling of Protein Structure
Zhexin Xiang

11 Modeling Protein Structures Based on Density Maps at Intermediate Resolutions
Jianpeng Ma

Index

Contributors

Natasja Brooijmans
Chemical and Screening Sciences
Wyeth Research
Pearl River, New York 10965

Christopher Bystroff
Department of Biology
Rensselaer Polytechnic Institute
Troy, New York 12180

Liming Cai
Department of Computer Science
University of Georgia
Athens, Georgia 30602-7404

Orhan Camoglu
Department of Computer Science
University of California Santa Barbara
Santa Barbara, California 93106

Yang Dai
Department of Bioengineering
University of Illinois at Chicago
Chicago, Illinois 60607-7052

Haobo Guo
Department of Biochemistry and Cellular and Molecular Biology
University of Tennessee
Knoxville, Tennessee 37996

Hong Guo
Department of Biochemistry and Cellular and Molecular Biology
University of Tennessee
Knoxville, Tennessee 37996

Jun-tao Guo
Department of Biochemistry and Molecular Biology
University of Georgia
Athens, Georgia 30602-7229

Carol K. Hall
Department of Chemical and Biomolecular Engineering
North Carolina State University
Raleigh, North Carolina 27695

Jaap Heringa
Centre for Integrative Bioinformatics
Vrije Universiteit
1081 HV Amsterdam, The Netherlands

Xiche Hu
Department of Chemistry
University of Toledo
Toledo, Ohio 43606

Ling-Hong Hung
Department of Microbiology
University of Washington
Seattle, Washington 98195-7242

Xiang Li
Department of Bioengineering
University of Illinois at Chicago
Chicago, Illinois 60607-7052

Jie Liang
Department of Bioengineering
University of Illinois at Chicago
Chicago, Illinois 60607-7052

Guohui Lin
Department of Computing Science
University of Alberta
Edmonton, Alberta T6G 2E8, Canada

Zhijie Liu
Department of Biochemistry and Molecular Biology
University of Georgia
Athens, Georgia 30602-7229

Hui Lu
Department of Bioengineering
University of Illinois at Chicago
Chicago, Illinois 60607-7052

Jianpeng Ma
Department of Biochemistry and Molecular Biology
Baylor College of Medicine
Houston, Texas 77030
and
Department of Bioengineering
Rice University
Houston, Texas 77005

Alexander D. MacKerell, Jr.
Department of Pharmaceutical Chemistry
School of Pharmacy
University of Maryland
Baltimore, Maryland 21201

Shing-Chung Ngan
Department of Microbiology
University of Washington
Seattle, Washington 98195-7242

Ognjen Perišić
Department of Bioengineering
University of Illinois at Chicago
Chicago, Illinois 60607-7052

Brian Pierce
Department of Biomedical Engineering
Boston University
Boston, Massachusetts 02215

Kunbin Qu
Department of Chemistry
Rigel Pharmaceuticals, Inc.
San Francisco, California 94080

Ram Samudrala
Department of Microbiology
University of Washington
Seattle, Washington 98195-7242

Ilya Shindyalov
San Diego Supercomputer Center
University of California San Diego
San Diego, California 92093-0505

Victor A. Simossis
Centre for Integrative Bioinformatics
Vrije Universiteit
1081 HV Amsterdam, The Netherlands

Ambuj K. Singh
Department of Computer Science
University of California Santa Barbara
Santa Barbara, California 93106

Stella Veretnik
San Diego Supercomputer Center
University of California San Diego
San Diego, California 92093-0505

Zhiping Weng
Department of Biomedical Engineering
Boston University
Boston, Massachusetts 02215

Ronald B. Wetzel
Department of Structural Biology
Pittsburgh Institute for Neurodegenerative Diseases
University of Pittsburgh School of Medicine
Pittsburgh, Pennsylvania 15260

John C. Wooley
Associate Vice Chancellor for Research
University of California San Diego
San Diego, California 92093-0043

Zhexin Xiang
Center for Molecular Modeling
Center for Information Technology
National Institutes of Health
Bethesda, Maryland 20892-5624

Dong Xu
Computer Science Department
University of Missouri-Columbia
Columbia, Missouri 65211-2060

Ying Xu
Institute of Bioinformatics and Department of Biochemistry and Molecular Biology
University of Georgia
Athens, Georgia 30602-7229

Yuzhen Ye
Bioinformatics and Systems Biology Department
The Burnham Institute for Medical Research
La Jolla, California 92037

Xin Yuan
Department of Computer Science
Florida State University
Tallahassee, Florida 32306

1 A Historical Perspective and Overview of Protein Structure Prediction

John C. Wooley and Yuzhen Ye

1.1 Introduction

Proteins carry out many different biological functions, yet all are composed of one or more polypeptide chains, each containing from several to hundreds or even thousands of the 20 amino acids. During the 1950s, at the dawn of modern biochemistry, an essential question for biochemists was to understand the structure and function of these polypeptide chains. The sequences of proteins, also referred to as their primary structures, determine the different chemical properties of different proteins, and thus continue to captivate much of the attention of biochemists. As an early step in characterizing protein chemistry, British biochemist Frederick Sanger designed an experimental method to identify the sequence of insulin (Sanger et al., 1955). He became the first person to obtain the primary structure of a protein and in 1958 won his first Nobel Prize in Chemistry. This important progress in sequencing did not answer the question of whether a single (individual) protein has a distinctive shape in three dimensions (3D), and if so, what factors determine its 3D architecture. However, during the period when Sanger was studying the primary structure of proteins, American biochemist Christian Anfinsen observed that the active polypeptide chain of a model protein, bovine pancreatic ribonuclease (RNase), could fold spontaneously into a unique 3D structure, which was later called the native conformation of the protein (Anfinsen et al., 1954). Anfinsen also studied the refolding of the RNase enzyme and observed that an enzyme unfolded under an extreme chemical environment could refold spontaneously back into its native conformation when the environment was changed back to natural conditions (Anfinsen et al., 1961). By 1962, Anfinsen had developed his theory of protein folding (which was summarized in his 1972 Nobel acceptance speech): “The native conformation is determined by the totality of interatomic interactions and hence, by the amino acid sequence, in a given environment.”

Anfinsen’s theory of protein folding established the foundation for solving the protein structure prediction problem, i.e., for predicting the native conformation of a protein from its primary sequence, because all information needed to predict the native conformation is encoded in the sequence. The early approaches to solving this problem were based solely on the thermodynamics of protein folding. Scheraga and his colleagues applied several computer searching techniques to investigate the free energy of numerous local minimum energy conformations in an attempt to find the global minimum conformation, i.e., the thermodynamically most stable conformation of the protein (Gibson and Scheraga, 1967a,b; Scott et al., 1967). The major challenge for an energy minimization approach to protein structure prediction is that proteins are very flexible; thus, their potential conformation space is too large to be enumerated. [The observation that proteins nevertheless fold reliably and quickly to their native conformation, despite the huge space of possible conformations, is known as “Levinthal’s paradox” (Levinthal, 1968).] To address this issue, one needs an accurate energy function to compute the energy of a given protein conformation and a rapid computer searching algorithm. The progress of peptide molecular mechanics enabled the development of molecular force fields that describe the physical interactions between atoms using Newton’s equations of motion. In general, the interactions considered in a force field include covalent bonds and noncovalent interactions, such as electrostatic interactions, van der Waals interactions, and, sometimes, hydrogen bonds and hydrophobic interactions. The parameters used in these force fields were obtained through experimental studies of small organic molecules. On the other hand, many computational methods developed in the fields of optimization theory and mechanics have been applied to the rapid conformation search. These fall into two categories: the molecular dynamics method and the Brownian dynamics (or stochastic dynamics) method. Both methods sample a portion of the potential protein conformations and evaluate their free energy. Molecular dynamics samples the conformations by simulating the protein motion based on Newton’s equations, starting from an arbitrarily chosen protein conformation. Brownian dynamics, instead, uses the Monte Carlo random sampling technique or its derivatives to evaluate protein conformations. Combining various force fields and conformation searching methods, many software packages were developed, such as AMBER (Pearlman et al., 1995), CHARMM (Brooks et al., 1983), and GROMOS (van Gunsteren and Berendsen, 1990), all aimed at using computer simulations to predict the native conformation of proteins.
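The combination of an energy function with a random conformation search can be illustrated with a minimal Python sketch. It is not the method of any package named above: the bead-chain model, the function names chain_energy and metropolis, and all parameter values (k_bond, r0, eps, sigma, kT, step) are assumptions chosen for illustration. The energy has one covalent term (a harmonic bond) and one noncovalent term (a 12-6 Lennard-Jones potential standing in for van der Waals interactions), and the search is a plain Metropolis Monte Carlo walk.

```python
import math
import random


def chain_energy(coords, k_bond=100.0, r0=1.0, eps=1.0, sigma=1.0):
    """Toy force-field energy of a bead chain (all parameters are illustrative)."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            if j == i + 1:
                # Covalent term: harmonic spring around the ideal bond length r0.
                energy += 0.5 * k_bond * (r - r0) ** 2
            else:
                # Noncovalent term: 12-6 Lennard-Jones between non-bonded beads.
                sr6 = (sigma / r) ** 6
                energy += 4.0 * eps * (sr6 * sr6 - sr6)
    return energy


def metropolis(coords, n_steps=2000, step=0.1, kT=1.0, seed=0):
    """Metropolis Monte Carlo: perturb one bead at a time, accept by Boltzmann weight."""
    rng = random.Random(seed)
    coords = [list(p) for p in coords]
    energy = chain_energy(coords)
    best = energy
    accepted = 0
    for _ in range(n_steps):
        i = rng.randrange(len(coords))
        old = coords[i][:]
        coords[i] = [x + rng.uniform(-step, step) for x in old]
        new_energy = chain_energy(coords)
        delta = new_energy - energy
        # Always accept downhill moves; accept uphill moves with probability exp(-delta/kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            energy = new_energy
            accepted += 1
            best = min(best, energy)
        else:
            coords[i] = old  # reject: restore the previous position
    return coords, energy, best, accepted


# Start from an extended chain of six beads along the x axis.
start = [(float(i), 0.0, 0.0) for i in range(6)]
final_coords, final_energy, best_energy, n_accepted = metropolis(start)
print("initial energy:", chain_energy(start))
print("lowest energy visited:", best_energy, "accepted moves:", n_accepted)
```

Even in this six-bead toy, every trial move requires a full energy evaluation over all bead pairs, which hints at why sampling the conformation space of a real protein is so expensive.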

Despite the great theoretical interest in energy minimization methods, these have not been very successful in practice, because of the huge search space of potential protein conformations. In 1975, Levitt and Warshel used a simplified protein structure representation and successfully folded a small protein [bovine pancreatic trypsin inhibitor (BPTI), 58 amino acid residues] into its native conformation from an open-chain conformation using energy minimization (Levitt and Warshel, 1975). Little progress, however, has been made since then; the simulation usually takes an unrealistic amount of computing time, and the final prediction is not very satisfactory. For instance, in 1998, Duan and Kollman reported a simulation experiment on one small protein (the villin headpiece subdomain, 36 amino acid residues), running on a Cray T3D and then a Cray T3E supercomputer, that took months of computation with the entire machine dedicated to the problem (Duan and Kollman, 1998). Even though the resulting structure was reasonably folded and showed some resemblance to the native structure, the simulated and native structures did not completely match. Currently, energy minimization methods are largely used to refine a low-resolution initial structure obtained by experimental methods or by comparative modeling (Levitt and Lifson, 1969).
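The refinement use of energy minimization can be sketched in a few lines of Python. This is a toy under stated assumptions, not a procedure from the works cited: the bead-chain energy, the names toy_energy, numerical_gradient, and steepest_descent, and all numerical settings are hypothetical, and the "low-resolution initial structure" is simply an ideal chain with small random errors added. Steepest descent moves the coordinates downhill until it settles in a nearby local minimum, which is why it suits refinement better than a global search.

```python
import math
import random


def toy_energy(flat, k_bond=100.0, r0=1.0, eps=1.0, sigma=1.0):
    """Harmonic bonds plus Lennard-Jones for a bead chain stored as a flat [x, y, z, ...] list."""
    pts = [flat[i:i + 3] for i in range(0, len(flat), 3)]
    energy = 0.0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            r = math.dist(pts[i], pts[j])
            if j == i + 1:
                energy += 0.5 * k_bond * (r - r0) ** 2
            else:
                sr6 = (sigma / r) ** 6
                energy += 4.0 * eps * (sr6 * sr6 - sr6)
    return energy


def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp = x[:]
        xm = x[:]
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def steepest_descent(f, x0, n_iter=200, step=0.01):
    """Follow the negative gradient; keep a trial point only if it lowers the energy."""
    x = list(x0)
    energy = f(x)
    for _ in range(n_iter):
        grad = numerical_gradient(f, x)
        trial = [xi - step * gi for xi, gi in zip(x, grad)]
        trial_energy = f(trial)
        if trial_energy < energy:
            x, energy = trial, trial_energy
            step *= 1.2   # cautiously enlarge the step after a success
        else:
            step *= 0.5   # overshoot: shrink the step and try again
    return x, energy


# A hypothetical "low-resolution" start: an ideal five-bead chain with small coordinate errors.
rng = random.Random(0)
start = []
for i in range(5):
    start += [i + rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)]

refined, refined_energy = steepest_descent(toy_energy, start)
print("energy before refinement:", toy_energy(start))
print("energy after refinement:", refined_energy)
```

Because a trial point is kept only when it lowers the energy, the refined energy can never exceed the starting energy, but nothing guarantees that the minimum reached is the global one.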