Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Determination of Complex Reaction Mechanisms.pdf
Скачиваний:
45
Добавлен:
15.08.2013
Размер:
7.63 Mб
Скачать

MINI-INTRODUCTION TO BIOINFORMATICS

209

is plotted along the axis of rows. The changes in expression levels, as measured by fluorescence, are indicated by colors, for example green for decreased expression, black for no change in expression, and red for increased expression. Responses in expression levels have been measured for various biochemical and physiological conditions. We turn now to a few methods of obtaining information on genomic networks from microarray measurements.

13.1Clustering

Clustering is the grouping of genes into hierarchical functional units based on similarities—correlations—in expression patterns. We used this method in chapter 7 (see figs. 7.6 and 7.9 and the associated discussion). An example of clustering is given in fig. 13.1, taken from [2]. Clustering is a useful method of visualization of large banks of data rather a method of reverse engineering, that is, the deduction of gene networks from measurements. There is the implication in clustering that genes that are expressed in similar patterns are mechanistically related, or perform related biological functions. Both this assumption and the one of hierarchical function may in some cases, but not all, be valid. There is evidence that gene regulatory networks are not hierarchical but are rather interwoven, as is the case for metabolic and protein networks.

13.2Linearization in Various Forms

The kinetic equations governing gene expression can be expected to be highly nonlinear and therefore difficult to investigate. Linearization of the kinetic equations is tempting and certainly facilitates matters. A number of investigations [3–5] have studied such an approach and we discuss briefly one of them.

Young et al. [4] propose a reverse engineering method in which they start with general chemical kinetic equations (see eq. (11.1)) and then linearize these equations (see eq. (11.3)) for the concentrations of each mRNA in the system: the time variation of each mRNA is a linear function of its own concentration; a linear function of all other mRNA concentrations with coefficients that give the type (positive, negative, or zero) and strength of interaction between two mRNAs; a constant perturbation term (external stimulus); and a noise term dependent on time only. The coefficients form the connectivity matrix. The system of these N mRNAs is originally presumed to be in a stationary

Fig. 13.1 Clustering display of data from time course of expression by stimulation of primary human fibroblasts. Serum fibroblasts were grown in culture, deprived of serum for 48 hours; serum was then added and samples were taken at time intervals from 0 to 24 hours. Microarray data was taken from the expression of about 8,600 genes. All measurements are relative to time 0. The color scale ranges from saturated green for log ratios −3.0 and below, to saturated red for log ratios 3.0 and above. Each gene is represented by a single row, and each time point is represented by a single column. Five clusters are indicated by colored bars. The genes in A are involved in cholesterol biosynthesis, in B in the cell cycle, in C in the immediate-early response, in D in signaling and angiogenesis, and in E in wound healing and tissue remodeling. (The picture and a part of the caption are taken from [2], with permission.) (See color insert.)

210 DETERMINATION OF COMPLEX REACTION MECHANISMS

Fig. 13.2 Schematic of a one-dimensional gene network with cascade structure. (Taken from [4], with permission.)

(nonequilibrium) state, from which it is perturbed a small amount so that the responses of all mRNAs, measured by microarrays, are linear. A measurement thus consists of applying the chosen perturbations to each mRNA and measuring the responses of N concentrations of mRNA.

It takes more than one measurement to obtain solutions of the linearized kinetic equations, and suppose that M such measurements have been made so that the solutions yield the connectivity matrix. The positive matrix elements indicate activation of one mRNA by another, the negative elements yield repression of one mRNA by another, and the zero elements indicate no interactions. To obtain a unique solution the authors propose to choose among all possible solutions for the connectivity matrix by seeking the most sparsely connected network, the one with the smallest number of interactions among the mRNA. This yields a unique solution to the connectivity among the N species, but no justification is given for this particular choice. The method is tested on several model systems. They start with linear networks (see fig. 13.2) and consider 100, 200, 300, and 400 genes. For small numbers of measurements, M = 20, the errors made on finding connections between genes can be very large—in the thousands—but for each of these systems the errors are zero for M = 70. That is a large number of measurements; but the method as outlined works for small perturbations but not for large ones, for then the linearization is not applicable.

The concept of “sparse networks” is crucial to the success of this approach, but the authors were “unable to quantify this notion of sparseness and to pinpoint the critical size of a network as a function of the average number of connections” for which the method suffices. The method is not applicable for obtaining the connectivity of small, and likely highly connected, networks that govern specific biological functions.

An important consideration has been omitted in [3–5], which are devoted to this approach, and that is the usually rather large experimental errors associated with microarray measurements. It is important to know how such errors propagate in the calculations and their effects on the proper identification of the connectivity matrix. A simple example worked out in detail in section 12.5 shows the possible multiplicative effects of such errors in nonlinear kinetic equations. Until this problem is addressed, the approach of linearization must be viewed with caution.

13.3Modeling of Reaction Mechanisms

We have described the modeling of reaction mechanisms for chemical systems in chapter 1. There we indicated the use of results of available experiments in regard to determining stoichiometries of elementary reactions and rate coefficients: the use of mass action kinetics, use of enzyme reaction mechanisms, guessing of missing reactions, and estimation (or even wild guessing) of missing rate coefficients. With this information

MINI-INTRODUCTION TO BIOINFORMATICS

211

a reaction mechanism may be formulated, predictions from it may be compared with experiments, and new experiments may be suggested.

Some tours de force of these methods have been presented in several publications, (see [6,7] and references therein). The studies of Tyson and coworkers are focused on the kinetic analysis of the budding yeast cell cycle. The molecular mechanism of cell cycle control is known in more detail for budding yeast, Saccharomyces cerevisiae, than for any other eukaryotic organism. Many experiments have been done on this system over many years; there are about 125 references cited in [6]. The biological details are second to stressing the enormity of this task. The model has nearly twenty variables and that many kinetic equations, and there are about fifty parameters (rate coefficients, binding constants, thresholds, relative efficiencies). A fair number of assumptions need to be made in the cases of absence of any substantiating experimental evidence, and a fair number of approximations need to be made to simplify the kinetic equations. The complexity of this system is indicated in fig. 13.3 and its caption.

The authors make an extensive effort to compare with experiments, and there is substantial agreement (partly fitted) on variations in time during the cell cycle for several key species, both for the wild type and some mutants, and on the dependence of cell cycle time on growth rate and birth size, among others. For a large-scale modeling study of the microbial cell, see [7].

13.4Boolean Networks

Suppose we simplify the expression of a gene as either “on” or “off.” In either state it may affect another gene. The “on”/“off” nature may be modeled by a Boolean function which has the value of 1 = “on” or 0 = “off.” In fig. 13.1, recall that the color red signifies increased expression and the color green signifies reduced expression. Hence we can think of fig. 13.1 being altered in this Boolean, binary simplification by substituting 1 for green and 0 for red (where such changes are scored as statistically significant).

Let us give a simple example of a Boolean network, taken from [8], shown in fig. 13.4. Part (a) of that figure shows the connectivity of the network; part (b) gives the logical (Boolean) rules; and part (c) the state transition table which results from the network structure and the logical rules. For example, consider the second line in the transition table, and we reason from the primed letters to the unprimed: If A = 0, then B = 0; if B = 0, then A or C = 1; if C = 0, then A and B = 0. The task of reverse engineering is to find the network and rules that produce the observed transitions as displayed in fig. 13.4.

One way to proceed is to guess a Boolean network for the number of genes for which there are expression experiments, with input of known connectivities from prior experiments. Next, guesses need to be made of the logical rules, and here again the guesses may be educated by inspection of the experiments or given by random choice. Restrictions may be made, a priori, on the number of gene inputs on each gene. The calculations of a transition table for a given network and rules are fast due to the binary nature of the network. The calculated transition table can be compared visually with the experiments, which may give some hints on changes to be made in the network and the rules. However, a quantitative measure of agreement of the calculation with experiments is desirable, and such a measure is provided by information theory.

Fig. 13.3 Molecular model of the control of CDK (cyclin-dependent kinase) activities during the budding yeast cell cycle. We lump together redundant cyclins (Cln1 + Cln2 = “Cln2,” Clb1 + Clb2 = “Clb2,” and Clb5 + Clb6 = “Clb5”) and ignore Clb3 and Clb4. (Notice that we do not draw the kinase subunit, Cdc28, that is associated with each cyclin, because we assume it is in excess.) At the beginning of the cycle, the cell has few cyclin molecules, because the transcription factors (SBF, MBF, and Mcm1) that regulate cyclin synthesis are inactive. Clb-dependent kinases, in addition, are suppressed by a stoichiometric inhibitor (Sic1) and by efficient proteolysis of cyclin subunits. Cln3/Cdc28, which is present at low and nearly constant activity throughout the cycle, triggers a sequence of events leading ultimately to cell division. The sequence can be read from left to right. When the cell grows to a sufficiently large size, Cln3/Cdc28 and Bck2 activate SBF and MBF (by phosphorylation, so we assume), causing Cln2 and Clb5 to begin to accumulate. At first, Clb5 accumulates in inactive trimers, Clb5/Cdc28/Sic1, but Cln2/Cdc28 is not so inhibited. Rising Cln2/Cdc28 activity plays three important roles. First, it initiates bud formation. Second, it phosphorylates Sic1, priming the inhibitor for ubiquitination by SCF and ultimate degradation by the proteasome. Third, it inactivates Hct1, which, in conjunction with APC, was responsible for Clb2 degradation in G1 phase. When Sic1 is destroyed, Clb5/Cdc28 activity rises abruptly and drives the cell into S phase. These are the major physiological events associated with the Start transition. With Sic1 gone and Hct1 inactivated, Clb2-dependent kinase can begin to rise, with some lag, because Clb2/Cdc28 activates its own transcription factor, Mcm1. In addition, Clb2/Cdc28 inactivates SBF, so Cln2-dependent kinase activity begins to fall as Cln2 synthesis shuts off. At about the same time, MBF is inactivated, and the Clb5 level starts to fall. Rising Clb2/Cdc28 activity induces progress through mitosis. The metaphase–anaphase transition is regulated by a pair of proteins, Cdc20 and Hct1, that target substrates to the APC for ubiquitination. At metaphase they are inactive, but when DNA is fully replicated and all chromosomes are aligned on the metaphase plage, Cdc20 is activated. Indirectly Cdc20 promotes

(1) dissociation of sister chromatids (anaphase A); (2) activation of Hct1, which conducts Clb2 to the APC, thereby initiating anaphase B and cell separation; and (3) activation of Swi5, the transcription factor for Sic1. With all CDK activity gone (except for a little associated with Cln3), Sic1 can make a comeback, and the cell returns to G1. (Figure and caption taken from [6], with permission.)

Соседние файлы в предмете Химия