
Physics of Bio-Molecules and Cells

in energy of discrimination; in our case the energy of the intermediate state is the same, we must proofread on the basis of entropy. But of course this distinction is illusory: while a forward arrow is an energy and a backward arrow an entropy, looking at the diagram in reverse simply exchanges them.

Fig. 5. The kinetic proofreading model for topoisomerases. Notice all rates are mirror-symmetric except for the crossing attempts, which are given by κ, κ′, ν and ν′. Notice also that the path from right to left crosses both κ and κ′. From [14].

This diagram must be complicated a little bit because the state S going from K to U is distinct from the state S going from U to K; this distinction comes about because the topo grabs one DNA segment first, and allows the second segment to cross, but cannot chemically allow it backwards: its clamp has an entrance and an exit which are distinct.

In order to proofread we must do something twice that we were doing only once before. What we were doing once is that a DNA segment bangs against a second segment in an attempt to cross it; there happens to be a topo sitting there, and it lets it through. So we need to allow two bangs, two crossing attempts; but we couldn’t possibly if the topo lets the segment through on the first try. So the proofreading model simply says: the topo will insist on getting two attempts at crossing (within a small time period) before allowing it.

Thus the enzyme must be able to count to two, and so needs a physical substrate for a one-bit memory. There’s of course a large number of different
ways that bistability can be built into such a system: enzymes can have more than one conformational state (like ion channels) or can be subject to reversible posttranslational modifications, like phosphorylation. This “bit” has to be strongly coupled to other mechanical properties of the enzyme, since it is on the basis of this bit that the second strand is allowed passage or isn’t. In this particular case, the “bit” need only exist during the duration of the enzyme-DNA complex, since the proofreading scheme does not require memory “across” instances of the complex. Furthermore, the chemistry of the topo itself already has a “bit” of information, though unused: whether the covalent backbone of the segment has been cut or not. Thus one possible physical embodiment of this model proposes itself: the “bit” is whether the segment has been cut or hasn’t, and so the proofreading translates to stating that segment cutting is triggered by a crossing attempt. A further attractive feature is that if the “high energy state” of proofreading is the DNA segment having been cut, then evidently there’s no strand crossing through the low energy state–the segment is still uncut; and there is a need for a γ built in, since we do not want to leave cut DNA lying around too long. Thus this particular implementation of proofreading is attractive because of the simplicity with which everything falls together; but Occam’s razor is dangerously blunt in biology so we should not make that much of it.
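To see how insisting on two attempts squares the discrimination, consider a deliberately simplified caricature (an assumption of this sketch, not the full diagram of Fig. 5): crossing attempts arrive at random with rate k, the first attempt sets the enzyme's bit, the bit resets at rate γ, and a second attempt arriving while the bit is set leads to passage. The mean time per passage then satisfies T = 1/k + 1/(k + γ) + [γ/(k + γ)] T, so the passage rate is k²/(2k + γ). The function name and the numerical rates below are hypothetical.

```python
def passage_rate(attempt_rate, gamma):
    """Passage rate of the two-attempt caricature: k^2 / (2k + gamma).

    attempt_rate: rate k of crossing attempts (hypothetical units)
    gamma: rate at which the enzyme's one-bit memory resets
    """
    k = attempt_rate
    return k * k / (2.0 * k + gamma)


# Two hypothetical kinds of crossing attempt whose rates differ tenfold.
k_frequent, k_rare = 1.0, 0.1

for gamma in (0.0, 1.0, 1e2, 1e4, 1e6):
    ratio = passage_rate(k_frequent, gamma) / passage_rate(k_rare, gamma)
    print(f"gamma = {gamma:>9.1f}   passage-rate ratio = {ratio:8.3f}")
```

With a slow reset the ratio of passage rates stays close to the ratio of attempt rates; as γ grows it approaches the square of that ratio, which is the sense in which a fast-resetting bit makes the enzyme count to two.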

There is a regime in which the choice of all the rates becomes immaterial; the analytic solution to this diagram is

$$\frac{P_{\mathrm{knot}}}{P_{\mathrm{unknot}}} = \frac{\nu\nu'}{\kappa\kappa'}\,\frac{\gamma(\lambda+\mu)+\kappa'\mu}{\gamma(\lambda+\mu)+\nu'\mu}$$

so when γ and λ are much larger than the other quantities the second factor tends to 1 and the resulting ratio becomes νν′/κκ′, which is $P_{\mathrm{eq}}^2$, the square of the topological equilibrium probability. All the rates have dropped out of the equation! Figure 6 then shows the agreement between this model and the data.
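As a rough numerical check of this limit, the sketch below evaluates the ratio, reading the displayed formula as a prefactor built from the crossing-attempt rates, νν′/κκ′, times the correction [γ(λ + µ) + κ′µ]/[γ(λ + µ) + ν′µ]. That reading of the typeset formula, the function name, and all numerical values are assumptions of this sketch and are not taken from [14].

```python
def knot_ratio(kappa, kappa_p, nu, nu_p, gamma, lam, mu):
    """P_knot / P_unknot, as read from the displayed formula (an assumption)."""
    prefactor = (nu * nu_p) / (kappa * kappa_p)
    correction = (gamma * (lam + mu) + kappa_p * mu) / (gamma * (lam + mu) + nu_p * mu)
    return prefactor * correction


# Hypothetical crossing-attempt rates; only their ratios matter in the limit.
rates = dict(kappa=2.0, kappa_p=2.0, nu=0.1, nu_p=0.1, mu=1.0)
limit = (rates["nu"] * rates["nu_p"]) / (rates["kappa"] * rates["kappa_p"])

for scale in (1.0, 1e1, 1e2, 1e4):
    r = knot_ratio(gamma=scale, lam=scale, **rates)
    print(f"gamma = lambda = {scale:>8.1f}   ratio / limit = {r / limit:.6f}")
```

As γ and λ grow, the printed quotient approaches 1: the correction factor disappears and only the prefactor survives.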

In other words: the model, while still local in space, by insisting on receiving two independent crossing attempts has become nonlocal in time.

1.10 Suppression of supercoils

An interesting aspect of kinetic proofreading is that the proofreading property is a function of the network of reactions, and not of any individual reaction. Furthermore, this network works out to have a neat property: the squaring of probabilities looks exactly like squaring the Boltzmann distribution, which can be done by doubling the energies or halving the temperature. In any case, even though the internal arrows of the diagrams are out-of-equilibrium, the diagram as a whole works out to be a pseudoequilibrium, detailed-balance-respecting gizmo.

Fig. 6. The proofreading model predicts a quadratic improvement. This plot shows the knot data from [5] graphed against the square line. Please note that this is not a fit since the solution of our model has no free parameters. From [14].

Fig. 7. An infinite stack of linking number change reactions, and how to proofread them. From [15].

Therefore, if all of the relevant reactions are proofread, we can then kinetically proofread an infinite stack of reactions and have it work out as if it was still in detailed balance. This is the easiest out-of-equilibrium kind of system one can conceivably get. Notice that because of the temperature-halving analogy, the moment one reaction is not proofread we have a system with two equivalent temperatures, and we’re stuck with a full, hideous, out-of-equilibrium, probably-intractable model.

But if all the reactions in an infinite stack are proofread, we just need to square the equilibrium probabilities and we’re done. This is the case for linking numbers and supercoiling. In the range of the experiments, the supercoiling energy looks just like a parabolic potential deviation from some minimum: $E = E_D (Lk - Lk_0)^2 / 2$, where $E_D$ is the supercoil discrimination energy and $Lk$ the linking number. Thus the probabilities for the equilibrium model would be Gaussian, and their squares would also be Gaussian, with exactly the same energy form except that the variance of the Gaussian distribution will be reduced by a factor of 2 (its standard deviation by a factor of √2), or, equivalently, the supercoil discrimination energy would be doubled. Interestingly, the supercoiling data from Rybenkov et al. [5] shows quite a good Gaussian behaviour, which can be fit to a supercoiling discrimination energy which is about 1.9 times the actual one.
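The effect of squaring on a Gaussian linking-number distribution can be checked in a few lines. The sketch below builds the equilibrium distribution P(ΔLk) ∝ exp(−E_D ΔLk²/2), with energies in units of kT, squares and renormalizes it, and compares variances. The value of E_D, the truncation range, and the function names are hypothetical choices for illustration only.

```python
import math


def linking_number_distribution(e_d, max_dlk=30, power=1):
    """Normalized P(dLk) proportional to exp(-e_d * dLk**2 / 2) ** power."""
    dlks = range(-max_dlk, max_dlk + 1)
    weights = [math.exp(-power * e_d * d * d / 2.0) for d in dlks]
    total = sum(weights)
    return {d: w / total for d, w in zip(dlks, weights)}


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(d * p for d, p in dist.items())
    return sum((d - mean) ** 2 * p for d, p in dist.items())


E_D = 0.1  # hypothetical discrimination energy, in units of kT
equilibrium = linking_number_distribution(E_D)
proofread = linking_number_distribution(E_D, power=2)  # squared probabilities

var_ratio = variance(equilibrium) / variance(proofread)
print(f"variance(equilibrium) / variance(proofread) = {var_ratio:.4f}")
```

The printed ratio should come out very close to 2, which is the same statement as a doubled effective discrimination energy; a measured factor of about 1.9 is to be compared with this parameter-free value.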

Thus we can say that all the quantitative data currently in existence is compatible with a kinetic proofreading model; furthermore, we cannot emphasize enough that in this model there is no freedom to fit anything, so we cannot dismiss the agreement between the model and the data on the basis of the scant number of datapoints.

1.11 Problems and outlook

The only competing model to ours which we are aware of has been championed by Alex Vologodskii. As mentioned before, the idea is that topos bend the segment to be cut into a hairpin shape; since they only allow passage of the second segment from the inside to the outside of the hairpin, this is an out-of-equilibrium reaction capable of recognizing knottedness. This model has currently two problems: first, there is no analytical treatment showing the model to be capable of the amount of knottedness suppression shown by both experiment and our proofreading model. So even though it sounds plausible that such a mechanism might distinguish knots from unknots, it is not yet known if it can agree with the data. Second, it is unclear why it should suppress supercoiling fluctuations, and Vologodskii’s team has not studied this issue. On the other hand, in support of the model, it has been shown that EM pictures of topos attached to DNA bend the DNA locally; which as evidence is somewhat slim, since almost anything that binds DNA will bend it, especially after freezing to produce EM pictures.

Vologodskii has raised in turn a serious objection to our model. Noticing that the model depends crucially on the assumption that the first and second crossing attempts are of the same topological type, he claims that this implies that the vast majority of crossing attempts must be topology changing for our model to hold.

Fig. 8. Comparison between the theory and the data. These plots should be straight lines if and only if the distributions are Gaussian. The slope of the line reflects the width of the Gaussian distribution, which in turn is the supercoiling discrimination energy. The three datasets are the equilibrium distribution, the proofreading model (in which we just multiplied the equilibrium distribution by two), and the experimental dataset, which is in quite close agreement with the model. Once again, this is not a fit, for the model gives a parameterless function with no freedom for fitting. From [15].

His computation of crossing attempt rates shows that the vast majority of such attempts do not change the knottedness state. This part is of course true. Given a circular polymer of only a few hundred persistence lengths, the primary crossing attempt is when the polymer acquires a figure-eight shape; or perhaps I should say “hourglass shape” to avoid confusion with the figure-eight knot. Crossing in that state does not lead to knotting, but it is the primary means of changing supercoil number as in Figure 2, and is taken into account in the supercoiling analysis of the previous section, which agrees in detail with the experimental data. Second, the regime in which our knotting analysis is correct is the limit in which the two crossing attempts are close to one another because the de-excitation rate γ is large. Of course, the time required for the polymer to change conformation from a trefoil-like conformation as shown in the figures to an hourglass should be considerably longer than the time between two successive crossing attempts in either state.

1.12 Disquisition

I like this model particularly for two reasons. The first is that we have an implementation of a rather precise function in molecular biology which is not being carried out by an enzyme which does exactly this or that: it is the outcome of the dynamics of a network of chemical reactions, and not a result of any individual reaction.

Second, because the model presented here could very well be wrong. It may sound strange, but the thing that I personally miss the most from my life as a physicist is the ability to be wrong, which stems from the ability to make a model possessing unambiguous predictions which can be checked against experimental reality without arguing room. Our problem is such: topoisomerases either wait for two bangs or they don’t, and this can be, in due time, checked. This rarely happens in the interaction between physics and biology, because rare is the time when theory can take a leap in front of experimentation; most usually one is left fitting experimental data with models which have seventeen parameters too many.

2 Gene expression networks. Methods for analysis of DNA chip experiments

I will give a fast and loose description of the regulation of gene expression, gene chip technology etc. This introduction is meant to whet the appetite of the physicist considering studying this fascinating branch of technology; it is deliberately imprecise, so much so that biologists may feel annoyed. Readers interested in continuing the study of this subject are well advised to consult a real textbook on genes and to read the many reviews on the subject of gene chips, like [18–20].

2.1 The regulation of gene expression

Cells encode in their genes proteins which carry out the various tasks required to stay alive, be it to digest foods, detoxify dangerous chemicals, or sense and process information. They adapt themselves to circumstances by changing the amounts or even the kinds of proteins which they deploy; a first line of intervention to change these levels is to change the amount of RNA transcription for a given gene.

DNA --transcription--> RNA --translation--> protein.

A gene, in the genetics sense, is a unit of heredity. There are various inequivalent ways to define such things and the details get messy if one tries to be rigorous, so we won’t try to here. In common parlance, a gene is a “functionally meaningful” region of the genome, and as such defined by its sequence and its position within the chromosome; it has a “coding” region, which is the one in which the code for the gene product (a protein) is spelled out; around the coding region there are “control” regions in which various little sequence snippets act as landing pads for the elements of the transcription machinery and its regulatory entourage. The latter consists of various proteins which either enhance or diminish the chances of getting the transcriptional machinery to transcribe the gene; these are the activators and repressors of transcription. Coding and control refer, in this context, exclusively to the transcriptional process, for the copy of the coding region which we call the RNA transcript contains various control elements for everything that happens later.

So, through transcription, various substrings of the DNA sequence get copied each into individual RNA pieces; the little pieces of RNA for various different genes then float around, in various abundances, and get shuttled around, processed by splicing and other alterations before being used for translation; any particular gene may have from zero or one RNA transcripts to thousands of identical transcripts in any given cell. The transcript abundance for a given gene is established by the competition between two processes: transcription generates more transcripts, while RNA degradation destroys them. Degradation is less specific than transcription, but not unspecific: the RNA transcripts contain sequences which target them for degradation at various rates. Some transcripts are very rapidly transcribed and degraded, establishing thus a non-equilibrium steady state that can be controlled on very fast timescales. This is the case with various information-processing enzymes like kinases, whose transcripts have half-lives in the 0.2 to 2 hour range. There are various transcripts which are very slowly degraded, like those of various ion pumps and the like, whose half-life may be days.
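The link between half-life and controllability can be made concrete with the simplest possible model of this competition, dm/dt = k − δm with δ = ln 2 / t½. First-order kinetics with a constant transcription rate is an assumption of this sketch, not something the text commits to, and the rates and half-lives below are hypothetical values chosen within the ranges quoted above.

```python
import math


def decay_rate(half_life_h):
    """First-order degradation rate (per hour) for a given half-life in hours."""
    return math.log(2.0) / half_life_h


def abundance(t_h, k_per_h, half_life_h, m0=0.0):
    """Transcript number at time t for dm/dt = k - delta * m, starting from m0."""
    delta = decay_rate(half_life_h)
    steady_state = k_per_h / delta
    return steady_state + (m0 - steady_state) * math.exp(-delta * t_h)


def time_to_fraction(frac, half_life_h):
    """Hours needed to close a fraction `frac` of the gap to the steady state."""
    return -math.log(1.0 - frac) / decay_rate(half_life_h)


# Hypothetical transcripts: a kinase-like one and an ion-pump-like one.
for name, half_life in (("kinase-like", 0.5), ("pump-like", 48.0)):
    t90 = time_to_fraction(0.9, half_life)
    print(f"{name:12s} half-life {half_life:5.1f} h -> 90% of new level after {t90:6.1f} h")
```

In this model the response time after a change in transcription rate depends only on the degradation rate, which is why short-lived transcripts can be controlled on fast timescales while long-lived ones cannot.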

The process is similarly complex for the rest of the diagram, i.e., protein synthesis. Zooming in on any portion of this diagram would reveal many complexities we have glossed over. The one piece we shall need to concentrate on is the region above mRNA: the control of mRNA abundance through the control of the arrows around it.

This picture suggests a dynamical network of control for any particular gene. One should imagine a diagram like the above for every single gene; these diagrams are then strung together by various interactions, because all arrows in all such diagrams are effected by proteins which are themselves controlled and may be in various states of activation etc.

[Diagram: DNA with the transcriptional control complex bound to it --transcription--> RNA transcript --splicing etc.--> mRNA --initiation of translation--> mRNA in ribosome --translation--> peptide chain --processing and folding--> protein. Side arrows: RNA turnover, degradation of misfolds, protein turnover.]

The function of a given protein may be to cut one specific bond in a particular sugar, in which case we call it an enzyme; or it may be to bind to the little snippets of DNA sequence and help or prevent the assembly of the transcriptional machinery, thereby inducing or repressing expression of the gene where the snippet lies. In the latter case we call it a transcriptional regulator or transcription factor. Gene regulation networks are the networks of interactions caused by all proteins which are transcriptional regulators of other proteins (perhaps including themselves), of which there are a fair number. Since a transcriptional regulator can regulate many genes and any gene may be (and usually is) regulated by many factors in a combinatorial way, the transcriptional network is capable of sophisticated behaviour; since regulators may be activated or inactivated through processes like phosphorylation by elements outside the transcriptional network proper (like protein kinases) the network reacts to outside inputs. Regulation of genes not in the network is then the output of this network.

All long-term changes in living beings are mediated through transcriptional regulation programs. The differentiation of genetically identical cells into distinct cell types, like liver cells or muscle cells, is mediated through flip-flop-like switches of transcriptional regulation; the long-term changes to synapses in brain cells that mediate our memory are supported through gene regulation circuits, as will be described in the third part of this course.

2.2 Gene expression arrays

Traditionally it has been extremely laborious to figure out pieces of transcriptional regulation circuits; even now, when in possession of the whole genome sequence, data mining techniques have failed to pop things out brightly, but rather provide long lists of possible suspects to be confirmed by the slow traditional methods [21]. We do not know the whole complement of transcriptional regulator binding sites, we do not know any precedence rules stating, if an activator and a repressor are active at the same time, which one prevails, or, obviously, any of the higher-order combinatorics. The picture is complex because even history effects have to be taken into account. The set of binding sites for the transcription factors for a given gene is more than a “logic gate” reacting instantaneously to the inputs, since the chemistry of binding permits, for instance, history effects: overlapping binding sites exclude simultaneous binding by the respective proteins, in which case the factor which was turned on first will bind, and prevent the one turned on last from binding. Finally, it bears mentioning that fluctuations are an essential part of this picture since we are not dealing with a mass-action system here: these are single-molecule systems virtually by definition.

It would be interesting to attempt to start reverse-engineering the circuits from measurements of the behaviour, on the assumption that genes whose expression is temporally correlated have a large chance of being coregulated. This assumption is naive, but the best shot we currently have at a problem whose complexity is otherwise overwhelming, and whose importance overshadows most of Biology.
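A minimal sketch of this naive assumption: score every pair of genes by the Pearson correlation of their expression time courses and keep the pairs above a threshold as coregulation candidates. Pearson correlation is just one simple choice of similarity measure, not one prescribed here, and the gene names, expression values, and threshold are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def candidate_pairs(profiles, threshold=0.9):
    """Gene pairs whose time courses correlate at or above the threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs


# Hypothetical expression levels at six time points.
profiles = {
    "geneA": [1.0, 2.0, 4.0, 8.0, 4.0, 2.0],
    "geneB": [0.5, 1.1, 2.1, 3.9, 2.0, 0.9],
    "geneC": [8.0, 4.0, 2.0, 1.0, 2.0, 4.0],
}

for g1, g2, r in candidate_pairs(profiles):
    print(f"{g1} and {g2} are coregulation candidates (r = {r:.3f})")
```

Only a positive correlation is used here; an anticorrelated pair such as geneA and geneC could equally well share a regulator acting with opposite sign, so a real analysis might threshold on the absolute value instead.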

Gene expression arrays are solid surfaces onto which pieces of DNA have been attached in spatial patterns. This pattern is arranged as an array of regions or spots; within a spot the DNA is chemically homogeneous. Spots may be 1–100 µm and contain many millions of identical copies of DNA. When a drop of fluorescently labeled RNA is placed on such a surface, the individual RNA molecules will bind to the DNA complements attached to the surface. The binding will be sequence-specific: the RNA is expected to bind extremely well to its exact complement, while binding little or not at all to completely different sequences. The drop is then washed away together with any unbound pieces of RNA. When the array is viewed under a fluorescence microscope, the spots will glow in direct proportion to how many pieces of fluorescently-labeled RNA are bound to its DNA. A measurement of spot fluorescence is then a proxy for a species-specific measurement of RNA concentration.

There are two main kinds of array in existence, following a divide between the do-it-yourself approach versus the ready-made approach, that somehow mimics similar divides in other areas like operating systems (Windows vs. Linux for example). For a clear introduction (aimed mostly at the biological public) see [18].

            spot arrays                     GeneChip (R)

who         do it yourself                  buy it from Affymetrix

for         your favorite animal,           commercially important
            your favorite tissue            animals (human, rat...)

what        full-length cDNAs               short (~25) bp DNA
            on a glass surface              oligos on a glass surface

how         deposit a drop from             photolithographic
            a test tube and let dry         chemical synthesis

how many    as many as you have             as many as will fit in the
            patience for:                   wafer at given feature size:
            100–10 000 cDNAs,               500 000 features at (30–40)
            hopefully distinct              features per gene (now)

source      you make the library            sequence database

cost        with a library, 2 $ each        about 2000 $ each (Fedex’ed)
            + your copious time             minus university discounts

Spotted arrays are home-made. Their popularity took a great boost when Pat Brown’s group published (open-source style, [24]) the plans and specs for a robot device costing about 13 000 $ that would make batches of hundreds of arrays from cDNA libraries in standard 384-well plates. The robot operates simply by dipping a small array of metal pins into the little tubes, and then impacting the pins upon a glass slide (the same kind used for microscopes). The robot repeats the operation through an array of glass slides, then changes the tubes, until the collection of tubes is exhausted. As the fluid droplets dry, the cDNA from the libraries dries on the glass and somehow bonds to the surface. At any given research institution, it’s likely that one of these robots will have already been built at some central facility (or at a nearby institution), so the main expense is the creation, normalization and curation of the cDNA library. This is an arcane branch of black magic so we shall not dwell upon it here; we just remark on two important points. A cDNA library contains cloned pieces from the expressed mRNA in the cells of the tissue/animal the RNA was purified from. As such, the sequence is unknown, so the data is labelled by spot number and eventually points to a test tube. If something interesting is inferred, well, a bit will be drawn from the tube and get sequenced. The other point is that one may end up with multiple copies of the cDNA clone in many different tubes; which is not known in advance. A process called “normalization” attempts to factor out relative mRNA abundance, but it introduces noise and fragmentation into the collection. Another process called “subtraction” attempts to make
