
The ELISA Guidebook



1.3.1 Samples for Feasibility Studies: Serum Controls (Standards)

Developments in ELISA rely on the availability of some reagents of relevance to the problem at hand. Thus, other tests can provide information, e.g., about antibodies measured in a serum, which allows them to be used in ELISA development. In the case of the indirect assay, we are trying to estimate antibodies against a specific agent through their binding to that agent (on a plate), and subsequent detection with an antispecies conjugate. We are also trying to produce an assay that can differentiate between samples containing (positive) and not containing (negative) antibodies.

The availability and selection of four or five samples (positive sera in our example) that range from high to low levels of antibodies against the infection/infectious agent in question is quite useful. The availability of such samples relies on a continuity of work at a given institution or their preparation in animals with specific disease agents. In addition, a sample(s) containing no antibody is required. Such control positive and negative samples should be taken, wherever possible, from known infected or uninfected animals that are representative of the population to which the eventually validated assay will be applied (target population). Preferably, the samples should have given expected results in one or more serological techniques other than the one being validated. These same samples are used to optimize reagents throughout the feasibility studies. The samples are preferably taken from individual animals but they may represent pools of samples from several animals. A good practice is to prepare or obtain a significant volume (e.g., 10 mL) of each sample and divide it into 0.1-mL volumes, to be stored at −20°C. One volume of each is thawed, used for experiments, and stored at 4°C between experiments until depleted. Then, another aliquot may be thawed for further experimentation.

This procedure aims to provide the same sample source of sera, in which the same number of freeze/thaw cycles is maintained for all experiments. This precaution is a strong element in reducing any variation that may be introduced, since freeze-thawing can denature protein and hence antibodies. Excessive shaking of samples is also to be avoided since the shearing action in overvigorous mixing also denatures protein. Shaking also causes frothing (excess bubbles), which partitions proteins so that antibody may be enriched in the bubbles (hence, depleted in the main volume of liquid).

Care is necessary to ensure that samples taken from the freezer are mixed thoroughly, because freezing causes the protein content of serum to separate to the bottom of tubes. Basically, samples (including test samples) should be treated gently.

Note that the qualitative nature of antibodies making up a serum may be altered greatly even though the quantity of antibody measured appears, by some tests, to stay the same. As an example, shaking may destroy a high-affinity IgM population of a serum, allowing a more stable but lower-affinity IgG population to react with a target antigen. The net result (titer) may not alter in a specific assay, but the overall avidity of the serum has.

Conversely, other test systems may detect the drop in IgM, or increase in IgG, and hence show great alterations in respective titers. Note also that the problems with physical handling are more acutely important when levels of antibodies are low since there is a low level of positive antibody (protein), and small amounts of denaturation can turn a weak positive control into a negative one.

The approach of using the same sera has the added advantage of generating a data trail for the repeatedly run samples. After the assay is validated, one or more of the samples can become the serum control(s) that may be the basis for data expression and repeatability assessments both within and between runs. They may also serve as serum standards if their activity has been predetermined by other accepted methods; such standards provide assurance that runs of the assay are producing accurate data.

1.3.2 Expression of Data

The method used to normalize and express data should be decided preferably no later than at the end of the feasibility studies. Comparisons of results from day to day and among laboratories are most accurate when done using normalized data. For example, in ELISA systems, optical density (OD) values are absolute measurements that are influenced by ambient temperatures, test parameters, and photometric instrumentation. Therefore, results need to be calculated and expressed as a function of the reactivity of one or more serum control samples that are included in each run of the assay.

Classically, normalization of data is accomplished in indirect ELISA by expressing OD values in one of several ways, e.g., as a percentage of the OD of a single high-positive serum control that is included on each plate. This method is adequate for most applications.
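The percentage normalization described above can be sketched as follows. This is a minimal illustration in Python; the function name, sample names, and OD readings are hypothetical and are not taken from the text.

```python
def percent_positivity(sample_od, high_positive_od):
    """Express a sample OD as a percentage of the high-positive control OD
    measured on the same plate (illustrative helper, not a standard API)."""
    if high_positive_od <= 0:
        raise ValueError("high-positive control OD must be greater than zero")
    return 100.0 * sample_od / high_positive_od


# Hypothetical raw OD readings from one plate (illustrative numbers only).
plate_ods = {"sample_A": 0.96, "sample_B": 0.30, "sample_C": 0.06}
high_positive_od = 1.20  # OD of the high-positive serum control on this plate

normalized = {
    name: percent_positivity(od, high_positive_od)
    for name, od in plate_ods.items()
}
for name, pp in normalized.items():
    print(f"{name}: {pp:.1f}% of high-positive control")
```

Because each plate carries its own control, the same calculation can be repeated per plate so that results from different runs are compared on the percentage scale rather than on raw OD.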

More rigor can be brought to the normalization procedure by calculating results from a standard curve generated by several serum standards. This requires a more sophisticated algorithm such as linear regression or log-logit analysis to calculate the normalized value for each test sample. These approaches are more satisfactory because they do not rely on only one high-positive control sample for data normalization, but, rather, utilize several serum controls to plot a standard curve from which the sample value is extrapolated. This allows for some experimental error correction; for example, if one of the control samples was omitted or gave a high variation from the expected value, then the test may be accepted provided the other controls were acceptable.
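A standard-curve calculation of this kind might look like the sketch below. It assumes, purely for illustration, that each serum standard has a preassigned activity in arbitrary units and that OD is roughly linear in the log10 of that activity over the working range; the log-logit alternative mentioned above is not shown. All names and numbers are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def units_from_od(od, slope, intercept):
    """Invert the fitted line to read activity units off the standard curve."""
    return 10 ** ((od - intercept) / slope)


# Hypothetical serum standards: (assigned activity units, observed OD).
standards = [(10, 0.20), (30, 0.50), (100, 0.85), (300, 1.15)]
log_units = [math.log10(units) for units, _ in standards]
ods = [od for _, od in standards]

slope, intercept = fit_line(log_units, ods)
print(f"sample at OD 0.70 ~ {units_from_od(0.70, slope, intercept):.0f} units")
```

With several standards on the curve, a single aberrant standard can be dropped and the line refitted from the remaining points, which is the error-tolerance advantage the paragraph describes.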


Whatever the type of assay, it is essential to include additional controls for any reagent that may introduce variability and thus undermine attempts to achieve a validated assay.

1.3.3 Repeatability: Preliminary Estimates

Evidence that an assay is repeatable is necessary for further development. This is accomplished by calculating the intra- and interplate variation using the same samples run in different plates and on different days (and with different operators). Ideally, such tests should be run on at least 10 plates on 10 separate occasions. Coefficients of variation (CVs; the standard deviation [SD] of replicates divided by the mean of replicates) equal to or less than 15% for the raw OD values indicate adequate repeatability at this stage of assay development. Such data obtained on a number of different plates and days also allow confidence limits to be ascribed to the variation observed (comparison of different means with their respective variations). However, if there is evidence of excessive variation (>20%) within and/or between runs of the assay, more preliminary studies should be made. This either will confirm that stabilization of the assay is possible or will determine ultimately whether the test format should be abandoned. This is extremely important because an assay that is inherently variable has a high probability of not withstanding the rigors of day-to-day testing on samples from the targeted population of animals.
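The CV check can be sketched as below, using only the Python standard library. The replicate OD values are hypothetical, and the "borderline" label for CVs between 15% and 20% is an assumption made here, since the text gives guidance only for values at or below 15% and above 20%.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation: sample SD divided by the mean, as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean of replicates is zero; CV is undefined")
    return 100.0 * stdev(values) / m


def repeatability_flag(cv):
    """Apply the thresholds quoted in the text (<=15% adequate, >20% excessive).
    The 'borderline' band in between is an assumption of this sketch."""
    if cv <= 15.0:
        return "adequate"
    if cv > 20.0:
        return "excessive"
    return "borderline"


# Hypothetical raw ODs for one control serum across repeated plates.
replicate_ods = [0.92, 1.01, 0.97, 1.05, 0.95, 0.99, 1.03, 0.94, 0.98, 1.00]
cv = cv_percent(replicate_ods)
print(f"CV = {cv:.1f}% -> {repeatability_flag(cv)}")
```

The same function can be applied to replicates within one plate (intraplate variation) or to plate means collected on different days (interplate variation).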

1.3.4 Choice of Optimal Assay Parameters

Optimal concentrations/dilutions of the antigen adsorbed to the plate, serum, enzyme-antibody conjugate, and substrate solution are determined through chessboard titrations (CBTs) of each reagent against all other reagents after confirming the best choice of reaction vessels (usually evaluation of two or three types of microtiter plates, each of which has different binding characteristics).

Additional experiments determine the optimal temporal, chemical, and physical variables in the protocol, including incubation temperatures and durations; the type, pH, and molarity of diluent, washing, and blocking buffers; and equipment used in each step of the assay (e.g., pipettors and washers that give the best reproducibility). There are numerous publications detailing the reagents and protocols available for assay development. Often these publications give examples of assays dealing with similar antigens and species of sera being examined.

1.3.5 Analytical Sensitivity and Specificity

Experiments to establish the analytical sensitivity of the assay (the smallest detectable amount of the analyte in question) and the analytical specificity (the degree to which the test does not crossreact with analytes associated with other infections) are needed. Note that sensitivity and specificity here are not strictly the same as when being considered in a purely immunological way, but are an attempt to quantify the "detection" level of any assay (sensitivity) that is affected by unwanted crossreactivity (specificity considerations).

Analytical sensitivity can be assessed by end point dilution analysis, which measures the dilution of serum at which antibodies are no longer detectable. Analytical specificity is best assessed by examining test performance using a panel of sera derived from animals that have experienced related infections that may stimulate crossreactive antibody. If, e.g., the assay does not detect antibody in limiting dilutions of serum with the same efficiency as other assays, or crossreactivity is common when sera from animals with closely related infections are tested, the reagents need to be recalibrated or replaced, or the assay abandoned.
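End point dilution analysis can be sketched as follows. The sketch assumes a hypothetical OD threshold below which antibody is treated as no longer detectable; how that threshold is set is not specified in the text, and the dilution series and OD values are illustrative only.

```python
def endpoint_titer(reciprocal_dilutions, ods, cutoff_od):
    """Return the reciprocal of the highest dilution whose OD is still above
    cutoff_od, scanning from least to most dilute; None if none is above it."""
    endpoint = None
    for reciprocal, od in sorted(zip(reciprocal_dilutions, ods)):
        if od > cutoff_od:
            endpoint = reciprocal
        else:
            break  # first dilution at or below the threshold ends the series
    return endpoint


# Hypothetical twofold dilution series of one serum (1/100, 1/200, ...).
dilutions = [100, 200, 400, 800, 1600]
observed_ods = [1.10, 0.72, 0.41, 0.18, 0.09]

titer = endpoint_titer(dilutions, observed_ods, cutoff_od=0.20)
print(f"end point titer: 1/{titer}")
```

Running the same serum through a reference assay and comparing the two end points gives a direct, if rough, comparison of analytical sensitivity.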

1.4 Determining Assay Performance Characteristics

When feasibility studies indicate that an assay has potential for field application, the next step is to determine the assay's performance characteristics. Estimates are needed of diagnostic sensitivity (D-SN) and diagnostic specificity (D-SP).

D-SN is the proportion of known infected reference animals that test positive in the assay; infected animals that test negative are deemed false negative results. D-SP is the proportion of uninfected reference animals that test negative in the assay; uninfected animals that test positive are deemed false positive results. The number and source of reference samples used to derive D-SN and D-SP are thus of paramount importance if the assay is ever to be properly validated for use in the general population of animals targeted by that assay.
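These two definitions translate directly into a small calculation. The sketch below uses hypothetical counts from a reference panel; the function names are illustrative.

```python
def diagnostic_sensitivity(true_positives, false_negatives):
    """D-SN: proportion of known infected reference animals that test positive."""
    return true_positives / (true_positives + false_negatives)


def diagnostic_specificity(true_negatives, false_positives):
    """D-SP: proportion of uninfected reference animals that test negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical reference panel results (illustrative counts only).
d_sn = diagnostic_sensitivity(true_positives=285, false_negatives=15)
d_sp = diagnostic_specificity(true_negatives=980, false_positives=20)
print(f"D-SN = {d_sn:.1%}, D-SP = {d_sp:.1%}")
```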

These are primary parameters obtained during validation of an assay. They are the basis for calculation of other parameters from which inferences are made about test results. It is important that estimates of D-SN and D-SP be as accurate as possible. Ideally, they are derived from testing a series of reference samples from reference animals of known infection status relative to the disease or infection in question.

1.4.1 Intended Use of the Assay

Determination must be made of how many reference samples must be tested in order to achieve statistically significant estimates of D-SN and D-SP with an acceptable error. This depends on the intended use of the assay. When a screening test is needed for application to a highly pathogenic disease, the threshold that separates seropositive from seronegative animals can be set at a low level, so that it is unlikely that any infected animals will be misclassified as uninfected. However, a consequence of the low threshold is that uninfected animals showing nonspecific activity will be misclassified as infected. This will directly contribute to a lowering of assay specificity.

Alternatively, if the test is for a highly endemic but less pathogenic disease, generally the threshold can be set relatively high because it is important that the test not classify an animal as infected when in fact it is uninfected. Because of its high specificity, such a test is often used as a confirmatory test. Having determined whether high sensitivity or high specificity is the primary requirement for the assay, it is theoretically possible to calculate the number of samples required to establish valid estimates of D-SN and D-SP.

1.4.2 Size of Reference Serum Panel Required for Calculations of D-SN and D-SP

The optimal way to determine D-SN and D-SP of any assay is to test a large panel of reference sera that represents two groups of animals. One group should be proven to be infected with the agent in question. The second group should be known to be free of infection. In theory, the number of infected animals tested to achieve the desired diagnostic sensitivity of the test (± allowable error) can be approximated by the following formula:

\[ n = \frac{4 \cdot ds \cdot (1 - ds)}{e^{2}} \]

in which n = the number of animals that need to be tested in the new assay; ds = the diagnostic sensitivity that is sought (i.e., the expected proportion of infected animals that will test positive); and e = the amount of error allowed in the estimate of diagnostic sensitivity.

For instance, if a 95% diagnostic sensitivity is desired with ±5% error allowed in that estimate, the theoretical number of animals that is needed in the test validation is [4 × 0.95 × (1 − 0.95)]/0.05² = 76 infected animals. If one wishes to increase the diagnostic sensitivity to 99% ± 2%, then the theoretical number of animals required is only 99.
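The two worked figures are consistent with n = 4 × ds × (1 − ds) / e², where ds and e are expressed as proportions. A short sketch that reproduces them is given below; the function name is illustrative, and the rounding step is only there to guard against floating-point noise before rounding up to a whole animal.

```python
import math


def reference_panel_size(ds, e):
    """Approximate number of reference animals for a sought proportion ds
    (sensitivity or specificity) with allowable error e, both as fractions."""
    if not 0 < ds < 1 or e <= 0:
        raise ValueError("ds must be between 0 and 1 and e must be positive")
    n = 4 * ds * (1 - ds) / e ** 2
    # Round away tiny floating-point residue, then round up to whole animals.
    return math.ceil(round(n, 6))


print(reference_panel_size(0.95, 0.05))  # 95% +/- 5%
print(reference_panel_size(0.99, 0.02))  # 99% +/- 2%
```

The factor 4 is close to 1.96², so the expression behaves like the usual 95% confidence sample-size approximation for a proportion. Tightening e has a much larger effect on n than changing ds, because e enters as a square.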

These estimates of sample size may be misleading because they assume that the reference animals follow the same, normal frequency distribution as the total population. This is unlikely since the latter population is influenced by many unquantifiable biological and environmental variables. Factors such as breed, age, sex, stage of infection, differing responses of individuals to infectious agents, differing host responses in chronic versus peracute infections, and the effect of diet and environment are but a few examples. All may have an impact on antibody production. Additionally, antibody to closely related infectious agents may cause crossreactions in the assay, and if this combination of agents is found only in one portion of the total population targeted by the assay, but is not represented in the panel of reference sera, then obviously the estimates of D-SN and D-SP derived from the reference panel will be wrong. It is therefore impossible to represent fully all variables found in a target population of, say, 25 million animals using a sample of 100 animals.

The way to reduce the error in any statistical estimate is to increase the sample size (the larger the sample size, the more confident one can be in the estimate of the population). The experience of people validating assays indicates that it is necessary to evaluate sera from several hundred known infected animals to account for many of the variables in a large population. Since the number of variables is indeterminate, we would recommend that at least 500–1000 samples be selected randomly from throughout the target population in which the assay will eventually be used. Such an exercise serves to define a population and may be further refined if distinct environmental regions can be regarded as having similarly influenced animals. In this way defined populations can be compared as to their distribution statistics. The extension of assays through active use in the evaluation of different populations and comparative testing against other methods also serves to allow a reestimation of the sensitivity and specificity of ELISAs.

The calculated number of uninfected reference animals required to establish diagnostic specificity of the assay is even greater. To validate an antibody detection test that will be 99% specific (only one false positive per 100 uninfected animals), an extremely large population of uninfected animals must be tested, representing as many biological and environmental variables as possible. This will allow an estimate of confidence in the test specificity. Again, the assumptions in the statistical calculations are a major concern; therefore, one should think in terms of at least 1000, and preferably upward of 5000 samples from animals that are known to be uninfected and not vaccinated with the agent in question, to establish a reasonable estimate of specificity.

1.4.3 The Gold Standard for Classifying Animals as Uninfected

The term gold standard refers to the method or composite of methods giving results that are regarded as unequivocally classifying animals as infected or not infected. The results obtained from the new method are compared to those obtained using the gold standard during the validation process. In statistical terms, the gold standard results are regarded as the independent variable whereas the result from the new assay is the dependent variable. The results of the new assay are deemed correct or incorrect relative to the gold standard.

Classifying a population of animals as unequivocally uninfected with the agent in question using culture or isolation techniques or serology is not possible. One cannot rule out the possibility of false negative results, but it is possible to combine several sources of information to determine the probability that reference animals have never experienced an infection with the agent.


Accordingly, reference animals selected to represent the uninfected group in the assessment of assay specificity need to be selected as follows:

1. From geographical areas where the disease has not been endemic for at least the past 3 yr.

2. From herds in those areas that have not had clinical signs of the disease during the past 3 yr and that have not been vaccinated against the agent in question.

3. From herds that are closed to importation of animals from endemic areas and do not have infected neighboring herds.

4. From areas where there is no evidence of antibody to the agent in question based on repeated testing over the past 2–3 yr.

If all of these criteria are met, one can be reasonably certain that these animals have not experienced the agent in question. Sera from such animals could then be used as the reference sera for the uninfected reference animal group.

1.4.4 Gold Standard for Classifying Animals as Infected

Several standards have been described that can be used with varying success to characterize the animals that serve as a source of reference sera:

1. Verification of infection: an absolute gold standard. The only true gold standard for classifying an animal as infected is the isolation of infectious agents or unequivocal histopathological criteria. Sera from such animals then are used to establish analytical and diagnostic sensitivity of a new assay designed to detect antibody to that agent.

2. Comparative serology: a relative standard of comparison. It may be impractical, technically difficult, or impossible to obtain definitive proof of infection via culture or isolation techniques. In the absence of such a gold standard, less exacting methods must serve as the standard of comparison with the new assay. If the other tests have already established assay performance characteristics (e.g., the Rose Bengal screening test followed by the complement fixation confirmatory test for detection of antibody to Brucella bovis), their results taken together provide a useful composite-based standard by which the new assay may be compared.

When the new test is evaluated by comparison with another serological test or combination of tests, the estimates of D-SN and D-SP for the new test are called relative diagnostic sensitivity and relative diagnostic specificity. These standards of comparison, however, have their own established levels of false positivity and false negativity that are sources of error carried over into the new assay. Therefore, the relative D-SN and D-SP for the new test will be underestimated. It follows that the greater the amount of false positivity and false negativity in the test that is used as the standard of comparison, the more the new assay's performance characteristics will be undermined. In other words, care must be taken when the "new" test in fact shows a better diagnostic capability than those previously accepted.

1.4.5 Experimental Infection or Vaccination: An Adjunct Standard of Comparison

Another standard for assessment of antibody response is sera taken sequentially, over several months, from experimentally infected or vaccinated animals. The strength of this standard is that it measures the ability of the assay to detect early antibody production and to follow the kinetics of antibody production to the agent in question. This also can be relative to preintervention treatment through the taking of samples before treatment. If it is evident that animals become infected, shed organisms in low numbers, but have no detectable antibody during the first 2 to 3 mo using the new assay, the analytical sensitivity of the assay may be inadequate, and estimates of diagnostic sensitivity will be low. Alternatively, if antibody appears quickly after inoculation of the infectious agent, and earlier than in the conventional assays used as standards of comparison, the new assay may have greater analytical sensitivity (and associated increased diagnostic sensitivity) than the conventional assay.

The interpretation of experimentally derived infected/vaccinated antibody response must be done carefully. The particular strain of organism, route of exposure, and dose are just a few variables that may stimulate an antibody response that is quantitatively and qualitatively atypical of natural infection in the target population. The same is true of vaccination. Therefore, it is essential that experimentally induced antibody responses are relevant to those occurring in natural outbreaks of disease caused by the same infectious agent, otherwise the estimates of relative D-SN and D-SP may be in error. Because of the difficulty of equivalence in responses of naturally infected and experimentally infected/vaccinated animals, the relative D-SN and D-SP data derived from such animals should be considered as an adjunct criterion and should not be used alone to determine a new assay's relative D-SN and D-SP.

1.5 Random Testing of Samples from a Population Endemic for Disease: No Standard of Comparison

Validation of assays can be made in the absence of a standard. The validation then relies on statistical tools such as cluster or mixture analysis. Assuming that a few sera of known status are available to establish the feasibility of the assay system, it is possible to obtain a rough estimate of the assay's performance characteristics. Then, several thousands of animals in the target population can be tested in the absence of known infection status data other than possibly scattered clinical observations. If a clear bimodal frequency distribution becomes evident with a large peak consisting of many animals at the low end of the antibody scale, and a second peak extended over a wide range of higher antibody responses, it may be possible to estimate a cutoff in antibody response that separates presumed uninfected from presumed infected animals. Since in this scheme there is no proof of the infection status of the animals, this approach should be done as a last resort with later confirmation after definitive standard(s) of comparison become available. This process also is inherent in the cumulative analysis of data from the field on continuous use of a kit.

1.6 Repeatability and Reproducibility: Calculations

During feasibility studies, preliminary estimates of repeatability should be obtained. Selected sera from a bank of reference sera used to determine the assay's D-SN and D-SP can be tested using a series of runs of the assay within the same laboratory. It is useful to have several operators of the assay system do this exercise independently. This will provide an indication of assay repeatability that addresses the robustness of the assay.

Similarly, reproducibility of the assay (agreement among results of samples tested in different laboratories) needs to be established by testing the same sera in several other laboratories. The evaluation of both repeatability and reproducibility should be made on normalized data. For repeatability data, CVs for replicates should not exceed 10%, and regression analysis of normalized reproducibility data among laboratories generally should not give significant differences at the 95% confidence level.

1.7 Selection of the Positive/Negative Threshold (Cutoff)

After all the reference sera are tested, frequency distributions of results from infected and uninfected populations can be established. Both distributions are plotted on the same graph with the vertical axis representing the number of animals having test results that fall within each of 20 or so intervals of result values plotted on the horizontal axis.

For instance, when the data are expressed as a percentage of the value for the high-positive control sample (PP), 20 intervals of five units each (0–4%, 5–9%, 10–14%, and so on) could represent the horizontal axis. There is usually an overlap in these frequency distributions. The selection of a cutoff value for the new test will fall somewhere within this overlapping region.
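Tallying results into such intervals can be sketched as below. The PP values for the two reference groups are hypothetical, and the helper name is illustrative; each interval is identified by its lower bound (0 for 0–4%, 5 for 5–9%, and so on).

```python
from collections import Counter


def bin_counts(pp_values, width=5):
    """Count results per interval of `width` PP units, keyed by lower bound."""
    counts = Counter()
    for value in pp_values:
        lower_bound = int(value // width) * width
        counts[lower_bound] += 1
    return dict(sorted(counts.items()))


# Hypothetical PP results for the two reference groups (illustrative only).
uninfected_pp = [2, 3, 4, 6, 7, 8, 11, 12, 16, 22]
infected_pp = [18, 24, 31, 37, 45, 52, 58, 66, 73, 88]

uninfected_hist = bin_counts(uninfected_pp)
infected_hist = bin_counts(infected_pp)

# Intervals that contain animals from both groups form the overlap region.
overlap = sorted(set(uninfected_hist) & set(infected_hist))
print("uninfected:", uninfected_hist)
print("infected:  ", infected_hist)
print("shared intervals (lower bounds):", overlap)
```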

The extent of the overlap may vary considerably from one assay to another. If only a small percentage (e.g., 2%) of the results from infected and uninfected animals are overlapping, and the cutoff selected is at the midpoint of the overlapping region, then the D-SN and the D-SP will both be 99%.

Alternatively, if the overlap involves a greater percentage of animals (e.g., 10%), then the cutoff chosen may be shifted to the left to minimize the false negative results (favoring greater D-SN), or to the right to minimize the false positive results (favoring greater D-SP), depending on the intended application of the assay. Once selected, the cutoff will determine the D-SN and D-SP, which, in turn, are the bases for calculating predictive values for positive and negative test results.

Fig. 1. Hypothetical distribution density of ELISA results of populations of noninfected (left-hand curve) and infected (right-hand curve) individuals. The FN and FP area is referred to as the "gray area," and results falling here must be regarded as suspect. The importance of retesting depends on how important the result is to classification of the test unit. The setting up of cutoff values depends on knowledge of such overlaps and the variability of the test used. FN, False negative; FP, false positive; TN, true negative; TP, true positive.
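The effect of shifting the cutoff on D-SN and D-SP can be illustrated with a short sketch. It assumes that a result at or above the cutoff is called positive (a convention chosen here, not stated in the text) and uses hypothetical PP values for the two reference groups.

```python
def performance_at_cutoff(infected, uninfected, cutoff):
    """Return (D-SN, D-SP) when results >= cutoff are called positive."""
    true_positives = sum(value >= cutoff for value in infected)
    true_negatives = sum(value < cutoff for value in uninfected)
    return true_positives / len(infected), true_negatives / len(uninfected)


# Hypothetical PP results with overlapping distributions (illustrative only).
infected_pp = [10, 30, 50, 70]
uninfected_pp = [5, 8, 12, 20]

for cutoff in (9, 15, 25):
    d_sn, d_sp = performance_at_cutoff(infected_pp, uninfected_pp, cutoff)
    print(f"cutoff {cutoff:>2}: D-SN = {d_sn:.2f}, D-SP = {d_sp:.2f}")
```

Moving the cutoff to the left raises D-SN at the expense of D-SP, and moving it to the right does the opposite, which is why the intended application of the assay has to be settled before the cutoff is fixed.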

1.7.1 Details

When giving further consideration to cutoff values one needs to recognize that there is invariably an overlap (as already stated) between populations containing negative and positive test results. Thus, the estimation of a perfect discriminatory cutoff is not possible. Figure 1 presents a hypothetical (but typical) overlap of positive and negative distributions of ELISA. The cutoff value is the point set on the test scale that determines whether the response is positive or negative. The observed overlap reduces confidence in such statements for certain samples. The importance of false negative or false positive results depends on the required levels of diagnostic sensitivity against specificity. As already indicated, setting a cutoff of two or three times the SD of the negative control group is accepted practice. This assumes that there is a normal
