Brereton Chemometrics
Table 6.8 Derivative calculation for determining the purity of regions in the data in Table 6.1.
(a) Scaling the rows to constant total
 1   −4.066   2.561   3.269  −2.295   0.292   1.240  −1.911   1.694   1.181  −2.380  −0.642   2.059
 2   −0.176   0.000   0.183  −0.005   0.260  −0.136   0.045   0.498  −0.012  −0.005   0.128   0.220
 3    0.145   0.111   0.404  −0.004   0.155   0.212  −0.246  −0.104   0.118  −0.099   0.114   0.193
 4    0.157   0.117   0.136   0.080   0.014   0.143   0.037  −0.044   0.050   0.095   0.102   0.113
 5    0.070   0.130   0.175   0.143   0.097   0.064  −0.001   0.056   0.042   0.046   0.090   0.088
 6    0.101   0.126   0.150   0.143   0.066   0.060   0.039   0.027   0.069   0.087   0.086   0.045
 7    0.084   0.139   0.164   0.126   0.075   0.047   0.046   0.041   0.066   0.068   0.074   0.069
 8    0.093   0.135   0.146   0.123   0.097   0.061   0.032   0.030   0.047   0.074   0.084   0.078
 9    0.087   0.123   0.152   0.126   0.093   0.057   0.046   0.045   0.056   0.068   0.075   0.071
10    0.081   0.120   0.124   0.127   0.097   0.064   0.057   0.043   0.062   0.075   0.079   0.069
11    0.060   0.102   0.115   0.113   0.095   0.086   0.081   0.071   0.070   0.071   0.076   0.061
12    0.046   0.075   0.081   0.104   0.108   0.112   0.110   0.101   0.077   0.062   0.065   0.058
13    0.034   0.042   0.064   0.080   0.105   0.134   0.145   0.134   0.104   0.061   0.048   0.048
14    0.010   0.025   0.039   0.068   0.114   0.154   0.160   0.143   0.111   0.078   0.051   0.046
15    0.008   0.020   0.024   0.065   0.110   0.163   0.184   0.158   0.109   0.063   0.045   0.051
16    0.008  −0.004   0.033   0.066   0.108   0.169   0.191   0.157   0.118   0.066   0.044   0.043
17    0.032  −0.003   0.028   0.061   0.097   0.163   0.205   0.200   0.114   0.054   0.013   0.036
18    0.004  −0.014   0.050   0.096   0.113   0.175   0.211   0.172   0.076   0.029   0.031   0.058
19    0.023  −0.091  −0.108  −0.038  −0.018   0.393   0.275   0.168   0.252   0.021   0.073   0.049
20    0.008  −0.027  −0.230  −0.358   0.439   0.297   0.135   0.750   0.096  −0.217  −0.089   0.195
21    1.664  −0.575  −0.659   0.719   1.151   0.094  −0.387  −1.732   0.394   0.245   0.115  −0.030
22   −0.057   0.235   0.597  −0.985   1.238   1.323   0.885  −0.708   0.070  −0.344  −0.048  −1.207
23   −0.422  −0.045   0.191   0.057   0.217  −0.553   0.287   0.373   0.079   0.264   0.631  −0.079
24   −0.221   0.037  −0.094   0.023   0.281   0.466  −0.009   0.090  −0.181  −0.116   0.305   0.420
25  −12.974   3.211   2.434   5.197  −3.829   3.105  −1.803  −3.461   2.053   0.395   3.118   3.553
EVOLUTIONARY SIGNALS |
Table 6.8 (continued)
(b) Absolute value of first derivative using a five-point Savitsky–Golay quadratic smoothing function
 1
 2
 3   0.8605  0.4745  0.6236  0.4962  0.0636  0.2073  0.3814  0.3818  0.2216  0.4953  0.1439  0.4049
 4   0.0480  0.0270  0.0296  0.0444  0.0448  0.0244  0.0235  0.0783  0.0087  0.0330  0.0108  0.0455
 5   0.0178  0.0065  0.0466  0.0323  0.0108  0.0414  0.0586  0.0362  0.0085  0.0326  0.0095  0.0316
 6   0.0114  0.0046  0.0009  0.0070  0.0144  0.0180  0.0036  0.0133  0.0017  0.0020  0.0052  0.0089
 7   0.0026  0.0006  0.0049  0.0055  0.0023  0.0013  0.0085  0.0018  0.0007  0.0031  0.0032  0.0000
 8   0.0038  0.0028  0.0064  0.0032  0.0081  0.0020  0.0036  0.0036  0.0024  0.0024  0.0013  0.0049
 9   0.0059  0.0090  0.0120  0.0023  0.0041  0.0081  0.0096  0.0072  0.0025  0.0006  0.0002  0.0027
10   0.0119  0.0142  0.0168  0.0053  0.0025  0.0131  0.0193  0.0169  0.0075  0.0021  0.0039  0.0051
11   0.0140  0.0207  0.0221  0.0116  0.0035  0.0202  0.0253  0.0235  0.0111  0.0026  0.0070  0.0058
12   0.0167  0.0250  0.0222  0.0151  0.0043  0.0228  0.0271  0.0263  0.0131  0.0004  0.0085  0.0058
13   0.0141  0.0213  0.0224  0.0131  0.0035  0.0197  0.0256  0.0217  0.0111  0.0001  0.0075  0.0031
14   0.0103  0.0180  0.0134  0.0090  0.0004  0.0143  0.0200  0.0135  0.0087  0.0009  0.0044  0.0026
15   0.0007  0.0120  0.0077  0.0041  0.0023  0.0073  0.0150  0.0146  0.0027  0.0025  0.0076  0.0026
16   0.0011  0.0101  0.0025  0.0050  0.0015  0.0042  0.0121  0.0100  0.0064  0.0106  0.0072  0.0008
17   0.0027  0.0231  0.0247  0.0177  0.0251  0.0465  0.0202  0.0035  0.0245  0.0122  0.0043  0.0010
18   0.0008  0.0134  0.0661  0.0947  0.0548  0.0485  0.0042  0.1153  0.0095  0.0600  0.0206  0.0317
19   0.3269  0.1158  0.1653  0.0864  0.2434  0.0017  0.1260  0.3287  0.0581  0.0135  0.0085  0.0005
20   0.1518  0.0013  0.0543  0.1404  0.3419  0.1998  0.0687  0.3660  0.0130  0.0521  0.0115  0.2609
21   0.0956  0.0354  0.1423  0.0438  0.1269  0.0865  0.0773  0.1047  0.0374  0.0360  0.1157  0.1658
22   0.2543  0.0660  0.1121  0.0098  0.1250  0.0309  0.0386  0.0785  0.0871  0.0220  0.1303  0.0401
23   2.9440  0.7375  0.5495  0.9964  1.0917  0.5164  0.3725  0.2660  0.3065  0.0527  0.6359  0.8792
24
25
Table 6.8 (continued)
(c) Rejecting points 3, 22 and 23 and putting the measurements on a common scale
 1
 2
 3
 4   0.065  0.075  0.045  0.082  0.050  0.042  0.043  0.066  0.038  0.124  0.046  0.079
 5   0.024  0.018  0.071  0.060  0.012  0.071  0.107  0.031  0.037  0.122  0.040  0.055
 6   0.015  0.013  0.001  0.013  0.016  0.031  0.007  0.011  0.008  0.007  0.022  0.015
 7   0.003  0.002  0.007  0.010  0.003  0.002  0.016  0.002  0.003  0.012  0.013  0.000
 8   0.005  0.008  0.010  0.006  0.009  0.003  0.007  0.003  0.010  0.009  0.005  0.008
 9   0.008  0.025  0.018  0.004  0.005  0.014  0.018  0.006  0.011  0.002  0.001  0.005
10   0.016  0.039  0.025  0.010  0.003  0.023  0.035  0.014  0.033  0.008  0.016  0.009
11   0.019  0.057  0.033  0.021  0.004  0.035  0.046  0.020  0.049  0.010  0.029  0.010
12   0.023  0.069  0.034  0.028  0.005  0.039  0.049  0.022  0.058  0.001  0.036  0.010
13   0.019  0.059  0.034  0.024  0.004  0.034  0.047  0.018  0.049  0.001  0.032  0.005
14   0.014  0.050  0.020  0.017  0.000  0.025  0.037  0.011  0.038  0.003  0.019  0.005
15   0.001  0.033  0.012  0.008  0.003  0.013  0.027  0.012  0.012  0.009  0.032  0.005
16   0.001  0.028  0.004  0.009  0.002  0.007  0.022  0.008  0.028  0.040  0.030  0.001
17   0.004  0.064  0.037  0.033  0.028  0.080  0.037  0.003  0.108  0.046  0.018  0.002
18   0.001  0.037  0.100  0.175  0.061  0.084  0.008  0.097  0.042  0.225  0.087  0.055
19   0.444  0.321  0.250  0.160  0.272  0.003  0.230  0.277  0.256  0.051  0.036  0.001
20   0.206  0.004  0.082  0.260  0.382  0.345  0.125  0.309  0.057  0.195  0.049  0.450
21   0.130  0.098  0.216  0.081  0.142  0.149  0.141  0.088  0.164  0.135  0.489  0.286
22
23
24
25
Figure 6.29 Derivative purity plot for the data in Table 6.1 with the purest points indicated [horizontal axis: datapoint, 0–25; vertical axis: di, logarithmic scale from 0.001 to 1; labelled points: 7 and 15]
regions of differing composition, but the visual display is often very informative and can cope well with unusual peakshapes.
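The three stages of Table 6.8 can be sketched in a few lines of Python. This is an illustrative sketch rather than the book's own implementation: the function name `derivative_purity`, the demonstration matrix `X_demo` and the final step of averaging over the variables to give one value per datapoint are assumptions. The five-point quadratic Savitsky–Golay first-derivative coefficients (−2, −1, 0, 1, 2)/10 are consistent with the tabulated values; for example, the first column of part (a) at datapoints 1, 2, 4 and 5 gives (8.132 + 0.176 + 0.157 + 0.140)/10 = 0.8605, the first entry of part (b).

```python
import numpy as np

def derivative_purity(X, reject=()):
    """Return (zero-based row indices, purity value) for an I x J matrix X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # (a) scale every row (one point in time) to a constant total of 1
    scaled = X / X.sum(axis=1, keepdims=True)
    # (b) absolute first derivative down each column using the five-point
    #     quadratic smoothing coefficients; two rows are lost at each end
    coef = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]) / 10.0
    deriv = np.abs(sum(c * scaled[k:n - 4 + k] for k, c in enumerate(coef)))
    rows = np.arange(2, n - 2)
    # (c) reject unwanted rows, then scale each column to a total of 1
    keep = ~np.isin(rows, list(reject))
    deriv, rows = deriv[keep], rows[keep]
    common = deriv / deriv.sum(axis=0, keepdims=True)
    # assumption: one value per datapoint, taken as the mean over variables
    return rows, common.mean(axis=1)

# Hypothetical 25 x 12 data: two overlapping Gaussian peaks plus a small offset
t = np.arange(25)[:, None]
spectra = np.linspace(0.2, 1.0, 12)[None, :]
X_demo = (np.exp(-0.5 * ((t - 9) / 2.5) ** 2) * spectra
          + np.exp(-0.5 * ((t - 14) / 2.5) ** 2) * spectra[:, ::-1] + 0.01)

# reject datapoints 3, 22 and 23 (zero-based rows 2, 21 and 22)
rows, d = derivative_purity(X_demo, reject=(2, 21, 22))
print(rows + 1)          # one-based datapoints that are retained
print(np.round(d, 4))    # small values suggest purer regions
```

Plotting `d` against `rows + 1` on a logarithmic vertical axis would give a display of the same kind as Figure 6.29, although the numbers here come from synthetic data and not from Table 6.1.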
6.4 Resolution
Resolution or deconvolution of two-way chromatograms or mixture spectra involves converting a cluster of peaks into its constituent parts, each ideally representing a component of the signal from a single compound. The number of named methods in the literature is enormous, and it would be completely outside the scope of this text to discuss each approach in detail. In areas such as chemical pattern recognition or calibration, certain generic approaches are accepted as part of an overall strategy and the data preprocessing, variable selection, etc., are regarded as extra steps. In the field of resolution of evolutionary data, there is a fondness for packaging a series of steps into a named method, so there are probably 20 or more named methods, and maybe as many unnamed approaches reported in the literature. However, most are based on a number of generic principles, which are described in this chapter.
There are several aims for resolution.
1. Obtaining the profiles for each resolved compound. These might be the elution profiles (in chromatography) or the concentration distribution in a series of compounds (in spectroscopy of mixtures) or the pH profiles of different chemical species.
points 4 and 8, and the slowest eluting B between points 15 and 19. Hence we could divide up the chromatogram as follows.
1. points 1–3: no compounds elute;
2. points 4–8: compound A elutes selectively;
3. points 9–14: co-elution;
4. points 15–19: compound B elutes selectively;
5. points 20–25: no compounds elute.
As discussed above, there can be slight variations on this theme. This is represented in Figure 6.30. Chemometrics is used to fill in the remaining pieces of the jigsaw. The only unknowns are the elution profiles in the composition 2 regions. The profiles in the composition 1 regions can be estimated either by using the summed profiles or by performing PCA in these regions and taking the scores of the first PC.
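The two ways of estimating a profile in a composition 1 region mentioned above, the summed profile and the scores of the first PC, can be compared with a short sketch. The function name `pure_region_profile` and the demonstration data are hypothetical, and the PCA is assumed to be uncentred (computed by a singular value decomposition of the raw region), which is one possible choice rather than a stated requirement.

```python
import numpy as np

def pure_region_profile(X, start, stop, method="sum"):
    """Estimate one compound's profile over rows [start, stop) of X (zero-based)."""
    region = np.asarray(X, dtype=float)[start:stop]
    if method == "sum":
        # summed intensity over all variables at each point in time
        return region.sum(axis=1)
    if method != "pca":
        raise ValueError("method must be 'sum' or 'pca'")
    # uncentred PCA by SVD: the scores of the first PC are u[:, 0] * s[0]
    u, s, _ = np.linalg.svd(region, full_matrices=False)
    scores = u[:, 0] * s[0]
    # the sign of a PC is arbitrary, so flip it to give a positive profile
    return scores if scores.sum() >= 0 else -scores

# Hypothetical composition 1 region: 5 datapoints, 12 variables, one compound
conc = np.array([0.2, 0.6, 1.0, 0.7, 0.3])
spectrum = np.linspace(1.0, 0.1, 12)
X_region = np.outer(conc, spectrum)

by_sum = pure_region_profile(X_region, 0, 5, "sum")
by_pca = pure_region_profile(X_region, 0, 5, "pca")
print(by_sum / by_sum.max())
print(by_pca / by_pca.max())
```

For noise-free data from a single compound the two estimates differ only by a scale factor; with noisy data they would generally differ slightly.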
An alternative is to find pure variables rather than composition 1 regions. These methods are popular when using various types of spectroscopy such as in LC–MS or in the MIR of mixtures. Wavelengths, frequencies or masses belonging to single compounds can often be identified. In the case of Table 6.3, we suspect that variables C and F are diagnostic of the two compounds (see Figure 6.16), and their profiles are presented in Figure 6.31. Note that these profiles are somewhat noisy. This is fairly common in techniques such as mass spectrometry. It is possible to improve the quality of the profiles by using methods for smoothing as described in Chapter 3, or to average profiles from several pure variables. The latter technique is useful in NMR or IR spectroscopy where a peak might be defined by several datapoints, or where there could be a number of selective regions in the spectrum.
The result of this section will be to produce either a first guess of all or part of the concentration profiles, represented by the matrix $\hat{C}$, or of the spectra, $\hat{S}$.
Figure 6.30 Composition of regions in chromatogram deriving from Table 6.1 [schematic: compounds A and B; composition of successive regions 0, 1, 2, 1, 0]
Figure 6.31 Profiles of variables C and F in Table 6.3 [plot: intensity against datapoint]
6.4.1.2 Multiple Linear Regression
If pure profiles can be obtained from all components, the next step in deconvolution is straightforward.
In the case of Table 6.1, we can guess the pure spectra for A as the average of the data between times 4 and 8, and for B as the average between times 15 and 19. These make up a 2 × 12 data matrix $\hat{S}$. Since
$$X \approx \hat{C}.\hat{S}$$
therefore
$$\hat{C} = X.\hat{S}'.(\hat{S}.\hat{S}')^{-1}$$
as discussed in Chapter 5 (Section 5.3). The estimated spectra are listed in Table 6.9 and the resultant profiles are presented in Figure 6.32. Note that the vertical scale in fact has no direct physical meaning: intensity data can only be reconstructed by multiplying the profiles by the spectra. However, MLR has provided a very satisfactory estimate, and provided that pure regions are available for each significant component, is probably entirely adequate as a tool in many cases.
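A minimal sketch of this MLR step is given below. The data of Table 6.1 are not reproduced in this section, so the elution profiles (`bump`, `C_true`) are hypothetical, chosen so that compound A alone elutes over datapoints 4–8 and compound B alone over datapoints 15–19. The two rows of Table 6.9 are borrowed simply as plausible spectra for building a noise-free test matrix.

```python
import numpy as np

def bump(t, first, last):
    """Hypothetical elution profile: smooth, and exactly zero outside [first, last]."""
    inside = (t >= first) & (t <= last)
    shape = np.sin(np.pi * (t - first + 1) / (last - first + 2)) ** 2
    return np.where(inside, shape, 0.0)

t = np.arange(1, 26)                                   # datapoints 1..25
C_true = np.column_stack([bump(t, 4, 14), bump(t, 9, 19)])

# the two rows of Table 6.9, used here only to build test data
S_true = np.array([
    [0.519, 0.746, 0.862, 0.713, 0.454, 0.341,
     0.194, 0.176, 0.312, 0.410, 0.465, 0.404],
    [0.041, 0.006, 0.087, 0.221, 0.356, 0.603,
     0.676, 0.575, 0.395, 0.199, 0.136, 0.162],
])
X = C_true @ S_true                                    # noise-free 25 x 12 data

# first guess of the spectra: averages over the composition 1 regions
S_hat = np.vstack([X[3:8].mean(axis=0),                # datapoints 4-8
                   X[14:19].mean(axis=0)])             # datapoints 15-19

# C_hat = X . S_hat' . (S_hat . S_hat')^-1
C_hat = X @ S_hat.T @ np.linalg.inv(S_hat @ S_hat.T)
print(np.round(C_hat, 3))
```

As the text notes, the vertical scale of `C_hat` is arbitrary: only the product `C_hat @ S_hat` reconstructs intensities, and in this noise-free case it reproduces `X`.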
If pure variables such as spectral frequencies or m/z values can be determined, even if there are embedded peaks, it is also possible to use these to obtain first estimates of
Table 6.9 Estimated spectra obtained from the composition 1 regions in the example of Table 6.1.
A      B      C      D      E      F      G      H      I      J      K      L
0.519  0.746  0.862  0.713  0.454  0.341  0.194  0.176  0.312  0.410  0.465  0.404
0.041  0.006  0.087  0.221  0.356  0.603  0.676  0.575  0.395  0.199  0.136  0.162
hence
$$\hat{C} = T.R$$
and
$$\hat{S} = R^{-1}.P$$
If we perform PCA on the dataset, and know the pure spectra, it is possible to find the matrix $R^{-1}$ simply by regression since
$$R^{-1} = \hat{S}.P'$$
[because the loadings are orthonormal (Chapter 4, Section 4.3.2) this equation is simple]. It is then easy to obtain $\hat{C}$. This procedure is illustrated in Table 6.10 using the spectra as obtained from Table 6.9. The profiles are very similar to those presented in Figure 6.32 and so are not presented graphically for brevity.
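The calculation with known spectra can be sketched as follows. The spectra are the two rows of Table 6.9; the elution profiles and hence the data matrix are hypothetical, and the PCA is assumed to be uncentred with two components, obtained here from a singular value decomposition.

```python
import numpy as np

def bump(t, first, last):
    """Hypothetical elution profile, zero outside datapoints [first, last]."""
    inside = (t >= first) & (t <= last)
    return np.where(inside,
                    np.sin(np.pi * (t - first + 1) / (last - first + 2)) ** 2,
                    0.0)

t = np.arange(1, 26)
C_true = np.column_stack([bump(t, 4, 14), bump(t, 9, 19)])
S_hat = np.array([                                     # spectra from Table 6.9
    [0.519, 0.746, 0.862, 0.713, 0.454, 0.341,
     0.194, 0.176, 0.312, 0.410, 0.465, 0.404],
    [0.041, 0.006, 0.087, 0.221, 0.356, 0.603,
     0.676, 0.575, 0.395, 0.199, 0.136, 0.162],
])
X = C_true @ S_hat                                     # hypothetical data

# uncentred PCA with two components: X ~ T . P
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :2] * s[:2]                                   # scores, 25 x 2
P = Vt[:2]                                             # loadings, 2 x 12

R_inv = S_hat @ P.T                                    # R^-1 = S_hat . P'
R = np.linalg.inv(R_inv)
C_hat = T @ R                                          # C_hat = T . R
print(np.round(R, 3))
```

The signs of the individual scores and loadings returned by the decomposition are arbitrary, but the product `T @ R` is not affected, because any sign change in `P` is compensated in `R`.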
PCR can be employed in more elaborate ways using the known profiles in the composition 1 (and sometimes composition 0) region for each compound. These methods were the basis of some of the earliest approaches to resolution of two-way chromatographic data. There are several variants, and one is as follows.
1. Choose only those regions where one component elutes. In our example in Table 6.1, we will use the regions between times 4–8 and 15–19 inclusive, which involves 10 points.
2. For each compound, use either the estimated profiles if the region is composition 1 or 0 if another compound elutes in this region. A matrix is obtained of size Z × 2 whose columns correspond to each component, where Z equals the total number of composition 1 datapoints. In our example, the matrix is of size 10 × 2, half of the values being 0 and half consisting of the profile in the composition 1 region. Call this matrix Z.
3. Perform PCA on the overall matrix.
4. Find a matrix R such that $Z \approx T.R$ using the known profiles obtained in step 2, simply by using regression so that $R = (T'.T)^{-1}.T'.Z$ but including the scores only of the composition 1 region.
5. Knowing R, it is a simple matter to reconstruct the concentration profiles by including the scores over the entire data matrix as above, and similarly the spectra.
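The five steps above can be sketched as follows. The data are hypothetical (synthetic profiles combined with the spectra of Table 6.9), the profiles in the composition 1 regions are estimated by summed intensities, and the regression of step 4 is carried out with a least-squares solver, which gives the same R as $(T'.T)^{-1}.T'.Z$ when the selected scores have full column rank.

```python
import numpy as np

def bump(t, first, last):
    """Hypothetical elution profile, zero outside datapoints [first, last]."""
    inside = (t >= first) & (t <= last)
    return np.where(inside,
                    np.sin(np.pi * (t - first + 1) / (last - first + 2)) ** 2,
                    0.0)

t = np.arange(1, 26)
C_true = np.column_stack([bump(t, 4, 14), bump(t, 9, 19)])
S_true = np.array([                                    # spectra from Table 6.9
    [0.519, 0.746, 0.862, 0.713, 0.454, 0.341,
     0.194, 0.176, 0.312, 0.410, 0.465, 0.404],
    [0.041, 0.006, 0.087, 0.221, 0.356, 0.603,
     0.676, 0.575, 0.395, 0.199, 0.136, 0.162],
])
X = C_true @ S_true

# step 1: the ten composition 1 datapoints (4-8 and 15-19, zero-based rows)
sel = np.r_[3:8, 14:19]

# step 2: 10 x 2 matrix Z, estimated profile where the compound is pure, else 0
Z = np.zeros((sel.size, 2))
Z[:5, 0] = X[3:8].sum(axis=1)
Z[5:, 1] = X[14:19].sum(axis=1)

# step 3: PCA on the overall matrix (uncentred, two components)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T, P = U[:, :2] * s[:2], Vt[:2]

# step 4: regress Z on the scores of the composition 1 rows only
R, *_ = np.linalg.lstsq(T[sel], Z, rcond=None)

# step 5: profiles over the entire dataset, and the corresponding spectra
C_hat = T @ R
S_hat = np.linalg.inv(R) @ P
print(np.round(C_hat, 3))
```

Because the profiles in Z are summed intensities, `C_hat` here is the true profile of each compound multiplied by the sum of its spectrum, which illustrates why the magnitudes in R depend on how the profiles are scaled.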
The key steps in the calculation are presented in Table 6.11. Note that the magnitudes of the numbers in the matrix R differ from those presented in Table 6.10. This is simply because the magnitudes of the estimates of the spectra and profiles are different, and have no physical significance. The resultant profiles obtained by the multiplication $\hat{C} = T.R$ on the entire dataset are illustrated in Figure 6.33.
In straightforward cases, PCR is unnecessary and if not carefully controlled may provide worse results than MLR. However, for more complex systems it can be very useful.