
Brereton Chemometrics

CHEMOMETRICS

Table 6.8 Derivative calculation for determining the purity of regions in the data in Table 6.1.

(a) Scaling the rows to constant total

Point       A       B       C       D       E       F       G       H       I       J       K       L
    1   4.066   2.561   3.269   2.295   0.292   1.240   1.911   1.694   1.181   2.380   0.642   2.059
    2   0.176   0.000   0.183   0.005   0.260   0.136   0.045   0.498   0.012   0.005   0.128   0.220
    3   0.145   0.111   0.404   0.004   0.155   0.212   0.246   0.104   0.118   0.099   0.114   0.193
    4   0.157   0.117   0.136   0.080   0.014   0.143   0.037   0.044   0.050   0.095   0.102   0.113
    5   0.070   0.130   0.175   0.143   0.097   0.064   0.001   0.056   0.042   0.046   0.090   0.088
    6   0.101   0.126   0.150   0.143   0.066   0.060   0.039   0.027   0.069   0.087   0.086   0.045
    7   0.084   0.139   0.164   0.126   0.075   0.047   0.046   0.041   0.066   0.068   0.074   0.069
    8   0.093   0.135   0.146   0.123   0.097   0.061   0.032   0.030   0.047   0.074   0.084   0.078
    9   0.087   0.123   0.152   0.126   0.093   0.057   0.046   0.045   0.056   0.068   0.075   0.071
   10   0.081   0.120   0.124   0.127   0.097   0.064   0.057   0.043   0.062   0.075   0.079   0.069
   11   0.060   0.102   0.115   0.113   0.095   0.086   0.081   0.071   0.070   0.071   0.076   0.061
   12   0.046   0.075   0.081   0.104   0.108   0.112   0.110   0.101   0.077   0.062   0.065   0.058
   13   0.034   0.042   0.064   0.080   0.105   0.134   0.145   0.134   0.104   0.061   0.048   0.048
   14   0.010   0.025   0.039   0.068   0.114   0.154   0.160   0.143   0.111   0.078   0.051   0.046
   15   0.008   0.020   0.024   0.065   0.110   0.163   0.184   0.158   0.109   0.063   0.045   0.051
   16   0.008   0.004   0.033   0.066   0.108   0.169   0.191   0.157   0.118   0.066   0.044   0.043
   17   0.032   0.003   0.028   0.061   0.097   0.163   0.205   0.200   0.114   0.054   0.013   0.036
   18   0.004   0.014   0.050   0.096   0.113   0.175   0.211   0.172   0.076   0.029   0.031   0.058
   19   0.023   0.091   0.108   0.038   0.018   0.393   0.275   0.168   0.252   0.021   0.073   0.049
   20   0.008   0.027   0.230   0.358   0.439   0.297   0.135   0.750   0.096   0.217   0.089   0.195
   21   1.664   0.575   0.659   0.719   1.151   0.094   0.387   1.732   0.394   0.245   0.115   0.030
   22   0.057   0.235   0.597   0.985   1.238   1.323   0.885   0.708   0.070   0.344   0.048   1.207
   23   0.422   0.045   0.191   0.057   0.217   0.553   0.287   0.373   0.079   0.264   0.631   0.079
   24   0.221   0.037   0.094   0.023   0.281   0.466   0.009   0.090   0.181   0.116   0.305   0.420
   25  12.974   3.211   2.434   5.197   3.829   3.105   1.803   3.461   2.053   0.395   3.118   3.553

EVOLUTIONARY SIGNALS

(b) Absolute value of first derivative using a five-point Savitzky–Golay quadratic smoothing function

Point       A       B       C       D       E       F       G       H       I       J       K       L
    3  0.8605  0.4745  0.6236  0.4962  0.0636  0.2073  0.3814  0.3818  0.2216  0.4953  0.1439  0.4049
    4  0.0480  0.0270  0.0296  0.0444  0.0448  0.0244  0.0235  0.0783  0.0087  0.0330  0.0108  0.0455
    5  0.0178  0.0065  0.0466  0.0323  0.0108  0.0414  0.0586  0.0362  0.0085  0.0326  0.0095  0.0316
    6  0.0114  0.0046  0.0009  0.0070  0.0144  0.0180  0.0036  0.0133  0.0017  0.0020  0.0052  0.0089
    7  0.0026  0.0006  0.0049  0.0055  0.0023  0.0013  0.0085  0.0018  0.0007  0.0031  0.0032  0.0000
    8  0.0038  0.0028  0.0064  0.0032  0.0081  0.0020  0.0036  0.0036  0.0024  0.0024  0.0013  0.0049
    9  0.0059  0.0090  0.0120  0.0023  0.0041  0.0081  0.0096  0.0072  0.0025  0.0006  0.0002  0.0027
   10  0.0119  0.0142  0.0168  0.0053  0.0025  0.0131  0.0193  0.0169  0.0075  0.0021  0.0039  0.0051
   11  0.0140  0.0207  0.0221  0.0116  0.0035  0.0202  0.0253  0.0235  0.0111  0.0026  0.0070  0.0058
   12  0.0167  0.0250  0.0222  0.0151  0.0043  0.0228  0.0271  0.0263  0.0131  0.0004  0.0085  0.0058
   13  0.0141  0.0213  0.0224  0.0131  0.0035  0.0197  0.0256  0.0217  0.0111  0.0001  0.0075  0.0031
   14  0.0103  0.0180  0.0134  0.0090  0.0004  0.0143  0.0200  0.0135  0.0087  0.0009  0.0044  0.0026
   15  0.0007  0.0120  0.0077  0.0041  0.0023  0.0073  0.0150  0.0146  0.0027  0.0025  0.0076  0.0026
   16  0.0011  0.0101  0.0025  0.0050  0.0015  0.0042  0.0121  0.0100  0.0064  0.0106  0.0072  0.0008
   17  0.0027  0.0231  0.0247  0.0177  0.0251  0.0465  0.0202  0.0035  0.0245  0.0122  0.0043  0.0010
   18  0.0008  0.0134  0.0661  0.0947  0.0548  0.0485  0.0042  0.1153  0.0095  0.0600  0.0206  0.0317
   19  0.3269  0.1158  0.1653  0.0864  0.2434  0.0017  0.1260  0.3287  0.0581  0.0135  0.0085  0.0005
   20  0.1518  0.0013  0.0543  0.1404  0.3419  0.1998  0.0687  0.3660  0.0130  0.0521  0.0115  0.2609
   21  0.0956  0.0354  0.1423  0.0438  0.1269  0.0865  0.0773  0.1047  0.0374  0.0360  0.1157  0.1658
   22  0.2543  0.0660  0.1121  0.0098  0.1250  0.0309  0.0386  0.0785  0.0871  0.0220  0.1303  0.0401
   23  2.9440  0.7375  0.5495  0.9964  1.0917  0.5164  0.3725  0.2660  0.3065  0.0527  0.6359  0.8792

Table 6.8 (continued)

(c) Rejecting points 3, 22 and 23 and putting the measurements on a common scale

Point       A       B       C       D       E       F       G       H       I       J       K       L
    4   0.065   0.075   0.045   0.082   0.050   0.042   0.043   0.066   0.038   0.124   0.046   0.079
    5   0.024   0.018   0.071   0.060   0.012   0.071   0.107   0.031   0.037   0.122   0.040   0.055
    6   0.015   0.013   0.001   0.013   0.016   0.031   0.007   0.011   0.008   0.007   0.022   0.015
    7   0.003   0.002   0.007   0.010   0.003   0.002   0.016   0.002   0.003   0.012   0.013   0.000
    8   0.005   0.008   0.010   0.006   0.009   0.003   0.007   0.003   0.010   0.009   0.005   0.008
    9   0.008   0.025   0.018   0.004   0.005   0.014   0.018   0.006   0.011   0.002   0.001   0.005
   10   0.016   0.039   0.025   0.010   0.003   0.023   0.035   0.014   0.033   0.008   0.016   0.009
   11   0.019   0.057   0.033   0.021   0.004   0.035   0.046   0.020   0.049   0.010   0.029   0.010
   12   0.023   0.069   0.034   0.028   0.005   0.039   0.049   0.022   0.058   0.001   0.036   0.010
   13   0.019   0.059   0.034   0.024   0.004   0.034   0.047   0.018   0.049   0.001   0.032   0.005
   14   0.014   0.050   0.020   0.017   0.000   0.025   0.037   0.011   0.038   0.003   0.019   0.005
   15   0.001   0.033   0.012   0.008   0.003   0.013   0.027   0.012   0.012   0.009   0.032   0.005
   16   0.001   0.028   0.004   0.009   0.002   0.007   0.022   0.008   0.028   0.040   0.030   0.001
   17   0.004   0.064   0.037   0.033   0.028   0.080   0.037   0.003   0.108   0.046   0.018   0.002
   18   0.001   0.037   0.100   0.175   0.061   0.084   0.008   0.097   0.042   0.225   0.087   0.055
   19   0.444   0.321   0.250   0.160   0.272   0.003   0.230   0.277   0.256   0.051   0.036   0.001
   20   0.206   0.004   0.082   0.260   0.382   0.345   0.125   0.309   0.057   0.195   0.049   0.450
   21   0.130   0.098   0.216   0.081   0.142   0.149   0.141   0.088   0.164   0.135   0.489   0.286

(d) Calculating the final consensus derivative

    i     d_i
    4   0.063
    5   0.054
    6   0.013
    7   0.006
    8   0.007
    9   0.010
   10   0.019
   11   0.028
   12   0.031
   13   0.027
   14   0.020
   15   0.014
   16   0.015
   17   0.038
   18   0.081
   19   0.192
   20   0.205
   21   0.177

Figure 6.29 Derivative purity plot for the data in Table 6.1 with the purest points indicated [vertical axis: d_i, logarithmic, 0.001 to 0.1; horizontal axis: datapoint, 0 to 25; labelled points: 7 and 15]

regions of differing composition, but the visual display is often very informative and can cope well with unusual peakshapes.
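The four parts of Table 6.8 can be sketched in Python with NumPy. This is a minimal illustration, not the author's code: the function name `derivative_purity` and the synthetic, noise-free Gaussian elution profiles are assumptions, and the two spectra are borrowed from Table 6.9 purely as test data. The derivative is taken with the standard five-point quadratic Savitzky–Golay weights (−2, −1, 0, 1, 2)/10, and the consensus value is assumed to be the mean over variables of the column-scaled absolute derivatives.

```python
import numpy as np

def derivative_purity(X, drop=()):
    """Return (datapoint numbers, consensus derivative d_i) for a time x variable matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # (a) scale every row (spectrum) to a constant total of 1
    scaled = X / X.sum(axis=1, keepdims=True)
    # (b) absolute first derivative along time, five-point quadratic
    #     Savitzky-Golay weights (assumed): (-2, -1, 0, 1, 2) / 10
    weights = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]) / 10.0
    deriv = sum(w * scaled[k:k + n - 4] for k, w in enumerate(weights))
    deriv = np.abs(deriv)
    points = np.arange(3, n - 1)  # 1-based datapoint numbers 3 .. n-2
    # (c) reject unwanted points, then scale each variable to unit sum
    keep = ~np.isin(points, list(drop))
    deriv, points = deriv[keep], points[keep]
    deriv = deriv / deriv.sum(axis=0, keepdims=True)
    # (d) consensus derivative: mean over the variables (assumed)
    return points, deriv.mean(axis=1)

# Synthetic two-compound chromatogram (hypothetical profiles, no noise)
t = np.arange(1, 26)
profile_a = np.exp(-0.5 * ((t - 9) / 2.0) ** 2)
profile_b = np.exp(-0.5 * ((t - 14) / 2.0) ** 2)
spec_a = np.array([0.519, 0.746, 0.862, 0.713, 0.454, 0.341,
                   0.194, 0.176, 0.312, 0.410, 0.465, 0.404])
spec_b = np.array([0.041, 0.006, 0.087, 0.221, 0.356, 0.603,
                   0.676, 0.575, 0.395, 0.199, 0.136, 0.162])
X = np.outer(profile_a, spec_a) + np.outer(profile_b, spec_b)

points, d = derivative_purity(X, drop=(3, 22, 23))
```

Low values of `d` mark datapoints where the scaled spectrum changes least, which are candidates for composition 1 regions; plotting `d` against `points` on a logarithmic axis gives a display of the same kind as the derivative purity plot.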

6.4 Resolution

Resolution or deconvolution of two-way chromatograms or mixture spectra involves converting a cluster of peaks into its constituent parts, each ideally representing a component of the signal from a single compound. The number of named methods in the literature is enormous, and it would be completely outside the scope of this text to discuss each approach in detail. In areas such as chemical pattern recognition or calibration, certain generic approaches are accepted as part of an overall strategy and the data preprocessing, variable selection, etc., are regarded as extra steps. In the field of resolution of evolutionary data, there is a fondness for packaging a series of steps into a named method, so there are probably 20 or more named methods, and maybe as many unnamed approaches reported in the literature. However, most are based on a number of generic principles, which are described in this chapter.

There are several aims for resolution.

1. Obtaining the profiles for each resolved compound. These might be the elution profiles (in chromatography) or the concentration distribution in a series of compounds (in spectroscopy of mixtures) or the pH profiles of different chemical species.

2. Obtaining the spectra of each pure compound. This allows identification or library searching. In some cases, this procedure merely uses the multivariate signals to improve on the quality of the individual spectra, which may be noisy, but in other cases, such as an embedded peak, genuinely difficult information can be gleaned. This is particularly useful in impurity monitoring.

3. Obtaining quantitative information. This involves using the resolved two-way data to provide concentrations (or relative concentrations when pure standards are not available).

4. Automation. Complex chromatograms may consist of 50 or more peaks, some of which will be noisy and overlapping. Speeding up procedures, for example, using rapid chromatography in a matter of minutes resulting in considerable overlap, rather than taking 30 min per chromatogram, also results in embedded peaks. Chemometrics can ideally pull out the constituents’ spectra and profiles.

The methods in this chapter differ from those in Chapter 5 in that pure standards are not required for the model.

Whereas some datasets can be very complicated, it is normal to divide the data into small regions where there are signals from only a few components. Even in the spectroscopy of mixtures, in many cases such as MIR or NMR it is normally easy to find regions of the spectra where only two or three compounds at the most absorb, so this process of finding windows rather than analysing an entire dataset in one go is normal. Hence we will limit the discussion to three peak clusters in this section. Naturally the methods in Section 6.3 would usually first be applied to the entire dataset to identify these regions. We will illustrate the discussion below primarily in the context of coupled chromatography.

6.4.1 Selectivity for All Components

These methods involve first finding some pure or selective (composition 1) region in the chromatogram or selective spectral measurement such as an m/z value for each compound in a mixture.

6.4.1.1 Pure Spectra and Selective Variables

The most straightforward situation is when each compound has a composition 1 region. The simplest approach is to estimate the pure spectrum in such a region. There are several methods.

1. Take the spectrum at the point of maximum purity for each compound.

2. Average the spectra for each compound over each composition 1 region.

3. Perform PCA over each composition 1 region separately (so if there are three compounds, perform three PCA calculations) and then take the loadings of the first PC as an estimate of the pure spectrum. PCA is used as a smoothing technique, the idea being that the noise is banished to later PCs.
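The three estimates can be compared side by side in a few lines of NumPy. This is an illustrative sketch only: the function name `pure_spectrum_estimates`, the synthetic noise-free data and the choice of uncentred PCA computed through a singular value decomposition are assumptions, not details given in the text.

```python
import numpy as np

def pure_spectrum_estimates(X, rows, purest_row):
    """Three estimates of a pure spectrum from one composition 1 region.

    X is a time x variable matrix; rows and purest_row are 0-based indices.
    """
    block = X[rows]
    at_purest = X[purest_row]        # method 1: spectrum at maximum purity
    averaged = block.mean(axis=0)    # method 2: average over the region
    # method 3: loadings of the first PC (uncentred PCA via the SVD, assumed)
    _, _, vt = np.linalg.svd(block, full_matrices=False)
    loadings = vt[0] if vt[0].sum() >= 0 else -vt[0]  # the sign of a PC is arbitrary
    return at_purest, averaged, loadings

def cosine(u, v):
    """Cosine of the angle between two spectra (1 means identical shape)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Synthetic two-compound chromatogram (hypothetical profiles, no noise)
t = np.arange(1, 26)
profile_a = np.exp(-0.5 * ((t - 9) / 2.0) ** 2)
profile_b = np.exp(-0.5 * ((t - 14) / 2.0) ** 2)
spec_a = np.array([0.519, 0.746, 0.862, 0.713, 0.454, 0.341,
                   0.194, 0.176, 0.312, 0.410, 0.465, 0.404])
spec_b = np.array([0.041, 0.006, 0.087, 0.221, 0.356, 0.603,
                   0.676, 0.575, 0.395, 0.199, 0.136, 0.162])
X = np.outer(profile_a, spec_a) + np.outer(profile_b, spec_b)

# Points 4-8 are taken as the composition 1 region of the first compound,
# point 7 as its purest point (0-based rows 3..7 and 6)
estimates = pure_spectrum_estimates(X, np.arange(3, 8), purest_row=6)
similarities = [cosine(e, spec_a) for e in estimates]
```

The three estimates differ in magnitude, so they are compared by shape (cosine) rather than by absolute intensity.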

Some rather elaborate multivariate methods are also available that, instead of using the spectra in the composition 1 regions, use the elution profiles. In the case of Table 6.1 we might guess that the fastest eluting compound A has a composition 1 region between points 4 and 8, and the slowest eluting B between points 15 and 19. Hence we could divide up the chromatogram as follows.

1. points 1–3: no compounds elute;

2. points 4–8: compound A elutes selectively;

3. points 9–14: co-elution;

4. points 15–19: compound B elutes selectively;

5. points 20–25: no compounds elute.

As discussed above, there can be slight variations on this theme. This is represented in Figure 6.30. Chemometrics is used to fill in the remaining pieces of the jigsaw. The only unknowns are the elution profiles in the composition 2 regions. The profiles in the composition 1 regions can be estimated either by using the summed profiles or by performing PCA in these regions and taking the scores of the first PC.

An alternative is to find pure variables rather than composition 1 regions. These methods are popular when using various types of spectroscopy such as in LC–MS or in the MIR of mixtures. Wavelengths, frequencies or masses belonging to single compounds can often be identified. In the case of Table 6.3, we suspect that variables C and F are diagnostic of the two compounds (see Figure 6.16), and their profiles are presented in Figure 6.31. Note that these profiles are somewhat noisy. This is fairly common in techniques such as mass spectrometry. It is possible to improve the quality of the profiles by using methods for smoothing as described in Chapter 3, or to average profiles from several pure variables. The latter technique is useful in NMR or IR spectroscopy where a peak might be defined by several datapoints, or where there could be a number of selective regions in the spectrum.

The result of this section will be to produce either a first guess of all or part of the concentration profiles, represented by the matrix $\hat{\mathbf{C}}$, or of the spectra $\hat{\mathbf{S}}$.

6.4.1.2 Multiple Linear Regression

If pure profiles can be obtained from all components, the next step in deconvolution is straightforward.

Figure 6.30 Composition of regions in chromatogram deriving from Table 6.1 [bars mark the elution of compound A and compound B; the composition of the successive regions is 0, 1, 2, 1, 0]

Figure 6.31 Profiles of variables C and F in Table 6.3 [vertical axis: intensity; horizontal axis: datapoint]

In the case of Table 6.1, we can guess the pure spectra for A as the average of the data between times 4 and 8, and for B as the average between times 15 and 19. These make up a 2 × 12 data matrix $\hat{\mathbf{S}}$. Since

$$\mathbf{X} \approx \hat{\mathbf{C}} \cdot \hat{\mathbf{S}}$$

therefore

$$\hat{\mathbf{C}} = \mathbf{X} \cdot \hat{\mathbf{S}}' \cdot (\hat{\mathbf{S}} \cdot \hat{\mathbf{S}}')^{-1}$$

as discussed in Chapter 5 (Section 5.3). The estimated spectra are listed in Table 6.9 and the resultant profiles are presented in Figure 6.32. Note that the vertical scale in fact has no direct physical meaning: intensity data can only be reconstructed by multiplying the profiles by the spectra. However, MLR has provided a very satisfactory estimate, and provided that pure regions are available for each significant component, is probably entirely adequate as a tool in many cases.
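The MLR step can be written directly from the equation for the estimated profiles. The sketch below assumes NumPy and a synthetic, noise-free two-compound dataset standing in for Table 6.1 (the Gaussian profiles are hypothetical; the two spectra are borrowed from Table 6.9 as test data, with the first row taken to belong to the earlier eluting compound).

```python
import numpy as np

# Synthetic two-compound chromatogram (hypothetical profiles, no noise)
t = np.arange(1, 26)
profile_a = np.exp(-0.5 * ((t - 9) / 2.0) ** 2)
profile_b = np.exp(-0.5 * ((t - 14) / 2.0) ** 2)
spec_a = np.array([0.519, 0.746, 0.862, 0.713, 0.454, 0.341,
                   0.194, 0.176, 0.312, 0.410, 0.465, 0.404])
spec_b = np.array([0.041, 0.006, 0.087, 0.221, 0.356, 0.603,
                   0.676, 0.575, 0.395, 0.199, 0.136, 0.162])
X = np.outer(profile_a, spec_a) + np.outer(profile_b, spec_b)  # 25 x 12

# Estimated spectra: averages over points 4-8 and 15-19
# (0-based rows 3..7 and 14..18), giving a 2 x 12 matrix
S_hat = np.vstack([X[3:8].mean(axis=0), X[14:19].mean(axis=0)])

# C_hat = X . S_hat' . (S_hat . S_hat')^-1
C_hat = X @ S_hat.T @ np.linalg.inv(S_hat @ S_hat.T)   # 25 x 2

# The product of profiles and spectra should reproduce the data
residual = X - C_hat @ S_hat
```

As the text notes, the columns of `C_hat` have no absolute scale on their own; only the product `C_hat @ S_hat` is comparable with the measured intensities.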

Table 6.9 Estimated spectra obtained from the composition 1 regions in the example of Table 6.1.

    A       B       C       D       E       F       G       H       I       J       K       L
0.519   0.746   0.862   0.713   0.454   0.341   0.194   0.176   0.312   0.410   0.465   0.404
0.041   0.006   0.087   0.221   0.356   0.603   0.676   0.575   0.395   0.199   0.136   0.162

Figure 6.32 Reconstructed profiles for the data in Table 6.1 using MLR [vertical axis: intensity; horizontal axis: datapoint]

If pure variables such as spectral frequencies or m/z values can be determined, even if there are embedded peaks, it is also possible to use these to obtain first estimates of elution profiles, $\hat{\mathbf{C}}$; the spectra can then be obtained using all (or a great proportion of) the variables by

$$\hat{\mathbf{S}} = (\hat{\mathbf{C}}' \cdot \hat{\mathbf{C}})^{-1} \cdot \hat{\mathbf{C}}' \cdot \mathbf{X}$$

The concentration profile can be improved by increasing the variables; so, for example, the first guess might involve using one variable per compound, the next 20 significant variables and the final 100 or more. This approach is also useful in spectroscopy of mixtures, if pure frequencies can be identified for each compound. Using these for initial estimates of the concentrations of each compound in each spectrum, the full spectra can be reconstructed even when there are overlapping regions. Such approaches are useful in MIR, but not so valuable in NIR or UV/vis spectroscopy where it is often hard to find selective wavelengths and the effectiveness depends on the type of spectroscopy employed.
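A first pass of the pure-variable route, using one selective variable per compound, might look as follows. Everything in this sketch is hypothetical: the six-variable spectra in `S_true`, the choice of variables 0 and 3 as the selective ones, and the noise-free Gaussian profiles are invented so that the result can be checked against known values.

```python
import numpy as np

# Hypothetical elution profiles for two compounds over 25 datapoints
t = np.arange(1, 26)
profile_a = np.exp(-0.5 * ((t - 9) / 2.0) ** 2)
profile_b = np.exp(-0.5 * ((t - 14) / 2.0) ** 2)
C_true = np.column_stack([profile_a, profile_b])      # 25 x 2

# Hypothetical spectra: variable 0 responds only to the first compound,
# variable 3 only to the second
S_true = np.array([[1.0, 0.8, 0.3, 0.0, 0.5, 0.2],
                   [0.0, 0.4, 0.9, 1.0, 0.6, 0.1]])
X = C_true @ S_true                                    # 25 x 6

# First estimate of the profiles: the columns of the pure variables
C_hat = X[:, [0, 3]]

# S_hat = (C_hat' . C_hat)^-1 . C_hat' . X, using all the variables
S_hat = np.linalg.inv(C_hat.T @ C_hat) @ C_hat.T @ X   # 2 x 6
```

With real data the profiles from single variables are noisy, which is why the text suggests repeating the calculation with progressively more variables.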

6.4.1.3 Principal Components Regression

PCR is an alternative to MLR (Section 5.4) and can be used in signal analysis just as in calibration. There are a number of ways of employing PCA, but a simple approach is to note that the scores and loadings can be related to the concentration profile and spectra by

$$\mathbf{X} \approx \mathbf{T} \cdot \mathbf{R} \cdot \mathbf{R}^{-1} \cdot \mathbf{P} = \hat{\mathbf{C}} \cdot \hat{\mathbf{S}}$$

hence

$$\hat{\mathbf{C}} = \mathbf{T} \cdot \mathbf{R}$$

and

$$\hat{\mathbf{S}} = \mathbf{R}^{-1} \cdot \mathbf{P}$$

If we perform PCA on the dataset, and know the pure spectra, it is possible to find the matrix $\mathbf{R}^{-1}$ simply by regression since

$$\mathbf{R}^{-1} = \hat{\mathbf{S}} \cdot \mathbf{P}'$$

[because the loadings are orthonormal (Chapter 4, Section 4.3.2) this equation is simple]. It is then easy to obtain $\hat{\mathbf{C}}$. This procedure is illustrated in Table 6.10 using the spectra as obtained from Table 6.9. The profiles are very similar to those presented in Figure 6.32 and so are not presented graphically for brevity.
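The rotation from scores to profiles can be sketched with NumPy. The assumptions here are that PCA is performed without centring, that it is computed through a singular value decomposition, that two components are retained, and that a synthetic noise-free dataset stands in for Table 6.1; none of these choices is stated in the text.

```python
import numpy as np

# Synthetic two-compound chromatogram (hypothetical profiles, no noise)
t = np.arange(1, 26)
profile_a = np.exp(-0.5 * ((t - 9) / 2.0) ** 2)
profile_b = np.exp(-0.5 * ((t - 14) / 2.0) ** 2)
spec_a = np.array([0.519, 0.746, 0.862, 0.713, 0.454, 0.341,
                   0.194, 0.176, 0.312, 0.410, 0.465, 0.404])
spec_b = np.array([0.041, 0.006, 0.087, 0.221, 0.356, 0.603,
                   0.676, 0.575, 0.395, 0.199, 0.136, 0.162])
X = np.outer(profile_a, spec_a) + np.outer(profile_b, spec_b)

# Known (estimated) spectra from the composition 1 regions
S_hat = np.vstack([X[3:8].mean(axis=0), X[14:19].mean(axis=0)])

# Uncentred PCA with two components via the SVD (an assumption)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :2] * s[:2]   # scores, 25 x 2
P = Vt[:2]             # loadings, 2 x 12, with orthonormal rows

# R^-1 = S_hat . P'  (simple because P . P' is the identity)
R_inv = S_hat @ P.T
R = np.linalg.inv(R_inv)

# C_hat = T . R
C_hat = T @ R
```

Because the loadings are orthonormal, no matrix inversion is needed to obtain `R_inv`; the single inversion is only the step from `R_inv` to `R`.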

PCR can be employed in more elaborate ways using the known profiles in the composition 1 (and sometimes composition 0) region for each compound. These methods were the basis of some of the earliest approaches to resolution of two-way chromatographic data. There are several variants, and one is as follows.

1. Choose only those regions where one component elutes. In our example in Table 6.1, we will use the regions between times 4–8 and 15–19 inclusive, which involves 10 points.

2. For each compound, use either the estimated profiles if the region is composition 1 or 0 if another compound elutes in this region. A matrix is obtained of size $Z \times 2$ whose columns correspond to each component, where $Z$ equals the total number of composition 1 datapoints. In our example, the matrix is of size 10 × 2, half of the values being 0 and half consisting of the profile in the composition 1 region. Call this matrix $\mathbf{Z}$.

3. Perform PCA on the overall matrix.

4. Find a matrix $\mathbf{R}$ such that $\mathbf{Z} \approx \mathbf{T} \cdot \mathbf{R}$ using the known profiles obtained in step 2, simply by using regression so that $\mathbf{R} = (\mathbf{T}' \cdot \mathbf{T})^{-1} \cdot \mathbf{T}' \cdot \mathbf{Z}$ but including the scores only of the composition 1 region.

5. Knowing $\mathbf{R}$, it is a simple matter to reconstruct the concentration profiles by including the scores over the entire data matrix as above, and similarly the spectra.
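The five steps above can be sketched as follows. This is one possible reading, under stated assumptions: the profile in each composition 1 region is estimated by the summed intensity over all variables, PCA is uncentred with two components and computed through the SVD, and the data are synthetic and noise-free. Variable names such as `rows_a` and `T1` are introduced here for illustration only.

```python
import numpy as np

# Synthetic two-compound chromatogram (hypothetical profiles, no noise)
t = np.arange(1, 26)
profile_a = np.exp(-0.5 * ((t - 9) / 2.0) ** 2)
profile_b = np.exp(-0.5 * ((t - 14) / 2.0) ** 2)
spec_a = np.array([0.519, 0.746, 0.862, 0.713, 0.454, 0.341,
                   0.194, 0.176, 0.312, 0.410, 0.465, 0.404])
spec_b = np.array([0.041, 0.006, 0.087, 0.221, 0.356, 0.603,
                   0.676, 0.575, 0.395, 0.199, 0.136, 0.162])
X = np.outer(profile_a, spec_a) + np.outer(profile_b, spec_b)

# Step 1: composition 1 regions, points 4-8 and 15-19 (0-based rows)
rows_a, rows_b = np.arange(3, 8), np.arange(14, 19)
rows = np.concatenate([rows_a, rows_b])

# Step 2: Z (10 x 2) holds the summed profile where a compound elutes
# selectively and 0 where the other compound elutes
Z = np.zeros((rows.size, 2))
Z[:rows_a.size, 0] = X[rows_a].sum(axis=1)
Z[rows_a.size:, 1] = X[rows_b].sum(axis=1)

# Step 3: PCA on the overall matrix (uncentred, two PCs, via the SVD)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :2] * s[:2]
P = Vt[:2]

# Step 4: R = (T1' . T1)^-1 . T1' . Z, scores of composition 1 rows only
T1 = T[rows]
R = np.linalg.inv(T1.T @ T1) @ T1.T @ Z

# Step 5: profiles over the entire dataset, and the matching spectra
C_hat = T @ R
S_hat = np.linalg.inv(R) @ P
```

Only the ten composition 1 rows enter the regression for `R`; the co-elution region is filled in afterwards by applying `R` to all 25 rows of scores.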

The key steps in the calculation are presented in Table 6.11. Note that the magnitudes of the numbers in the matrix $\mathbf{R}$ differ from those presented in Table 6.10. This is simply because the magnitudes of the estimates of the spectra and profiles are different, and have no physical significance. The resultant profiles obtained by the multiplication $\hat{\mathbf{C}} = \mathbf{T} \cdot \mathbf{R}$ on the entire dataset are illustrated in Figure 6.33.

In straightforward cases, PCR is unnecessary and if not carefully controlled may provide worse results than MLR. However, for more complex systems it can be very useful.
