
Brereton Chemometrics



CHEMOMETRICS

 

 

Table 6.10 Estimation of profiles using PCA for the data in Table 6.9.

Loadings

0.215  0.321  0.375  0.372  0.333  0.305  0.288  0.255  0.248  0.237  0.236  0.214
0.254  0.377  0.381  0.191  0.085  0.369  0.479  0.422  0.203  0.015  0.113  0.080

Matrix R^-1

1.667  0.573
0.982  0.781

Matrix R

0.419  0.307
0.527  0.894

    Scores            T.R
0.019  0.007    0.012  0.001
0.079  0.040    0.012  0.060
0.077  0.076    0.072  0.044
0.380  0.147    0.237  0.015
0.746  0.284    0.462  0.024
1.464  0.493    0.873  0.009
2.412  0.795    1.429  0.031
3.332  1.147    2.000  0.001
3.775  1.066    2.143  0.208
3.646  0.770    1.933  0.432
3.286  0.116    1.438  0.906
2.954  0.593    0.925  1.438
2.650  1.210    0.473  1.896
2.442  1.504    0.231  2.095
2.020  1.461    0.077  1.927
1.432  1.098    0.022  1.421
0.803  0.682    0.022  0.856
0.495  0.377    0.009  0.489
0.158  0.288    0.086  0.306
0.037  0.112    0.043  0.111
0.013  0.046    0.029  0.037
0.026  0.039    0.010  0.043
0.026  0.009    0.006  0.016
0.057  0.041    0.003  0.054
0.011  0.012    0.011  0.007

 

 

6.4.2 Partial Selectivity

More difficult situations occur when only some components exhibit selectivity. A common example is a completely embedded peak in HPLC–DAD. In the case of LC–MS or LC–NMR, this problem is often solved by finding pure variables, but because UV/vis spectra are often completely overlapping it is not always possible to treat data in this manner.

Fortunately, PCA comes to the rescue. In Chapter 5 we discussed the different applicabilities of PCR and MLR. Using the former method, we stated that it was not necessary to have information about the concentration of every component in the mixture, simply a good idea of how many significant components there are. So in the case of resolution of two-way data these approaches can easily be extended. We will illustrate this using the data in Table 6.2, which correspond to three peaks, the middle one being completely embedded in the others. There are several different ways of exploiting this.

EVOLUTIONARY SIGNALS

Table 6.11 Key steps in the calculation of the rotation matrix for the data in Table 6.1 using scores in composition 1 regions.

            Z                        T
Time   Compound A  Compound B    PC 1    PC 2
 1
 2
 3
 4       1.341       0.000       0.380   0.147
 5       2.462       0.000       0.746   0.284
 6       4.910       0.000       1.464   0.493
 7       8.059       0.000       2.412   0.795
 8      11.202       0.000       3.332   1.147
 9
10
11
12
13
14
15       0.000       7.104       2.020   1.461
16       0.000       5.031       1.432   1.098
17       0.000       2.838       0.803   0.682
18       0.000       1.696       0.495   0.377
19       0.000       0.625       0.158   0.288
20
21
22
23
24
25

Matrix R

2.311  1.094
3.050  3.197

One approach uses the idea of a zero concentration window. The first step is to identify compounds that we know have selective regions, and determine where they do not elute. This information may be obtained from a variety of approaches such as eigenvalue or PC plots. In this region we expect the intensity of the data to be zero, so it is possible to find a vector r for each compound so that

$$\mathbf{0} = \mathbf{T}_0 \cdot \mathbf{r}$$

where 0 is a vector of zeros, and T0 stands for the portion of the scores in this region; normally one excludes the region where no peaks elute, and then finds the zero component region for each component. For example, if we record a cluster over 50 datapoints and we suspect that there are three peaks eluting between points 10 and 25, 20 and 35 and 30 and 45, then there are three T0 matrices, and for the fastest eluting component this is between points 26 and 45.

Figure 6.33 Profiles obtained as described in Section 6.4.1.3 (intensity plotted against datapoint)

Note that the number of PCs should be made equal to the number of compounds in the overall mixture. There is one small problem in that one value of the vector r must be set to an arbitrary number; usually the first coefficient is set to 1, but this does not have a serious effect on the algorithm. The equation can then be solved as follows. Separate the contribution of the first PC from that of all the others so that, setting r1 to 1,

$$\mathbf{T}_{0(2:K)} \cdot \mathbf{r}_{2:K} \approx -\mathbf{t}_{01}$$

so that

$$\mathbf{r}_{2:K} = -\left(\mathbf{T}'_{0(2:K)} \cdot \mathbf{T}_{0(2:K)}\right)^{-1} \cdot \mathbf{T}'_{0(2:K)} \cdot \mathbf{t}_{01}$$

where $\mathbf{r}_{2:K}$ is a column vector of length $K - 1$ where there are $K$ PCs, $\mathbf{t}_{01}$ is the scores of the first PC over the zero concentration window, $\mathbf{T}_{0(2:K)}$ the scores of the remaining PCs over the same window, and the prime denotes a transpose. It is important to ensure that $K$ equals the number of components suspected to elute within the cluster of peaks. It is possible to perform this operation on any embedded peaks because these also exhibit zero composition regions.
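As a rough illustration of this step, the sketch below solves for the rotation vector of one compound by least squares, with the first coefficient fixed at 1. The function name `rotation_vector` and the synthetic window scores are assumptions made for the example, and NumPy's `lstsq` is used in place of forming the inverse explicitly; the two agree when the window scores of PCs 2 to K have full column rank.

```python
import numpy as np

def rotation_vector(T0):
    """Solve T0 @ r ~ 0 for r, with the first element of r fixed at 1.

    T0 holds the scores of all K PCs over one zero concentration window
    (rows = points in the window, columns = PCs).
    """
    T0 = np.asarray(T0, dtype=float)
    t01 = T0[:, 0]        # scores of PC 1 over the window
    T0_rest = T0[:, 1:]   # scores of PCs 2..K over the window
    # Least squares solution of T0_rest @ r_rest ~ -t01
    r_rest, *_ = np.linalg.lstsq(T0_rest, -t01, rcond=None)
    return np.concatenate(([1.0], r_rest))

# Synthetic check (hypothetical numbers): pick a known r and build window
# scores that satisfy T0 @ r = 0 exactly, then recover r.
rng = np.random.default_rng(0)
r_true = np.array([1.0, -2.0, 0.5])
T0_rest = rng.normal(size=(8, 2))
t01 = -T0_rest @ r_true[1:]
T0 = np.column_stack([t01, T0_rest])
r_est = rotation_vector(T0)
```

With real data the window scores contain noise, so `T0 @ r_est` is only approximately zero.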

The profiles of all the compounds can now be obtained over the entire region by

$$\hat{\mathbf{C}} = \mathbf{T} \cdot \mathbf{R}$$

and the spectra by

$$\hat{\mathbf{S}} = \mathbf{R}^{-1} \cdot \mathbf{P}$$
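The whole procedure can be sketched end to end on simulated data. Everything below is an assumption made for illustration: the Gaussian-shaped profiles, the random spectra, the helper names, and the noise-free setting. It is not the calculation of Table 6.12. The zero composition windows are chosen to mimic a cluster in which the middle peak is embedded.

```python
import numpy as np

t = np.arange(1, 26)  # 25 points in time, numbered from 1

def peak(centre, first, last, width=2.0):
    """Gaussian-shaped profile, exactly zero outside points first..last."""
    shape = np.exp(-0.5 * ((t - centre) / width) ** 2)
    return np.where((t >= first) & (t <= last), shape, 0.0)

# Hypothetical three-compound cluster; the middle peak has no selective region
C_true = np.column_stack([peak(8.5, 4, 13), peak(13.0, 9, 17), peak(17.5, 13, 22)])
S_true = np.random.default_rng(1).uniform(0.1, 1.0, size=(3, 12))
X = C_true @ S_true

# Uncentred PCA with K = 3 components, so that X = T . P
K = 3
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T, P = U[:, :K] * s[:K], Vt[:K]

def rotation_vector(T0):
    """Least squares solution of T0 @ r ~ 0 with r[0] fixed at 1."""
    r_rest, *_ = np.linalg.lstsq(T0[:, 1:], -T0[:, 0], rcond=None)
    return np.concatenate(([1.0], r_rest))

zero_windows = [
    list(range(14, 23)),                       # first compound
    list(range(4, 9)) + list(range(18, 23)),   # embedded compound
    list(range(4, 13)),                        # last compound
]
# One column of R per compound, each found from that compound's window
R = np.column_stack([rotation_vector(T[np.isin(t, w)]) for w in zero_windows])

C_hat = T @ R                   # estimated profiles, one column per compound
S_hat = np.linalg.inv(R) @ P    # estimated spectra, one row per compound
```

Because the first coefficient of each rotation vector is fixed arbitrarily, each column of `C_hat` matches the simulated profile only up to a scale factor, with the inverse factor absorbed into the corresponding row of `S_hat`.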

 

 

We will illustrate this with the example of Table 6.2. From inspecting the data we might conclude that compound A elutes between times 4 and 13, B between times 9 and 17 and C between times 13 and 22. This information could be obtained by a variety of methods, as discussed in Section 6.3. Hence the zero composition regions are as follows:

compound A: points 14–22;

compound B: points 4–8 and 18–22;

compound C: points 4–12.

Obviously different approaches may identify slightly different regions. The calculation is presented in Table 6.12 for the data in Table 6.2, and the resultant profiles and spectra are presented in Figure 6.34.
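The bookkeeping behind these regions is simple enough to sketch in a few lines. The helper names below are hypothetical; the elution ranges are the ones quoted in the text, and the cluster is assumed to span points 4 to 22.

```python
def zero_composition_points(elution, cluster):
    """Points inside the cluster at which each compound is absent.

    elution: {name: (first, last)} inclusive elution ranges
    cluster: (first, last) inclusive range over which any peak elutes
    """
    first, last = cluster
    return {
        name: [p for p in range(first, last + 1) if not lo <= p <= hi]
        for name, (lo, hi) in elution.items()
    }

def as_ranges(points):
    """Collapse a sorted list of points into inclusive (start, end) runs."""
    runs = []
    for p in points:
        if runs and p == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], p)
        else:
            runs.append((p, p))
    return runs

elution = {"A": (4, 13), "B": (9, 17), "C": (13, 22)}
regions = {
    name: as_ranges(points)
    for name, points in zero_composition_points(elution, (4, 22)).items()
}
# regions == {"A": [(14, 22)], "B": [(4, 8), (18, 22)], "C": [(4, 12)]}

# The earlier 50-datapoint example: the fastest eluting peak spans points 10-25
fastest = as_ranges(zero_composition_points({"first": (10, 25)}, (10, 45))["first"])
# fastest == [(26, 45)]
```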

Table 6.12 Determining spectrum and elution profiles of an embedded peak.

(a) Choosing matrices T0: the composition 0 regions are used to identify the portions of the overall scores matrix for compounds A, B and C

                 T                Composition 0 regions
Time                                A    B    C
 1     0.011   0.006   0.052
 2     0.049   0.036   0.035
 3     0.059   0.002   0.084
 4     0.120   0.099   0.033             0    0
 5     0.439   0.018   0.129             0    0
 6     1.029   0.205   0.476             0    0
 7     2.025   0.379   0.808             0    0
 8     2.962   0.348   1.133             0    0
 9     3.505   0.351   1.287                  0
10     3.501   0.088   1.108                  0
11     3.213   0.704   0.353                  0
12     2.774   1.417   0.224                  0
13     2.683   1.451   0.646
14     2.710   1.091   0.885        0
15     2.735   0.178   0.918        0
16     2.923   0.718   0.950        0
17     2.742   1.265   0.816        0
18     2.359   1.256   0.761        0    0
19     1.578   0.995   0.495        0    0
20     0.768   0.493   0.231        0    0
21     0.428   0.195   0.065        0    0
22     0.156   0.066   0.016        0    0
23     0.031   0.049   0.005
24     0.110   0.007   0.095
25     0.052   0.057   0.074

(b) Determining a matrix R

  A       B       C
1       1       1
0.078   2.603   2.476
3.076   1.560   3.333

 

 

 

 


(c) Determining the concentration profiles using $\hat{\mathbf{C}} = \mathbf{T} \cdot \mathbf{R}$

Time     A       B       C
 1     0.150   0.079   0.200
 2     0.062   0.009   0.255
 3     0.200   0.196   0.335
 4     0.009   0.085   0.475
 5     0.835   0.192   0.052
 6     2.476   0.245   0.050
 7     4.480   0.222   0.272
 8     6.419   0.288   0.049
 9     7.436   0.584   0.086
10     6.916   2.001   0.410
11     4.354   4.494   0.295
12     2.194   6.813   0.012
13     0.809   7.466   1.244
14     0.072   6.931   2.959
15     0.075   4.630   5.353
16     0.055   2.536   7.869
17     0.133   0.722   8.596
18     0.079   0.277   8.004
19     0.022   0.239   5.691
20     0.020   0.155   2.757
21     0.213   0.022   1.126
22     0.100   0.009   0.374
23     0.021   0.167   0.076
24     0.401   0.019   0.223
25     0.274   0.211   0.053

 

Many papers and theses have been written about this problem, and there are a large number of modifications to this approach, but in this text we illustrate using one of the best established approaches.

6.4.3 Incorporating Constraints

Finally, it is important to mention another class of methods. In many cases it is not possible to obtain a unique mathematical solution to the multivariate resolution of complex mixtures, and the problem of embedded peaks without selectivity, which may occur, for example, in impurity monitoring, causes difficulties when using many conventional approaches.

Figure 6.34 Profiles and spectra of three peaks obtained as in Section 6.4.2 (panels: intensity against datapoint; intensity against spectral frequency)

There is a huge literature on algorithm development under such circumstances, which cannot be fully reviewed in this book. However, many modern methods attempt to incorporate chemical knowledge or constraints about a system. For example, underlying chromatographic profiles should be unimodal, and spectra and chromatographic profiles positive, so the reconstructions of Figure 6.34, whilst providing a good starting point, suggest that there is still some way to go before the embedded peak is modelled correctly. In many cases there exist a large number of equally good statistical solutions that fit a dataset, but many are unrealistic in chemical terms. Most algorithms try to narrow down the possible solutions to those that obey constraints. Often this is done in an iterative manner, improving the fit to the data at the same time as ensuring that the solutions are physically meaningful.
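To give a flavour of what such an iterative scheme can look like, the sketch below alternates between least squares estimates of spectra and profiles and clips negative values to zero after each step. This is a deliberately simple stand-in chosen for illustration, not a method described in this text; the function name, the simulated data and the number of iterations are all assumptions, and no unimodality constraint is applied.

```python
import numpy as np

def als_nonnegative(X, C0, n_iter=100):
    """Alternating least squares with negative values clipped to zero.

    X  : data matrix (points in time x wavelengths)
    C0 : starting guess for the profiles (points in time x compounds)
    """
    C = np.clip(np.asarray(C0, dtype=float), 0.0, None)
    for _ in range(n_iter):
        S = np.clip(np.linalg.pinv(C) @ X, 0.0, None)   # spectra given profiles
        C = np.clip(X @ np.linalg.pinv(S), 0.0, None)   # profiles given spectra
    return C, S

# Hypothetical two-compound example with a rough starting guess
t = np.arange(20)
C_true = np.column_stack([
    np.exp(-0.5 * ((t - 7) / 2.0) ** 2),
    np.exp(-0.5 * ((t - 11) / 2.0) ** 2),
])
S_true = np.array([[0.9, 0.6, 0.3, 0.1, 0.05],
                   [0.1, 0.3, 0.7, 0.9, 0.4]])
X = C_true @ S_true
C0 = C_true + np.random.default_rng(0).normal(scale=0.1, size=C_true.shape)

C_fit, S_fit = als_nonnegative(X, C0)
relative_error = np.linalg.norm(X - C_fit @ S_fit) / np.linalg.norm(X)
```

Clipping is the crudest way of enforcing positivity and gives no guarantee that the fit improves at every step, which is one reason the specialised literature uses more careful constrained solvers.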

These approaches are fairly complex and the enthusiast should either develop their own methods or code in the approaches from source literature. There are very few public domain software packages in this area, although some have been specially commissioned for industry or instrument manufacturers. One of the difficulties is that different problems occur according to instrumental technique and, with rapid changes in technology, new types of measurement come into vogue. A good example is the movement away from HPLC–DAD towards LC–MS and LC–NMR. The majority of the chemometrics literature in the area of resolution of two-way chromatograms still involves HPLC–DAD, where spectra are often overlapping, so that there is often no selectivity in the spectral dimension. However, in many other types of coupled chromatography there are often some selective variables, but many new difficulties relating to preprocessing, variable selection and preparation of the data arise. For example, in LC–NMR, Fourier transformation, spectral smoothing, alignment, baseline correction and variable selection play an important role, but it is often easier to find selective variables compared with HPLC–DAD, so the effort is concentrated in other areas. Also in MS and NMR there will be different sorts of spectral information that can be exploited, so sophisticated knowledge can be incorporated into an algorithm. When developing methods for other applications such as infrared spectroscopy, reaction monitoring or equilibria studies, very specific and technique dependent knowledge must be introduced.

Problems

Problem 6.1 Determination of Purity Within a Two-component Cluster: Derivatives, Correlation Coefficients and PC Plots

Section 6.2.2 Section 6.3.3 Section 6.3.5

The table on page 399 represents an HPLC–DAD chromatogram recorded at 27 wavelengths (the low digital resolution is used for illustrative purposes) and 30 points in time. The wavelengths in nanometres are presented at the top.

1. Calculate the 29 correlation coefficients between successive points in time, and plot a graph of these. Remove the first correlation coefficients and replot the graph. Comment on these graphs. How might you improve the graph still further?

2. Use first derivatives to look at purity as follows.

a. Scale the spectrum at each point in time so that it sums to 1.

b. At each wavelength and points 3–28 in time, calculate the absolute value of the five point quadratic Savitzky–Golay derivative (see Chapter 3, Table 3.6). You should produce a matrix of size 26 × 27.

c. Average these over all wavelengths and plot this graph against time.

d. Improve this graph by using a logarithmic scale for the parameter calculated in step c.

Comment on what you observe.

3. Perform PCA on the raw uncentred data, retaining the first two PCs. Plot a graph of the scores of PC2 versus PC1, labelling the points. What do you observe from this graph and how does it compare to the plots in questions 1 and 2?
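For readers who prefer to script questions 1 and 2, the sketch below shows one possible arrangement of the calculations. The function names and the simulated 30 × 27 matrices are assumptions made for the example; the derivative uses the standard five point quadratic Savitzky–Golay first derivative coefficients (−2, −1, 0, 1, 2)/10.

```python
import numpy as np

def successive_correlations(X):
    """Correlation coefficient between the spectra at points i and i + 1."""
    return np.array([np.corrcoef(X[i], X[i + 1])[0, 1] for i in range(len(X) - 1)])

def derivative_purity(X):
    """Steps 2a to 2c: scale rows to sum to 1, take the absolute five point
    quadratic Savitzky-Golay first derivative in time, average over wavelengths."""
    Xn = X / X.sum(axis=1, keepdims=True)
    w = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]) / 10.0   # weights for points i-2 .. i+2
    n = len(Xn)
    # Rows 3 .. n-2 (numbered from 1), so 26 rows when n = 30
    d = sum(w[k] * Xn[k:n - 4 + k] for k in range(5))
    return np.abs(d).mean(axis=1)

# Simulated data (hypothetical): 30 points in time, 27 wavelengths
t = np.arange(1, 31)[:, None]
wl = np.arange(27)[None, :]
c1 = np.exp(-0.5 * ((t - 12) / 3.0) ** 2)
c2 = np.exp(-0.5 * ((t - 17) / 3.0) ** 2)
s1 = np.exp(-0.5 * ((wl - 8) / 4.0) ** 2)
s2 = np.exp(-0.5 * ((wl - 16) / 5.0) ** 2)
X_mix = c1 * s1 + 0.5 * c2 * s2    # two partially overlapping compounds
X_pure = c1 * s1                   # a single compound, for comparison

corr_mix = successive_correlations(X_mix)     # 29 values
purity_mix = derivative_purity(X_mix)         # 26 values, points 3-28
log_purity_mix = np.log10(purity_mix)         # step 2d
corr_pure = successive_correlations(X_pure)
purity_pure = derivative_purity(X_pure)
```

For the single simulated compound the scaled spectrum does not change with time, so the correlations are all 1 and the derivative measure is zero; with real data, noise in the low-intensity regions at the edges of the cluster dominates both measures.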

Problem 6.1 (table, page 399). Each row gives a wavelength (nm) followed by the intensities at the 30 successive points in time.

389.63  0.0062 0.0058 0.0057 0.0056 0.0050 0.0044 0.0047 0.0065 0.0070 0.0064 0.0061 0.0064 0.0069 0.0069 0.0064 0.0064 0.0067 0.0069 0.0068 0.0069 0.0073 0.0073 0.0074 0.0069 0.0061 0.0055 0.0049 0.0054 0.0070 0.0088
382.43  0.0019 0.0033 0.0050 0.0050 0.0049 0.0064 0.0079 0.0086 0.0091 0.0108 0.0108 0.0100 0.0093 0.0089 0.0086 0.0062 0.0060 0.0072 0.0079 0.0069 0.0041 0.0035 0.0037 0.0041 0.0044 0.0041 0.0029 0.0012 0.0006 0.0007
375.24  0.0026 0.0022 0.0003 0.0003 0.0020 0.0049 0.0062 0.0065 0.0062 0.0051 0.0040 0.0028 0.0025 0.0026 0.0028 0.0026 0.0027 0.0026 0.0028 0.0034 0.0034 0.0033 0.0021 0.0018 0.0020 0.0023 0.0019 0.0001 0.0009 0.0003
368.05  0.0052 0.0053 0.0081 0.0096 0.0088 0.0084 0.0130 0.0194 0.0215 0.0202 0.0197 0.0191 0.0174 0.0148 0.0120 0.0093 0.0081 0.0075 0.0066 0.0072 0.0088 0.0097 0.0084 0.0087 0.0089 0.0078 0.0069 0.0079 0.0092 0.0076
361.48  0.0045 0.0026 0.0006 0.0015 0.0038 0.0071 0.0106 0.0150 0.0182 0.0195 0.0192 0.0166 0.0120 0.0066 0.0041 0.0041 0.0044 0.0039 0.0033 0.0013 0.0004 0.0015 0.0024 0.0032 0.0044 0.0050 0.0057 0.0060 0.0051 0.0042
355.22  0.0035 0.0044 0.0056 0.0077 0.0116 0.0177 0.0250 0.0327 0.0381 0.0404 0.0393 0.0352 0.0292 0.0238 0.0198 0.0158 0.0121 0.0091 0.0071 0.0057 0.0051 0.0048 0.0050 0.0046 0.0033 0.0017 0.0013 0.0016 0.0018 0.0023
348.07  0.0042 0.0038 0.0050 0.0086 0.0136 0.0203 0.0285 0.0360 0.0414 0.0423 0.0392 0.0348 0.0294 0.0241 0.0188 0.0145 0.0117 0.0094 0.0083 0.0075 0.0063 0.0049 0.0036 0.0033 0.0031 0.0026 0.0026 0.0027 0.0025 0.0018
340.92  0.0026 0.0040 0.0071 0.0121 0.0198 0.0303 0.0405 0.0494 0.0553 0.0579 0.0552 0.0481 0.0395 0.0317 0.0253 0.0201 0.0167 0.0135 0.0104 0.0082 0.0059 0.0039 0.0023 0.0013 0.0008 0.0002 0.0004 0.0008 0.0012 0.0010
333.78  0.0021 0.0048 0.0077 0.0111 0.0168 0.0254 0.0351 0.0432 0.0478 0.0490 0.0472 0.0424 0.0360 0.0292 0.0237 0.0194 0.0160 0.0142 0.0126 0.0116 0.0098 0.0079 0.0062 0.0049 0.0043 0.0033 0.0023 0.0020 0.0025 0.0038
326.64  0.0011 0.0009 0.0036 0.0077 0.0145 0.0242 0.0351 0.0443 0.0506 0.0528 0.0505 0.0441 0.0355 0.0278 0.0215 0.0163 0.0126 0.0099 0.0076 0.0056 0.0037 0.0025 0.0017 0.0008 0.0005 0.0001 0.0000 0.0003 0.0009 0.0018
319.51  0.0008 0.0022 0.0048 0.0087 0.0134 0.0199 0.0281 0.0359 0.0404 0.0404 0.0377 0.0337 0.0299 0.0269 0.0242 0.0222 0.0212 0.0212 0.0206 0.0190 0.0169 0.0148 0.0121 0.0087 0.0064 0.0052 0.0042 0.0026 0.0006 0.0004
312.38  0.0004 0.0008 0.0032 0.0068 0.0117 0.0184 0.0265 0.0334 0.0371 0.0379 0.0365 0.0325 0.0271 0.0225 0.0201 0.0188 0.0181 0.0177 0.0169 0.0150 0.0119 0.0087 0.0061 0.0050 0.0048 0.0043 0.0033 0.0021 0.0008 0.0006
305.27  0.0028 0.0012 0.0011 0.0036 0.0068 0.0114 0.0173 0.0233 0.0275 0.0292 0.0278 0.0247 0.0214 0.0190 0.0178 0.0176 0.0175 0.0169 0.0165 0.0161 0.0149 0.0126 0.0101 0.0075 0.0051 0.0030 0.0014 0.0005 0.0005 0.0015
298.15  0.0015 0.0032 0.0063 0.0119 0.0209 0.0335 0.0477 0.0589 0.0660 0.0684 0.0656 0.0585 0.0492 0.0409 0.0349 0.0305 0.0272 0.0244 0.0219 0.0192 0.0158 0.0122 0.0094 0.0074 0.0060 0.0045 0.0028 0.0010 0.0002 0.0004
291.05  0.0006 0.0052 0.0136 0.0277 0.0488 0.0756 0.1051 0.1315 0.1491 0.1540 0.1458 0.1295 0.1094 0.0899 0.0730 0.0603 0.0512 0.0440 0.0377 0.0319 0.0263 0.0207 0.0154 0.0104 0.0068 0.0044 0.0030 0.0017 0.0006 0.0002
283.95  0.0092 0.0260 0.0605 0.1207 0.2120 0.3323 0.4654 0.5843 0.6613 0.6810 0.6451 0.5683 0.4732 0.3806 0.3032 0.2429 0.1964 0.1586 0.1261 0.0977 0.0735 0.0540 0.0387 0.0273 0.0201 0.0160 0.0131 0.0105 0.0080 0.0067
276.86  0.0111 0.0258 0.0545 0.1046 0.1811 0.2813 0.3921 0.4917 0.5574 0.5748 0.5437 0.4789 0.4005 0.3249 0.2611 0.2104 0.1710 0.1389 0.1108 0.0866 0.0666 0.0501 0.0372 0.0272 0.0204 0.0157 0.0119 0.0097 0.0085 0.0081
269.77  0.0004 0.0081 0.0234 0.0501 0.0922 0.1480 0.2103 0.2653 0.3015 0.3115 0.2956 0.2615 0.2207 0.1836 0.1550 0.1354 0.1227 0.1133 0.1032 0.0904 0.0754 0.0601 0.0462 0.0345 0.0252 0.0176 0.0118 0.0077 0.0048 0.0020
262.70  0.0002 0.0070 0.0214 0.0463 0.0851 0.1371 0.1957 0.2485 0.2830 0.2919 0.2771 0.2469 0.2123 0.1837 0.1666 0.1612 0.1628 0.1650 0.1620 0.1515 0.1345 0.1132 0.0908 0.0700 0.0531 0.0405 0.0312 0.0239 0.0172 0.0113
255.62  0.0020 0.0058 0.0209 0.0469 0.0871 0.1402 0.1984 0.2506 0.2850 0.2945 0.2797 0.2482 0.2114 0.1793 0.1577 0.1470 0.1430 0.1406 0.1354 0.1247 0.1083 0.0887 0.0699 0.0546 0.0425 0.0324 0.0238 0.0167 0.0114 0.0073
248.56  0.0016 0.0088 0.0228 0.0468 0.0833 0.1313 0.1844 0.2323 0.2638 0.2724 0.2579 0.2277 0.1917 0.1589 0.1338 0.1164 0.1046 0.0954 0.0862 0.0756 0.0636 0.0515 0.0400 0.0300 0.0227 0.0176 0.0136 0.0096 0.0061 0.0035
241.50  0.0026 0.0078 0.0183 0.0356 0.0611 0.0944 0.1316 0.1653 0.1872 0.1926 0.1822 0.1608 0.1356 0.1124 0.0944 0.0814 0.0718 0.0639 0.0561 0.0479 0.0397 0.0319 0.0252 0.0198 0.0157 0.0124 0.0094 0.0067 0.0046 0.0033
234.45  0.0051 0.0093 0.0169 0.0294 0.0487 0.0744 0.1033 0.1291 0.1456 0.1498 0.1420 0.1261 0.1071 0.0893 0.0752 0.0647 0.0574 0.0519 0.0463 0.0400 0.0334 0.0275 0.0223 0.0179 0.0145 0.0117 0.0093 0.0073 0.0059 0.0052
227.41  0.0073 0.0141 0.0273 0.0513 0.0883 0.1371 0.1910 0.2389 0.2697 0.2771 0.2622 0.2321 0.1961 0.1615 0.1326 0.1110 0.0950 0.0825 0.0710 0.0599 0.0488 0.0387 0.0308 0.0247 0.0195 0.0149 0.0115 0.0095 0.0085 0.0077
220.38  0.0127 0.0197 0.0347 0.0620 0.1040 0.1589 0.2194 0.2742 0.3102 0.3194 0.3027 0.2688 0.2283 0.1903 0.1601 0.1382 0.1225 0.1105 0.0998 0.0882 0.0750 0.0613 0.0488 0.0387 0.0312 0.0258 0.0219 0.0188 0.0164 0.0147
213.35  0.0153 0.0211 0.0327 0.0518 0.0812 0.1194 0.1617 0.1997 0.2243 0.2307 0.2193 0.1954 0.1672 0.1414 0.1213 0.1068 0.0963 0.0888 0.0815 0.0729 0.0630 0.0531 0.0447 0.0375 0.0316 0.0272 0.0232 0.0199 0.0172 0.0152
206.33  0.0204 0.0258 0.0361 0.0522 0.0769 0.1098 0.1465 0.1795 0.1994 0.2030 0.1925 0.1724 0.1484 0.1253 0.1067 0.0931 0.0829 0.0750 0.0681 0.0609 0.0529 0.0449 0.0384 0.0336 0.0301 0.0270 0.0245 0.0228 0.0215 0.0205


 

Problem 6.2 Evolutionary and Window Factor Analysis in the Detection of an Embedded Peak

 

 

 

Section 6.3.4

Section 6.4.1.3 Section 6.4.2

The following small dataset represents an evolutionary process consisting of two peaks, one embedded, recorded at 16 points in time and over six variables.

0.156  0.187  0.131  0.119  0.073  0.028
0.217  0.275  0.229  0.157  0.096  0.047
0.378  0.456  0.385  0.215  0.121  0.024
0.522  0.667  0.517  0.266  0.178  0.065
0.690  0.792  0.705  0.424  0.186  0.060
0.792  0.981  0.824  0.541  0.291  0.147
0.841  1.078  0.901  0.689  0.400  0.242
0.832  1.144  0.992  0.779  0.568  0.308
0.776  1.029  0.969  0.800  0.650  0.345
0.552  0.797  0.749  0.644  0.489  0.291
0.377  0.567  0.522  0.375  0.292  0.156
0.259  0.330  0.305  0.202  0.158  0.068
0.132  0.163  0.179  0.101  0.043  0.029
0.081  0.066  0.028  0.047  0.006  0.019
0.009  0.054  0.056  0.013  0.042  0.031
0.042  0.005  0.038  0.029  0.013  0.057

1. Perform EFA on the data (using uncentred PCA) as follows.

For forward EFA, perform PCA on the 3 × 6 matrix consisting of the first three spectra, and retain the three eigenvalues.

Then perform PCA on the 4 × 6 matrix consisting of the first four spectra, retaining the first three eigenvalues.

Continue this procedure, increasing the matrix by one row at a time, until a 14 × 3 matrix is obtained whose rows correspond to the ends of each window (from 3 to 16) and columns to the eigenvalues.

Repeat the same process but for backward EFA, the first matrix consisting of the three last spectra (14–16) to give another 14 × 3 matrix.

If you are able to program in Matlab or VBA, it is easiest to automate this, but it is possible simply to use repeatedly the PCA add-in for each calculation.

2. Produce EFA plots, first converting the eigenvalues to a logarithmic scale, superimposing six graphs; always plot the eigenvalues against the extreme rather than middle or starting value of each window. Comment.

3. Perform WFA, again using an uncentred data matrix, with a window size of 3. To do this, simply perform PCA on spectra 1–3, and retain the first three eigenvalues. Repeat this for spectra 2–4, 3–5, and so on. Plot the logarithms of the first three eigenvalues against window centre and comment.

4. There are clearly two components in this mixture. Show how you could distinguish the situation of an embedded peak from that of two peaks with a central region of coelution, and demonstrate that we are dealing with an embedded peak in this situation.

5. From the EFA plot it is possible to identify the composition 1 regions for the main peak. What are these? Calculate the average spectrum over these regions, and use this as an estimate of the spectrum of the main component.
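The windowing in questions 1 to 3 is easy to automate. The sketch below is one possible arrangement in Python rather than Matlab or VBA; the function names and the simulated 16 × 6 dataset are assumptions, and the eigenvalues of uncentred PCA are taken as the squared singular values of each submatrix.

```python
import numpy as np

def eigenvalues(X, n=3):
    """First n eigenvalues of uncentred PCA (squared singular values of X)."""
    s = np.linalg.svd(X, compute_uv=False)
    ev = np.zeros(n)
    ev[:min(n, len(s))] = s[:n] ** 2
    return ev

def forward_efa(X, n=3):
    """Rows correspond to windows ending at points n, n + 1, ..., len(X)."""
    return np.array([eigenvalues(X[:end], n) for end in range(n, len(X) + 1)])

def backward_efa(X, n=3):
    """Rows correspond to windows starting at points len(X) - n + 1, ..., 1."""
    return np.array([eigenvalues(X[start:], n) for start in range(len(X) - n, -1, -1)])

def wfa(X, window=3, n=3):
    """Rows correspond to successive fixed-size windows moved one point at a time."""
    return np.array([eigenvalues(X[i:i + window], n) for i in range(len(X) - window + 1)])

# Simulated stand-in for the dataset: a main peak with a narrow embedded peak
rng = np.random.default_rng(0)
t = np.arange(1, 17)[:, None]
main = np.exp(-0.5 * ((t - 8) / 3.0) ** 2)
embedded = 0.4 * np.exp(-0.5 * ((t - 8) / 1.0) ** 2)
s_main = np.array([[0.8, 1.0, 0.9, 0.6, 0.4, 0.2]])
s_emb = np.array([[0.1, 0.3, 0.5, 0.9, 1.0, 0.6]])
X = main * s_main + embedded * s_emb + rng.normal(scale=0.01, size=(16, 6))

fwd = forward_efa(X)     # 14 x 3, window ends 3..16
bwd = backward_efa(X)    # 14 x 3, window starts 14..1
win = wfa(X)             # 14 x 3, window centres 2..15
log_fwd, log_bwd, log_win = np.log10(fwd), np.log10(bwd), np.log10(win)
```

The logarithms can then be plotted against the window end (forward EFA), window start (backward EFA) or window centre (WFA) with any plotting package.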


 

 

Problem 6.3 Variable Selection and PC Plots in LC–MS

 

Section 6.2.2 Section 6.2.3.2

Section 6.2.4 Section 6.2.3.1

The table on page 402 represents the intensity of 49 masses in LC–MS of a peak cluster recorded at 25 points in time. The aim of this exercise is to look at variable selection and the influence on PC plots. The data have been transposed to fit on a page, with the first column representing the mass numbers. Some preprocessing has already been performed with the original masses reduced slightly and the ion current at each mass set to a minimum of 0. You will probably wish to transpose the matrix so that the columns represent different masses.

1. Plot the total ion current (using the masses listed) against time. This is done by summing the intensity over all masses at each point in time.

2. Perform PCA on the dataset, but standardise the intensities at each mass, and retain two PCs. Present the scores plot of PC2 versus PC1, labelling all the points in time, starting from 1 the lowest to 25 the highest. Produce a similar loadings plot, also labelling the points and comment on the correspondence between these graphs.

3. Repeat this but sum the intensities at each point in time to 1 prior to standardising and performing PCA and produce scores plot of PC2 versus PC1, and comment. Why might it be desirable to remove points 1–3 in time? Repeat the procedure, this time using only points 4–25 in time. Produce PC2 versus PC1 scores and loadings plots and comment.

4. A very simple approach to variable selection involves sorting according to standard deviation. Take the standard deviations of the 49 masses using the raw data, and list the 10 masses with highest standard deviations.

5. Perform PCA, standardised, again on the reduced 25 × 10 dataset consisting of the best 10 masses according to the criterion of question 4, and present the labelled scores and loadings plots. Comment. Can you assign m/z values to the components in the mixture?
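The preprocessing steps in questions 1 to 5 can be sketched as follows. The function names and the random 25 × 49 stand-in matrix are assumptions, the matrix is taken to be already transposed so that columns are masses, and the population standard deviation is assumed for standardisation.

```python
import numpy as np

def total_ion_current(X):
    """Sum of intensities over all masses at each point in time."""
    return X.sum(axis=1)

def row_scale(X):
    """Scale each point in time so that its intensities sum to 1."""
    return X / X.sum(axis=1, keepdims=True)

def standardise(X):
    """Mean centre each mass and divide by its (population) standard deviation."""
    sd = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)

def top_variables(X, k=10):
    """Column indices of the k masses with the highest standard deviation."""
    return np.argsort(X.std(axis=0))[::-1][:k]

def pca(X, n=2):
    """Scores and loadings of the first n PCs of X (no further centring)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n] * s[:n], Vt[:n]

# Random stand-in for the LC-MS table: 25 points in time x 49 masses
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(25, 49)) * rng.uniform(0.1, 10.0, size=49)

tic = total_ion_current(X)                       # question 1
scores, loadings = pca(standardise(X))           # question 2
scores_rs, _ = pca(standardise(row_scale(X)))    # question 3
best = top_variables(X, 10)                      # question 4
scores10, loadings10 = pca(standardise(X[:, best]))   # question 5
```

Restricting question 3 to points 4–25 amounts to passing `X[3:]` instead of `X`.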

Problem 6.4 Use of Derivatives, MLR and PCR in Signal Analysis

Section 6.3.5 Section 6.4.1.2 Section 6.4.1.3

The following data represent HPLC data recorded at 30 points in time and 10 wavelengths.

0.042

0.076

0.043

0.089

0.105

0.004

0.014

0.030

0.059

0.112

0.009

0.110

0.127

0.179

0.180

0.050

0.015

0.168

0.197

0.177

0.019

0.118

0.182

0.264

0.362

0.048

0.147

0.222

0.375

0.403

0.176

0.222

0.329

0.426

0.537

0.115

0.210

0.328

0.436

0.598

0.118

0.304

0.494

0.639

0.750

0.185

0.267

0.512

0.590

0.774

0.182

0.364

0.554

0.825

0.910

0.138

0.343

0.610

0.810

0.935

0.189

0.405

0.580

0.807

1.005

0.209

0.404

0.623

0.811

1.019

0.193

0.358

0.550

0.779

0.945

0.258

0.392

0.531

0.716

0.964

0.156

0.302

0.440

0.677

0.715

0.234

0.331

0.456

0.662

0.806

0.106

0.368

0.485

0.452

0.666

0.189

0.220

0.521

0.470

0.603

0.058

0.262

0.346

0.444

0.493

0.188

0.184

0.336

0.367

0.437

(continued on p. 403)
