
CHEMOMETRICS

 

the 25 × 22 spectral calibration matrix and c a 25 × 1 vector. (ii) Determine the predicted concentration matrix $\hat{C} = X . \hat{S}' . (\hat{S} . \hat{S}')^{-1}$.

5. In the model of question 3, plot a graph of predicted against true concentration for compound A. Determine the root mean square error both in mM and as a percentage of the average for all three compounds. Comment.

6. Repeat questions 2 and 3, but instead of MLR use PLS1 (centred) for the prediction of the concentration of A retaining the first three PLS components. Note that to obtain a root mean square error it is best to divide by 21 rather than 25 if three components are retained. You are not asked to cross-validate the models. Why are the predictions much better?

7. Use the 25 × 22 calibration set as a training set, obtain a PLS1 (centred) model for all three compounds retaining three components in each case, and centring the spectroscopic data. Use this model to predict the concentrations of compounds A–C in the 30 reaction spectra. Plot a graph of estimated concentrations of each compound against time.
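The divisor mentioned in question 6 can be sketched as follows. This is a minimal illustration, not the book's own code: the function name `rmse` and its arguments are hypothetical, and it assumes that the divisor is the number of samples minus one for each retained component and minus one more when the data are centred, which reproduces the 21 (from 25 samples and three components) quoted above.

```python
import numpy as np

def rmse(c_true, c_pred, n_components=0, centred=False):
    """Root mean square error with a reduced divisor (assumed: N - A - 1 when centred)."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    # One degree of freedom is assumed lost per component, plus one for centring.
    divisor = len(c_true) - n_components - (1 if centred else 0)
    return np.sqrt(np.sum((c_true - c_pred) ** 2) / divisor)

def rmse_percent(c_true, c_pred, **kwargs):
    """The same error expressed as a percentage of the average true concentration."""
    return 100.0 * rmse(c_true, c_pred, **kwargs) / np.mean(c_true)
```

With 25 samples, three components and centring the divisor is 21; with four samples and two components it is 1.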

Problem 5.7 PLS1 Algorithm

Section 5.5.1 Section A.2.2

The PLS1 algorithm is fairly simple and described in detail in Appendix A.2.2. However, it can be easily set up in Excel or programmed into Matlab in a few lines, and the aim of this problem is to set up the matrix based calculations for PLS.

The following is a description of the steps you are required to perform.

(a) Centre both the x and c blocks by subtracting the column means.

(b) Calculate the scores of the first PLS component by

$h = X' . c$

and then

$t = \dfrac{X . h}{\sqrt{\sum h^2}}$

(c) Calculate the x loadings of the first PLS component by

$p = \dfrac{t' . X}{\sum t^2}$

Note that the denominator is simply the sum of squares of the scores.

(d) Calculate the c loadings (a scalar in PLS1) by

$q = \dfrac{c' . t}{\sum t^2}$

(e) Calculate the contribution to the concentration estimate by t.q and the contribution to the x estimate by t.p.

(f) Subtract the contributions in step (e) from the current c vector and X matrix, and use these residuals for the calculation of the next PLS component by returning to step (b).

(g) To obtain the overall concentration estimate simply multiply T.q, where T is a scores matrix with A columns corresponding to the PLS components and q a column vector of size A. Add back the mean value of the concentrations to produce real estimates.

The method will be illustrated by a small simulated dataset, consisting of four samples, five measurements and one c parameter which is exactly characterised by three PLS components.

               X                   c

10.1    6.6    8.9    8.2    3.8   0.5
12.6    6.3    7.1   10.9    5.3   0.2
11.3    6.7   10.0    9.3    2.9   0.5
15.1    8.7    7.8   12.9    9.3   0.3
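Steps (a) to (g) of this problem can be sketched in Python with NumPy, as an alternative to the Excel or Matlab set-up the problem suggests. The function name `pls1` and the variable names are hypothetical, and the sketch assumes that the weight vector h is scaled to unit length when the scores are computed; the predictions do not depend on that scaling.

```python
import numpy as np

def pls1(X, c, n_components):
    """PLS1 by successive deflation; returns scores T, x loadings P, c loadings q and the means."""
    x_mean, c_mean = X.mean(axis=0), c.mean()
    Xr, cr = X - x_mean, c - c_mean            # step (a): centre both blocks
    T, P, q = [], [], []
    for _ in range(n_components):
        h = Xr.T @ cr                          # step (b): h = X'.c
        t = Xr @ h / np.sqrt(h @ h)            # scores
        p = t @ Xr / (t @ t)                   # step (c): x loadings
        qa = cr @ t / (t @ t)                  # step (d): c loading (a scalar)
        Xr = Xr - np.outer(t, p)               # steps (e) and (f): remove t.p from X
        cr = cr - t * qa                       # and t.q from c
        T.append(t)
        P.append(p)
        q.append(qa)
    return np.array(T).T, np.array(P), np.array(q), x_mean, c_mean

X = np.array([[10.1, 6.6,  8.9,  8.2, 3.8],
              [12.6, 6.3,  7.1, 10.9, 5.3],
              [11.3, 6.7, 10.0,  9.3, 2.9],
              [15.1, 8.7,  7.8, 12.9, 9.3]])
c = np.array([0.5, 0.2, 0.5, 0.3])

T, P, q, x_mean, c_mean = pls1(X, c, 3)
for a in (1, 2, 3):
    c_hat = T[:, :a] @ q[:a] + c_mean          # step (g): add the mean back
    print(a, np.round(c_hat, 4))
```

Because the scores are stored column by column, the estimate for any smaller number of components is obtained by truncating T and q, which is what the final loop does.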

1. Calculate the loadings and scores of the first three PLS components, laying out the calculations in full.

2. What are the residual sums of squares for the ‘x’ and ‘c’ blocks as each successive component is computed (hint: start from the centred data matrix and simply sum the squares of each block, repeat for the residuals)? What percentage of the overall variance is accounted for by each component?

3. How many components are needed to describe the data exactly? Why does this answer not say much about the underlying structure of the data?

4. Provide a table of true concentrations, and of predicted concentrations as one, two and three PLS components are calculated.

5. If only two PLS components are used, what is the root mean square error of prediction of concentrations over all four samples? Remember to divide by 1 and not 4 (why is this?).

Problem 5.8 Cross-validation in PLS

Section 5.5.1 Section 5.6.2 Section A.2.2

The following consists of 10 samples, whose spectra are recorded at six wavelengths. The concentration of a component in the samples is given by a c vector. This dataset has been simulated to give an exact fit for two components as an example of how cross-validation works.

Sample                  Spectra                       c

 1      0.10   0.22   0.20   0.06   0.29   0.10      1
 2      0.20   0.60   0.40   0.20   0.75   0.30      5
 3      0.12   0.68   0.24   0.28   0.79   0.38      9
 4      0.27   0.61   0.54   0.17   0.80   0.28      3
 5      0.33   0.87   0.66   0.27   1.11   0.42      6
 6      0.14   0.66   0.28   0.26   0.78   0.36      8
 7      0.14   0.34   0.28   0.10   0.44   0.16      2
 8      0.25   0.79   0.50   0.27   0.98   0.40      7
 9      0.10   0.22   0.20   0.06   0.29   0.10      1
10      0.19   0.53   0.38   0.17   0.67   0.26      4

1. Select samples 1–9, and calculate their means. Mean centre both the x and c variables over these samples.

2. Perform PLS1, calculate two components, on the first nine samples, centred as in question 1. Calculate t, p, h and the contribution to the c values for each PLS component (given by q.t), and verify that the samples can be exactly modelled using two PLS components (note that you will have to add on the mean of samples 1–9 to c after prediction). You should use the algorithm of Problem 5.7 or Appendix A.2.2 and you will need to find the vector h to answer question 3.

3. Cross-validation is to be performed on sample 10, using the model of samples 1–9 as follows.

(a) Subtract the means of samples 1–9 from sample 10 to produce a new x vector, and similarly for the c value.

(b) Then calculate the predicted score for the first PLS component and sample 10 by $\hat{t}_{10,1} = x_{10} . h_1 / \sqrt{\sum h_1^2}$, where $h_1$ has been calculated above on samples 1–9 for the first PLS component, and calculate the new residual spectral vector $x_{10} - \hat{t}_{10,1} . p_1$.

(c) Calculate the contribution to the mean centred concentration for sample 10 as $\hat{t}_{10,1} . q_1$, where $q_1$ is the value of q for the first PLS component using samples 1–9, and calculate the residual concentration $c_{10} - \hat{t}_{10,1} . q_1$.

(d) Find $\hat{t}_{10,2}$ for the second component using the residual vectors above using the vector h determined for the second component using the prediction set of nine samples.

(e) Calculate the contribution to predicting c and x from the second component.

4. Demonstrate that, for this particular set, cross-validation results in an exact prediction of concentration for sample 10; remember to add the mean of samples 1–9 back after prediction.

5. Unlike for PCA, it is not possible to determine the predicted scores by x.p but it is necessary to use a vector h. Why is this?
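The cross-validation steps of question 3 can be sketched in Python with NumPy as below. The variable names are hypothetical, and the sketch assumes the same PLS1 steps as Problem 5.7 with h scaled to unit length. The model is built on samples 1–9 only, and sample 10 is projected through each component in turn using h, p and q from that model.

```python
import numpy as np

data = np.array([
    [0.10, 0.22, 0.20, 0.06, 0.29, 0.10, 1],
    [0.20, 0.60, 0.40, 0.20, 0.75, 0.30, 5],
    [0.12, 0.68, 0.24, 0.28, 0.79, 0.38, 9],
    [0.27, 0.61, 0.54, 0.17, 0.80, 0.28, 3],
    [0.33, 0.87, 0.66, 0.27, 1.11, 0.42, 6],
    [0.14, 0.66, 0.28, 0.26, 0.78, 0.36, 8],
    [0.14, 0.34, 0.28, 0.10, 0.44, 0.16, 2],
    [0.25, 0.79, 0.50, 0.27, 0.98, 0.40, 7],
    [0.10, 0.22, 0.20, 0.06, 0.29, 0.10, 1],
    [0.19, 0.53, 0.38, 0.17, 0.67, 0.26, 4],
])
X, c = data[:, :6], data[:, 6]

# Questions 1 and 3(a): centre everything on the means of samples 1-9.
x_mean, c_mean = X[:9].mean(axis=0), c[:9].mean()
Xr, cr = X[:9] - x_mean, c[:9] - c_mean
x_new = X[9] - x_mean

c_pred = c_mean
for _ in range(2):                             # two PLS components
    h = Xr.T @ cr
    t = Xr @ h / np.sqrt(h @ h)
    p = t @ Xr / (t @ t)
    q = cr @ t / (t @ t)
    Xr, cr = Xr - np.outer(t, p), cr - t * q   # deflate the training blocks
    t_new = x_new @ h / np.sqrt(h @ h)         # 3(b) and 3(d): predicted score for sample 10
    x_new = x_new - t_new * p                  # residual spectrum of sample 10
    c_pred = c_pred + t_new * q                # 3(c) and 3(e): contribution to c

print(round(float(c_pred), 4), "true value:", c[9])
```

Since the problem states that the data were simulated to fit two components exactly, the prediction should agree with the true value for sample 10 to within rounding error.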

Problem 5.9 Multivariate Calibration in Three-way Diode Array HPLC

The aim of this problem is to perform a variety of methods of calibration on a three-way dataset. Ten chromatograms are recorded of 3-hydroxypyridine impurity within a main peak of 2-hydroxypyridine. The aim is to employ PLS to determine the concentration of the minor component.

For each concentration a 20 × 10 chromatogram is presented, taken over 20 s in time (1 s digital resolution), and in this dataset, for simplicity, absorbances every 12 nm starting at 230 nm are presented.

Five concentrations are used, replicated twice. The 10 concentrations (mM) in the following table are presented in the arrangement on the following pages.

0.0158   0.0158   0.0315   0.0315   0.0473   0.0473   0.0631   0.0631   0.0789   0.0789

[Data tables: ten 20 × 10 chromatograms (20 time points × 10 wavelengths), one for each of the concentrations above; the tabulated absorbances are not reproduced here.]

1. One approach to calibration is to use one-way PLS. This can be in either the spectroscopic or time direction. In fact, the spectroscopic dimension is often more useful. Produce a 10 × 10 table of summed intensities over the 20 chromatographic points in time at each wavelength for each sample.

2. Standardise the data, and perform autopredictive PLS1, calculating three PLS components. Why is it useful to standardise the measurements?

3. Plot graphs of predicted versus known concentrations for one, two and three PLS components, and calculate the root mean square errors in mM.

4. Perform PLS1 cross-validation on the c values for the first eight components and plot a graph of cross-validated error against component number.

5. Unfold the original data matrix to give a 10 × 200 data matrix.

6. It is desired to perform PLS calibration on this dataset, but first to standardise the data. Explain why there may be problems with this approach. Why is it desirable to reduce the number of variables from 200, and why was this variable selection less important in the PLS1 calculations?

7. Why is the standard deviation a good measure of variable significance? Reduce the dataset to 100 significant variables with the highest standard deviations to give a 10 × 100 data matrix.

8. Perform autopredictive PLS1 on the standardised reduced unfolded data of question 7 and calculate the errors as one, two and three components are computed.

9. How might you improve the model of question 8 still further?
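The data handling in questions 1, 5 and 7 can be sketched in Python with NumPy. The array below is a random placeholder standing in for the ten 20 × 10 chromatograms, not the book's data, and all variable names are hypothetical. The sketch assumes the array is ordered as sample × time × wavelength and uses the sample standard deviation (`ddof=1`); the population form would work equally well for ranking variables.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data only: 10 samples x 20 time points x 10 wavelengths.
chromatograms = rng.random((10, 20, 10))

# Question 1: sum over the 20 time points to get a 10 x 10 (sample x wavelength) table.
summed = chromatograms.sum(axis=1)

# Question 5: unfold each 20 x 10 chromatogram into a row of 200 variables.
unfolded = chromatograms.reshape(10, 200)

# Question 7: keep the 100 variables with the highest standard deviations.
sd = unfolded.std(axis=0, ddof=1)
keep = np.sort(np.argsort(sd)[-100:])
reduced = unfolded[:, keep]

# Questions 2 and 8: standardise each retained variable (zero mean, unit standard deviation).
standardised = (reduced - reduced.mean(axis=0)) / reduced.std(axis=0, ddof=1)

print(summed.shape, unfolded.shape, reduced.shape)
```

Selecting on the standard deviation before standardising matters here: once every variable has unit variance, low-intensity variables that are mostly noise carry the same weight as the informative ones.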

Chemometrics: Data Analysis for the Laboratory and Chemical Plant.

Richard G. Brereton

Copyright 2003 John Wiley & Sons, Ltd.

ISBNs: 0-471-48977-8 (HB); 0-471-48978-6 (PB)

6 Evolutionary Signals

6.1 Introduction

Some of the classical applications of chemometrics are to evolutionary data. Such a type of information is increasingly common, and normally involves simultaneously recording spectra whilst a physical parameter such as time or pH is changed, and signals evolve during the change of this parameter.

In the modern laboratory, one of the most widespread applications is in the area of coupled chromatography, such as HPLC–DAD (high-performance liquid chromatography–diode array detector), LC–MS (liquid chromatography–mass spectrometry) and LC–NMR (liquid chromatography–nuclear magnetic resonance). A chromatogram is recorded whilst a UV/vis, mass or NMR spectrum is recorded. The information can be presented in matrix form, with time along the rows and wavelength, mass number or frequency along the columns, as already introduced in Chapter 4. Multivariate approaches can be employed to analyse these data. However, there are a wide variety of other applications, ranging from pH titrations to processes that change in a systematic way with time to spectroscopy of mixtures. Many of the approaches in this chapter have wide applicability, for example, baseline correction, data scaling and 3D PC plots, but for brevity we illustrate the chapter primarily with case studies from coupled chromatography, as this has been the source of a huge literature over the past two decades.

With modern laboratory computers it is possible to obtain huge quantities of information very rapidly. For example, spectra sampled at 1 nm intervals over a 200 nm region can be obtained every second using modern chromatography, hence in an hour 3600 spectra in time × 200 spectral frequencies or 720 000 pieces of information can be produced from a single chromatogram. A typical medium to large industrial site may contain 100 or more coupled chromatographs, meaning the potential of acquiring 72 million data-points per hour of this type of information. Add on all the other instruments, and it is not difficult to see how billions of numbers can be generated on a daily basis.
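The arithmetic behind these figures can be checked in a few lines of Python; the variable names are hypothetical and the numbers are those quoted in the paragraph above.

```python
spectra_per_hour = 60 * 60        # one spectrum per second for an hour
points_per_spectrum = 200         # 1 nm intervals over a 200 nm region
per_chromatogram = spectra_per_hour * points_per_spectrum

instruments = 100                 # coupled chromatographs on a medium to large site
per_site_per_hour = per_chromatogram * instruments

print(per_chromatogram, per_site_per_hour)
```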

In Chapters 4 and 5, we discussed a number of methods for multivariate data analysis, but the methods described did not take into account the sequential nature of the information. When performing PCA on a data matrix, the order of the rows and columns is irrelevant. Figure 6.1 represents three cross-sections through a data matrix. The first could correspond to a chromatographic peak, the others not. However, since PCA and most other classical methods for pattern recognition would not distinguish these sequences, clearly other approaches are useful.
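The statement that the order of the rows is irrelevant to PCA can be illustrated with a short Python sketch using NumPy. The data are simulated (a single Gaussian elution profile times a random spectrum, plus noise) and all names are hypothetical. Shuffling the time points leaves the singular values of the centred matrix, and hence the variance explained by each PC, unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
time = np.arange(30)
profile = np.exp(-0.5 * ((time - 15) / 4.0) ** 2)      # unimodal elution profile
spectrum = rng.random(8)
X = np.outer(profile, spectrum) + 0.01 * rng.standard_normal((30, 8))

shuffled = X[rng.permutation(30)]                      # same spectra, scrambled in time

s_ordered = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
s_shuffled = np.linalg.svd(shuffled - shuffled.mean(axis=0), compute_uv=False)

print(np.allclose(s_ordered, s_shuffled))
```

The scores are simply permuted along with the rows, so nothing in the PCA output records that the original sequence formed a peak.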

In many cases, underlying factors corresponding to individual compounds in a mixture are unimodal in time, that is, they have one maximum. The aim is to deconvolute the experimentally observed sum into individual components and determine the features of each component. The change in spectral characteristics across the chromatographic peak can be used to provide this information.


Figure 6.1

Three possible sequential patterns that would be treated identically using standard multivariate techniques

To the practising chemist, there are three main questions, of increasing difficulty, that can be answered by applying chemometric techniques to coupled chromatography.

1. How many peaks in a cluster? Can we detect small impurities? Can we detect metabolites against a background? Can we determine whether there are embedded peaks?

2. What are the characteristics of each pure compound? What are the spectra? Can we obtain mass spectra and NMR spectra of embedded chromatographic peaks at low levels of sufficient quality that we can be confident of their identities?

3. What are the quantities of each component? Can we quantitate small impurities? Could we use chromatography of mixtures for reaction monitoring and kinetics? Can we say with certainty the level of a dopant or a potential environmental pollutant when it is detected in low concentrations buried within a major peak?

There are a large number of ‘named’ methods in the literature, but they are based mainly around certain main principles of evolutionary factor analysis, whereby factors corresponding to individual compounds evolve in time (or any other sequential parameter such as pH).

Such methods are applicable not only to coupled chromatography but also in areas such as pH dependence of equilibria, whereby the spectra of a mixture of chemical species can be followed with change of pH. It would be possible to record 20 spectra and then treat each independently. Sometimes this can lead to good quantification, but including the information that each component will be unimodal or monotonic over the course of a pH titration results in further insight. Another important application is in industrial process control where concentrations of compounds or levels of various factors may have a specific evolution over time.

Below we will illustrate the main methods of resolution of two-way data, primarily as applied to HPLC–DAD, but also comment on the specific enhancements required for other instrumental techniques and applications. Some techniques have already been introduced in Chapters 4 and 5, but we elaborate on them in this chapter.

A few of the methods discussed in this chapter, such as 3D PC plots and variable selection, have significant roles in most applications of chemometrics, so the interest in the techniques is by no means restricted to chromatographic applications, but in order to reduce excessive repetition the methods are introduced in one main context.

6.2 Exploratory Data Analysis And Preprocessing

6.2.1 Baseline Correction

A preliminary first step before applying most methods in this chapter is often baseline correction, especially when using older instruments. The reason for this is that most chemometric approaches look at variation above a baseline, so if baseline correction is not done artefacts can be introduced.

Baseline correction is best performed on each variable (such as mass or wavelength) independently. There are many ways of doing this, but it is first important to identify regions of baseline and of peaks, as in Figure 6.2 which is for an LC–MS dataset. Note that the right-hand side of this tailing peak is not used: we take only regions that we are confident in. Then normally a function is fitted to the baseline regions. This can simply involve the average or else a linear or polynomial best fit. Sometimes the baseline both before and after a peak cluster is useful, but if the cluster is fairly sharp, this is not essential, and in the case illustrated it would be hard. Sometimes the baseline is calculated over the entire chromatogram, in other cases separately for each peak cluster. After that, we obtain a simple mathematical model, and then subtract the baseline from the entire region of interest, separately for each variable. In the examples in this chapter it is assumed that either there are no baseline problems or that correction has already been performed.
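The procedure described above can be sketched in Python with NumPy, as one possible implementation rather than a prescribed one. The function name `correct_baseline`, the simulated data and the choice of baseline region are all hypothetical. A polynomial in time is fitted to the trusted baseline rows of each variable separately and then subtracted from the whole region.

```python
import numpy as np

def correct_baseline(X, baseline_rows, degree=1):
    """Fit a polynomial in time to the baseline rows of every column and subtract it."""
    time = np.arange(X.shape[0])
    # np.polyfit fits each column of a 2-D y independently (highest power first).
    coeffs = np.polyfit(time[baseline_rows], X[baseline_rows], degree)
    baseline = np.vander(time, degree + 1) @ coeffs
    return X - baseline

# Simulated example: one peak seen at three variables, on a drifting linear baseline.
time = np.arange(50)
peak = np.exp(-0.5 * ((time - 30) / 3.0) ** 2)
drift = 0.05 + 0.002 * time
X = np.outer(peak, [1.0, 0.5, 0.2]) + drift[:, None]

baseline_rows = np.arange(0, 8)       # region before the peak that we are confident in
corrected = correct_baseline(X, baseline_rows)
print(np.round(corrected.max(axis=0), 3))
```

Setting `degree=0` corresponds to subtracting the average of the baseline region; higher degrees give a polynomial fit. Only the region before the peak is used here, mirroring the advice not to rely on the tailing side.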
