
Brereton Chemometrics


402

Problem 6.3

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

CHEMOMETRICS

 

 

1002

236

219

67

104

739

463

650

385

163

754

925

3313

194

96

92

2926

8036

872

92

112

249

166

300

467

281

1113

160

113

524

2813

783

221

219

108

378

173

115

874

1388

88

259

147

206

773

305

54

159

49

1146

638

354

76

128

623

173

568

403

265

1028

123

2996

100

66

38

3073

10484

1439

116

128

443

184

192

525

261

880

168

83

712

4982

268

87

186

0

245

157

141

765

1434

11

337

129

163

679

217

11

249

29

527

0

157

64

208

658

364

479

320

103

132

901

4279

158

115

40

2416

7375

704

99

76

170

125

350

163

235

0

154

103

325

2472

768

143

213

25

262

164

133

967

1250

51

320

112

140

920

223

88

220

25

919

288

158

34

135

299

194

364

436

199

726

340

3703

139

131

187

3468

7432

828

116

10

233

130

300

149

299

758

135

104

228

3323

303

184

177

39

363

196

196

779

1375

83

390

189

180

838

318

57

140

38

1146

191

141

24

65

41

179

612

430

152

584

125

3714

139

121

198

2273

7258

965

132

76

89

149

164

124

149

1050

147

116

254

3904

458

172

185

19

351

179

87

796

1408

54

377

220

180

815

206

45

20

64

1173

335

91

8

107

558

106

324

514

212

882

811

4122

135

86

173

4273

11437

1322

117

85

280

205

310

291

211

331

89

120

301

4391

574

146

148

27

304

197

106

1044

1427

50

301

101

253

1005

302

0

166

29

1207

284

102

46

110

375

70

494

425

180

580

639

4836

113

143

125

3103

9768

1250

130

49

337

187

250

431

172

766

138

69

491

2687

437

166

128

27

320

149

103

907

1708

61

238

161

183

1122

204

96

166

37

1181

239

110

68

158

803

310

632

263

84

1136

759

4824

134

50

59

5535

9069

1201

84

0

160

131

292

28

269

914

184

43

296

1753

518

133

86

76

343

110

112

1046

1720

91

314

184

218

1310

221

71

150

70

1402

315

137

91

77

234

98

600

438

72

608

473

5054

102

92

117

5025

11006

1191

102

6

282

117

263

345

161

849

149

93

209

2322

681

176

138

50

217

100

129

1224

1909

69

172

137

202

1183

98

52

147

48

1571

67

180

73

134

299

252

771

234

196

128

449

4688

128

42

121

4203

11459

1624

69

69

335

127

238

340

199

740

122

40

504

3113

74

66

57

10

123

52

104

1200

2036

52

155

124

173

1447

221

63

146

122

1492

72

265

7

132

537

225

749

258

268

574

665

3225

110

61

115

4515

11816

1766

81

43

496

133

254

302

109

645

9

89

377

1717

278

42

85

13

110

81

60

1566

1819

63

147

98

139

1467

180

11

246

46

1148

106

262

63

99

240

322

594

125

160

506

906

4396

111

9

104

4550

12833

1528

85

7

317

91

231

430

127

673

83

57

548

2470

151

111

90

91

178

65

86

1358

2031

74

112

19

179

1657

95

47

184

35

1668

283

162

75

69

0

224

648

292

105

56

301

4476

15

5

61

4648

12832

1717

46

96

333

129

98

296

44

770

98

65

512

2492

0

54

47

28

53

28

43

1444

1886

42

180

58

123

1390

64

15

140

8

1770

286

517

59

98

222

363

523

131

124

325

323

4564

46

15

17

5014

13212

1426

41

138

318

54

43

348

35

868

9

48

372

1531

352

27

32

57

151

11

35

1478

2232

57

89

35

105

1328

78

124

211

17

1678

34

325

88

13

379

180

550

57

0

173

592

3830

36

0

0

4817

9457

1507

46

36

687

61

91

552

18

507

0

0

422

1527

606

0

0

71

0

13

19

1678

1929

43

29

0

107

1567

51

115

257

0

1860

176

455

83

0

39

130

661

108

104

405

294

4479

0

5

25

4578

12223

1613

59

29

396

61

0

294

16

884

11

27

524

2608

190

11

14

66

95

4

0

1737

2259

28

0

68

0

1575

6

145

206

45

1423

162

341

52

45

499

206

722

0

109

398

758

2633

40

2

5

4589

12855

1515

57

18

380

0

86

50

0

710

36

5

436

892

485

21

40

96

51

0

33

1806

2013

46

75

21

148

1321

20

172

236

5

1694

164

328

39

35

139

162

1260

124

118

130

289

2733

47

33

23

5060

14185

1437

56

47

407

70

122

416

117

906

53

40

359

3372

412

22

74

9

38

62

27

2027

1988

5

77

73

141

1261

37

202

151

26

1831

69

158

3

30

19

0

1238

233

16

0

41

1713

81

72

45

4551

11352

1170

0

67

203

114

214

213

115

766

25

43

0

485

557

145

151

435

269

104

119

2111

1900

27

160

85

43

1141

0

71

0

63

1620

119

0

0

110

558

131

1530

259

278

453

54

671

126

48

92

5536

13709

1309

104

105

184

90

231

110

134

538

85

90

103

1492

72

57

139

13

243

138

104

2625

1904

41

282

93

132

876

159

28

50

33

1917

229

178

39

11

43

14

1626

550

314

498

0

271

107

68

40

3545

13004

1489

83

70

0

191

122

0

166

793

57

82

235

1256

123

71

237

45

256

158

73

2516

1670

0

286

163

92

633

164

82

77

44

1286

245

188

73

88

706

375

919

367

153

589

1094

0

120

94

46

3216

9371

1123

112

97

212

108

339

206

155

669

135

44

249

0

68

184

189

107

332

124

104

1782

1823

54

196

207

229

345

195

125

115

39

905

386

353

60

107

593

515

770

440

427

1114

622

73

98

89

139

2172

5742

377

121

198

448

247

295

357

168

675

102

91

555

4081

222

15

149

194

187

191

133

1225

1179

45

231

132

206

137

184

160

250

54

334

578

567

111

83

1143

278

430

562

135

1256

1150

256

123

59

108

721

2246

631

113

289

109

212

284

855

177

861

223

76

598

3865

234

152

262

295

304

190

80

497

423

19

396

146

207

0

187

415

178

97

0

530

242

66

165

1111

360

0

594

287

962

1225

35

246

131

188

0

0

0

132

221

470

274

527

484

263

663

217

174

441

2452

1118

216

286

283

553

360

204

0

0

101

547

269

334

58

380

200

109

92

41

42

44

50

54

59

61

68

69

72

76

77

78

82

86

94

95

96

97

104

108

109

110

118

119

126

127

128

131

135

136

137

138

140

141

142

144

149

154

155

158

167

168

169

172

195

196

197

198

EVOLUTIONARY SIGNALS

 

 

 

 

 

 

 

 

403

 

 

 

 

 

 

 

 

(continued from p. 401)

 

 

 

 

 

 

 

0.159

0.281

0.431

0.192

0.488

0.335

0.196

0.404

0.356

0.265

0.076

0.341

0.629

0.294

0.507

0.442

0.252

0.592

0.352

0.196

0.138

0.581

0.883

0.351

0.771

0.714

0.366

0.805

0.548

0.220

0.223

0.794

1.198

0.543

0.968

0.993

0.494

1.239

0.766

0.216

0.367

0.865

1.439

0.562

1.118

1.130

0.578

1.488

0.837

0.220

0.310

0.995

1.505

0.572

1.188

1.222

0.558

1.550

0.958

0.276

0.355

0.895

1.413

0.509

1.113

1.108

0.664

1.423

0.914

0.308

0.284

0.723

1.255

0.501

0.957

0.951

0.520

1.194

0.778

0.219

0.350

0.593

0.948

0.478

0.738

0.793

0.459

0.904

0.648

0.177

0.383

0.409

0.674

0.454

0.555

0.629

0.469

0.684

0.573

0.126

0.488

0.220

0.620

0.509

0.494

0.554

0.580

0.528

0.574

0.165

0.695

0.200

0.492

0.551

0.346

0.454

0.695

0.426

0.584

0.177

0.877

0.220

0.569

0.565

0.477

0.582

0.747

0.346

0.685

0.168

0.785

0.230

0.486

0.724

0.346

0.601

0.810

0.370

0.748

0.147

0.773

0.204

0.435

0.544

0.321

0.442

0.764

0.239

0.587

0.152

0.604

0.141

0.417

0.504

0.373

0.458

0.540

0.183

0.504

0.073

0.493

0.083

0.302

0.359

0.151

0.246

0.449

0.218

0.392

0.110

0.291

0.050

0.096

0.257

0.034

0.199

0.238

0.142

0.271

0.018

0.204

0.034

0.126

0.097

0.092

0.095

0.215

0.050

0.145

0.034

The aim of this problem is to explore different approaches to signal resolution using a variety of common chemometric methods.

1. Plot a graph of the sum of intensities at each point in time. Verify that it looks as if there are three peaks in the data.

2. Calculate the derivative of the spectrum, scaled at each point in time to a constant sum, at each wavelength as follows.

a. Rescale the spectrum at each point in time by dividing by the total intensity at that point in time so that the total intensity at each point in time equals 1.

b. Then calculate the smoothed five-point quadratic Savitzky–Golay first derivatives as presented in Chapter 3, Table 3.6, independently for each of the 10 wavelengths. A table consisting of derivatives at 26 times and 10 wavelengths should be obtained.

c. Superimpose the 10 graphs of derivatives at each wavelength.

3. Summarise the change in derivative with time by calculating the mean of the absolute value of the derivative over all 10 wavelengths at each point in time. Plot a graph of this, and explain why a value close to zero indicates a good pure or composition 1 point in time. Show that this suggests that points 6, 17 and 26 are good estimates of pure spectra for each component.

4. The concentration profiles of each component can be estimated using MLR as follows.

a. Obtain estimates of the spectra of each pure component at the three points of highest purity, to give an estimated spectral matrix $\hat{S}$.

b. Using MLR calculate $\hat{C} = X.\hat{S}'.(\hat{S}.\hat{S}')^{-1}$.

c. Plot a graph of the predicted concentration profiles.

5. An alternative method is PCR. Perform uncentred PCA on the raw data matrix X and verify that there are approximately three components.

6. Using estimates of each pure component given in question 4(a), perform PCR as follows.

a. Using regression find the matrix R for which $\hat{S} = R^{-1}.P$, where P is the loadings matrix obtained in question 5; keep three PCs only.

b. Estimate the elution profiles of all three peaks, since $\hat{C} = T.R$.

c. Plot these graphically.
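The MLR and PCR steps in questions 4 to 6 can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the data are synthetic (Gaussian elution profiles and random spectra, not the table for this problem), and the "pure" time points in `pure_points` are hypothetical choices rather than the points found in question 3.

```python
import numpy as np

# Hypothetical synthetic data: 30 time points, 10 wavelengths, 3 components.
rng = np.random.default_rng(0)
times = np.arange(30)
centres = [5, 15, 25]                       # assumed peak positions
C_true = np.stack(
    [np.exp(-0.5 * ((times - c) / 2.5) ** 2) for c in centres], axis=1
)                                           # 30 x 3 elution profiles
S_true = rng.uniform(0.1, 1.0, size=(3, 10))  # 3 x 10 spectra
X = C_true @ S_true                         # noise-free bilinear data, X = C.S

# Question 4(a): rows of X at three assumed points of highest purity.
pure_points = [5, 15, 25]                   # hypothetical, not from the problem data
S_hat = X[pure_points, :]

# Question 4(b): MLR, C_hat = X.S'.(S.S')^-1
C_mlr = X @ S_hat.T @ np.linalg.inv(S_hat @ S_hat.T)

# Question 5: uncentred PCA via SVD, keeping three PCs (T scores, P loadings).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :3] * s[:3]
P = Vt[:3, :]

# Question 6(a): S_hat = R^-1.P, so R^-1 = S_hat.P'.(P.P')^-1
R_inv = S_hat @ P.T @ np.linalg.inv(P @ P.T)
R = np.linalg.inv(R_inv)

# Question 6(b): C_hat = T.R
C_pcr = T @ R
```

For noise-free data of exactly three components the two estimates coincide; with real, noisy data they would be expected to differ slightly, because PCR discards the variation outside the first three PCs.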

 

 

 

Problem 6.5 Titration of Three Spectroscopically Active Compounds with pH

Section 6.2.2 Section 6.3.3 Section 6.4.1.2

The data in the table on page 405 represent the spectra of a mixture of three spectroscopically active species recorded at 25 wavelengths over 36 different values of pH.

1. Perform PCA on the raw uncentred data, and obtain the scores and loadings for the first three PCs.

2. Plot a graph of the loadings of the first PC and superimpose this on the graph of the average spectrum over all the observed pHs, scaling the two graphs so that they are of approximately similar size. Comment on why the first PC is not very useful for discriminating between the compounds.

3. Calculate the logarithm of the correlation coefficient between each successive spectrum, and plot this against pH (there will be 35 numbers; plot the logarithm of the correlation between the spectra at pH 2.15 and 2.24 against the lower pH). Show how this is consistent with there being three different spectroscopic species in the mixture. On the basis of three components, are there pure regions for each component, and over which pH ranges are these?

4. Centre the data and produce three scores plots, those of PC2 vs PC1, PC3 vs PC1 and PC3 vs PC2. Label each point with pH (Excel users will have to adapt the macro provided). Comment on these plots, especially in the light of the correlation graph in question 3.

5. Normalise the scores of the first two PCs obtained in question 4 by dividing by the square root of the sum of squares at each pH. Plot the graph of the normalised scores of PC2 vs PC1, labelling each point as in question 4, and comment.

6. Using the information above, choose one pH which best represents the spectra for each of the three compounds (there may be several answers to this, but they should not differ by a great deal). Plot the spectra of each pure compound, superimposed on one another.

7. Using the guesses of the spectra for each compound in question 6, perform MLR to obtain estimated profiles for each species by $\hat{C} = X.S'.(S.S')^{-1}$. Plot a graph of the pH profiles of each species.
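The correlation calculation in question 3 can be sketched as below. The function name `successive_log_correlation`, the choice of base-10 logarithms and the toy two-species data are all assumptions made for illustration; the problem itself does not fix the base of the logarithm.

```python
import numpy as np

def successive_log_correlation(X):
    """Return log10 of the correlation coefficient between each pair of
    successive rows (spectra) of X, as an array of length n_rows - 1."""
    X = np.asarray(X, dtype=float)
    r = np.array(
        [np.corrcoef(X[i], X[i + 1])[0, 1] for i in range(X.shape[0] - 1)]
    )
    return np.log10(r)

# Hypothetical toy data: two pure regions joined by a transition region.
a = np.array([1.0, 3.0, 2.0, 0.5, 0.1])       # spectrum of species a
b = np.array([0.2, 0.5, 1.5, 3.0, 1.0])       # spectrum of species b
frac = np.array([0.0, 0.0, 0.0, 0.3, 0.7, 1.0, 1.0, 1.0])  # fraction of b
X = np.outer(1 - frac, a) + np.outer(frac, b)

log_r = successive_log_correlation(X)
```

In a pure region successive spectra have the same shape, so the correlation is 1 and its logarithm is 0; the values dip below zero wherever the composition changes.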

Problem 6.6 Resolution of Mid-infrared Spectra of a Three-component Mixture

Section 6.2.2 Section 6.2.3.1 Section 6.4.1.2

The table on page 406 represents seven spectra consisting of different mixtures of three compounds, 1,2,3-trimethylbenzene, 1,3,5-trimethylbenzene and toluene, whose mid-infrared spectra have been recorded at 16 cm⁻¹ intervals between 528 and 2000 cm⁻¹, which you will need to reorganise as a matrix of dimensions 7 × 93.


405

 

 

Problem 6.5

 

336

0.000

0.000

0.000

0.001

0.001

0.002

0.002

0.002

0.002

0.002

0.002

0.002

0.002

0.002

0.002

0.002

0.003

0.003

0.003

0.004

0.004

0.005

0.004

0.005

0.005

0.005

0.007

0.006

0.006

0.006

0.006

0.007

0.007

0.007

0.008

0.008

 

332

0.000

0.000

0.001

0.001

0.001

0.002

0.002

0.002

0.001

0.001

0.002

0.002

0.002

0.002

0.002

0.001

0.003

0.003

0.003

0.003

0.005

0.004

0.004

0.005

0.004

0.005

0.007

0.006

0.006

0.006

0.007

0.007

0.007

0.008

0.008

0.008

 

328

0.000

0.001

0.001

0.001

0.001

0.003

0.003

0.002

0.002

0.002

0.002

0.002

0.002

0.003

0.003

0.003

0.003

0.004

0.004

0.004

0.006

0.005

0.005

0.005

0.006

0.005

0.006

0.006

0.006

0.007

0.007

0.007

0.007

0.008

0.008

0.008

 

324

0.001

0.001

0.001

0.002

0.001

0.003

0.003

0.003

0.003

0.003

0.002

0.002

0.003

0.003

0.003

0.003

0.003

0.003

0.004

0.004

0.006

0.005

0.005

0.006

0.006

0.006

0.006

0.007

0.007

0.007

0.007

0.008

0.009

0.008

0.008

0.009

 

320

0.002

0.002

0.002

0.003

0.002

0.005

0.005

0.004

0.003

0.003

0.003

0.003

0.004

0.004

0.003

0.004

0.004

0.004

0.005

0.005

0.006

0.005

0.005

0.006

0.006

0.007

0.007

0.007

0.007

0.008

0.008

0.008

0.009

0.009

0.009

0.009

 

316

0.005

0.005

0.004

0.005

0.004

0.005

0.005

0.005

0.004

0.004

0.004

0.004

0.004

0.005

0.005

0.004

0.004

0.005

0.005

0.005

0.006

0.006

0.006

0.006

0.006

0.006

0.007

0.007

0.007

0.008

0.008

0.008

0.008

0.008

0.009

0.010

 

312

0.006

0.006

0.005

0.006

0.006

0.007

0.007

0.007

0.005

0.005

0.005

0.005

0.005

0.004

0.004

0.004

0.005

0.006

0.005

0.006

0.006

0.006

0.007

0.007

0.007

0.007

0.008

0.010

0.008

0.010

0.009

0.009

0.009

0.009

0.012

0.010

 

308

0.013

0.012

0.011

0.012

0.011

0.011

0.011

0.010

0.009

0.008

0.007

0.007

0.007

0.007

0.007

0.006

0.007

0.008

0.007

0.010

0.008

0.008

0.008

0.009

0.009

0.009

0.010

0.011

0.010

0.011

0.010

0.012

0.012

0.012

0.013

0.012

 

304

0.026

0.025

0.023

0.023

0.021

0.021

0.019

0.018

0.015

0.013

0.010

0.009

0.009

0.009

0.008

0.009

0.008

0.008

0.009

0.013

0.010

0.010

0.010

0.011

0.011

0.011

0.012

0.013

0.012

0.013

0.013

0.013

0.013

0.014

0.015

0.015

 

300

0.056

0.054

0.051

0.048

0.045

0.042

0.039

0.034

0.028

0.023

0.018

0.016

0.014

0.013

0.013

0.012

0.013

0.013

0.013

0.015

0.014

0.015

0.014

0.015

0.015

0.016

0.017

0.016

0.016

0.015

0.016

0.015

0.016

0.017

0.016

0.017

 

296

0.104

0.102

0.098

0.091

0.084

0.079

0.072

0.064

0.053

0.044

0.037

0.032

0.028

0.026

0.026

0.025

0.025

0.026

0.025

0.026

0.027

0.027

0.027

0.028

0.027

0.027

0.028

0.026

0.025

0.021

0.020

0.021

0.020

0.020

0.021

0.022

 

292

0.166

0.162

0.156

0.147

0.137

0.129

0.119

0.107

0.092

0.079

0.069

0.063

0.058

0.056

0.055

0.055

0.054

0.055

0.055

0.058

0.056

0.057

0.057

0.056

0.055

0.055

0.054

0.050

0.047

0.040

0.037

0.037

0.035

0.035

0.036

0.036

 

288

0.234

0.229

0.223

0.214

0.202

0.192

0.180

0.167

0.149

0.135

0.124

0.116

0.110

0.109

0.109

0.107

0.107

0.108

0.108

0.108

0.109

0.110

0.110

0.110

0.108

0.107

0.104

0.100

0.094

0.085

0.081

0.081

0.079

0.078

0.077

0.079

 

284

0.306

0.302

0.296

0.287

0.274

0.263

0.253

0.238

0.218

0.203

0.192

0.185

0.180

0.179

0.178

0.177

0.178

0.178

0.179

0.179

0.181

0.181

0.183

0.181

0.179

0.178

0.175

0.172

0.169

0.163

0.160

0.158

0.157

0.156

0.156

0.157

 

280

0.379

0.376

0.372

0.364

0.352

0.343

0.331

0.317

0.300

0.285

0.274

0.269

0.264

0.262

0.263

0.262

0.263

0.264

0.264

0.266

0.268

0.269

0.270

0.270

0.270

0.270

0.269

0.270

0.270

0.270

0.270

0.270

0.271

0.270

0.270

0.271

 

276

0.449

0.448

0.444

0.437

0.428

0.421

0.410

0.397

0.382

0.371

0.361

0.356

0.354

0.353

0.352

0.353

0.354

0.355

0.356

0.358

0.361

0.362

0.362

0.364

0.366

0.368

0.371

0.375

0.380

0.390

0.392

0.395

0.396

0.399

0.399

0.399

 

272

0.520

0.518

0.516

0.508

0.501

0.496

0.487

0.477

0.464

0.457

0.449

0.447

0.445

0.446

0.446

0.445

0.447

0.449

0.450

0.451

0.456

0.456

0.457

0.458

0.462

0.469

0.471

0.478

0.487

0.503

0.508

0.510

0.514

0.516

0.517

0.517

 

268

0.584

0.582

0.578

0.574

0.567

0.566

0.559

0.550

0.542

0.537

0.531

0.532

0.531

0.532

0.533

0.534

0.536

0.538

0.538

0.539

0.545

0.546

0.547

0.548

0.553

0.557

0.563

0.570

0.580

0.598

0.604

0.608

0.612

0.614

0.614

0.615

 

264

0.632

0.630

0.626

0.623

0.616

0.617

0.613

0.607

0.602

0.601

0.599

0.601

0.601

0.602

0.605

0.606

0.608

0.611

0.611

0.613

0.618

0.619

0.620

0.622

0.628

0.627

0.633

0.640

0.649

0.664

0.670

0.674

0.678

0.679

0.680

0.680

 

260

0.663

0.661

0.657

0.655

0.650

0.653

0.652

0.650

0.648

0.650

0.649

0.651

0.654

0.656

0.658

0.660

0.662

0.665

0.667

0.668

0.673

0.676

0.674

0.673

0.677

0.677

0.680

0.684

0.688

0.699

0.701

0.704

0.707

0.708

0.709

0.708

 

256

0.673

0.672

0.671

0.669

0.668

0.674

0.673

0.670

0.672

0.675

0.676

0.680

0.683

0.685

0.687

0.690

0.692

0.696

0.697

0.698

0.703

0.704

0.705

0.702

0.702

0.702

0.701

0.702

0.702

0.707

0.708

0.709

0.710

0.712

0.712

0.710

 

252

0.638

0.639

0.640

0.642

0.643

0.651

0.652

0.651

0.653

0.658

0.661

0.666

0.670

0.671

0.675

0.677

0.680

0.683

0.684

0.685

0.691

0.692

0.692

0.689

0.688

0.685

0.684

0.682

0.680

0.680

0.681

0.680

0.680

0.680

0.681

0.680

Wavelength(nm)

240 244 248

0.382 0.479 0.571

0.386 0.482 0.573

0.391 0.488 0.576

0.402 0.496 0.582

0.409 0.503 0.586

0.424 0.515 0.597

0.430 0.519 0.599

0.435 0.522 0.599

0.444 0.528 0.604

0.454 0.537 0.611

0.462 0.544 0.615

0.470 0.550 0.623

0.476 0.555 0.626

0.480 0.559 0.628

0.483 0.562 0.631

0.484 0.566 0.635

0.487 0.566 0.636

0.490 0.570 0.639

0.492 0.571 0.641

0.494 0.572 0.642

0.500 0.578 0.647

0.501 0.580 0.648

0.503 0.580 0.649

0.503 0.579 0.645

0.506 0.578 0.644

0.509 0.578 0.642

0.511 0.576 0.639

0.515 0.575 0.638

0.516 0.573 0.634

0.520 0.573 0.632

0.522 0.573 0.631

0.523 0.573 0.631

0.524 0.573 0.631

0.526 0.573 0.631

0.527 0.575 0.631

0.529 0.576 0.631

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

pH

2.15

2.24

2.44

2.68

3.00

3.25

3.47

3.72

4.04

4.40

4.77

5.06

5.40

5.68

5.98

6.25

6.49

6.85

7.00

7.47

7.75

7.96

8.12

8.51

8.82

9.11

9.38

9.61

9.89

10.38

10.57

10.74

11.01

11.27

11.47

11.64

406


 

 

Problem 6.6 (samples along rows, wavelength in cm⁻¹)

1616

5859

6503

5062

6563

3027

7995

4825

1216

363

201

316

314

448

465

387

816

684

663

502

674

409

910

495

 

 

 

 

 

 

 

 

1632

840

688

738

834

756

1110

811

1232

339

181

270

277

406

431

333

832

2919

3472

2182

3198

951

3987

1891

 

 

 

 

 

 

 

 

1648

488

271

325

370

525

617

399

1248

432

225

319

338

507

546

397

848

1556

1802

1180

1691

582

2120

1049

 

 

 

 

 

 

 

 

1664

378

224

265

300

399

481

319

1264

397

223

267

304

426

503

326

864

274

186

229

248

287

355

266

 

 

 

 

 

 

 

 

1680

343

221

285

304

374

444

337

1280

356

214

248

283

371

454

298

880

290

186

260

267

331

376

308

 

 

 

 

 

 

 

 

1696

359

252

303

330

368

468

349

1296

382

248

284

322

387

490

333

896

382

286

351

375

391

503

399

 

 

 

 

 

 

 

 

1712

647

521

461

582

513

844

497

1312

434

272

326

363

456

557

387

912

420

316

362

401

408

551

408

 

 

 

 

 

 

 

 

1728

472

318

379

418

485

611

440

1328

457

294

332

380

461

586

390

928

553

466

468

546

468

732

506

528

598

366

658

606

798

784

791

1744

617

514

487

588

502

812

524

1344

585

432

432

516

524

759

484

944

343

228

289

310

367

444

339

544

625

322

538

527

797

797

670

1760

657

581

475

618

459

866

491

1360

771

605

600

713

669

1008

661

960

410

215

300

320

479

519

374

560

554

265

451

444

713

701

571

1776

425

215

336

341

524

539

421

1376

1453

1117

1141

1337

1299

1898

1268

976

718

353

447

507

808

898

568

576

347

210

357

338

447

453

430

1792

426

222

495

428

639

556

612

1392

1810

1245

1377

1575

1771

2338

1586

992

959

558

578

707

952

1209

698

592

287

149

288

264

395

370

357

1808

398

218

472

409

590

521

579

1408

1153

853

866

1025

1042

1497

971

1008

936

595

585

725

881

1190

686

608

258

163

239

240

304

335

285

1824

312

197

272

282

355

404

324

1424

1090

720

879

961

1143

1409

1029

1024

887

507

982

885

1235

1159

1196

624

230

134

216

211

287

297

262

1840

447

237

368

371

549

570

455

1440

2429

1515

1755

1992

2505

3106

2080

1040

988

751

1143

1099

1174

1323

1313

640

240

155

222

225

277

312

263

1856

421

224

412

384

566

542

508

1456

4086

2635

2862

3343

4036

5229

3354

1056

839

650

830

865

873

1112

938

656

264

169

252

251

313

344

299

1872

382

247

424

395

495

503

504

1472

5282

2844

3345

3874

5666

6642

4140

1072

972

506

994

904

1355

1255

1232

672

1091

972

1376

1340

1200

1493

1529

1888

354

246

265

307

340

458

304

1488

5298

3195

4358

4573

6011

6806

5233

1088

1462

579

1151

1094

2007

1828

1509

688

2593

2363

3207

3175

2737

3550

3533

1904

456

259

285

339

469

576

347

1504

3339

2599

3724

3669

3785

4468

4243

1104

1091

434

696

732

1371

1349

927

704

2229

1071

3565

2710

4201

2982

4432

1920

531

301

374

416

578

673

455

1520

2318

2051

2006

2356

1869

3085

2134

1120

390

223

334

338

468

500

406

720

2709

1089

5042

3568

5921

3659

6342

1936

422

262

484

440

574

556

581

1536

1956

1739

1662

1976

1543

2602

1759

1136

329

207

258

282

353

424

306

736

2742

1084

4946

3520

5896

3686

6235

1952

381

241

489

427

552

508

587

1552

1617

1412

1281

1573

1237

2139

1352

1152

356

213

271

295

389

455

326

752

3372

1130

2592

2396

4846

4180

3492

1968

256

167

258

252

310

336

305

1568

1507

1282

1163

1433

1173

1985

1236

1168

384

216

290

310

435

488

354

768

4468

1404

1924

2336

5384

5384

2821

1984

214

145

169

189

217

277

197

1584

1936

1407

1355

1657

1707

2501

1520

1184

352

194

270

285

407

448

331

784

1238

452

625

725

1482

1510

867

2000

211

143

164

184

211

272

190

1600

4251

4053

3587

4408

2983

5693

3678

1200

372

218

314

322

437

478

381

800

296

171

220

240

327

377

267

 

1 2 3 4 5 6 7

 

1 2 3 4 5 6 7

 

1 2 3 4 5 6 7

 

1 2 3 4 5 6 7


 

 

1. Scale the data so that the sum of the spectral intensities at each wavelength equals 1 (note that this differs from the usual method, which is along the rows, and is a way of putting equal weight on each wavelength). Perform PCA, without further preprocessing, and produce a plot of the loadings of PC2 vs PC1.

2. Many wavelengths are not very useful if they are low intensity. Identify those wavelengths for which the sum over all seven spectra is greater than 10 % of the sum at the wavelength that has the maximum sum, and label these in the graph in question 1.

3. Comment on the appearance of the graph in question 2, and suggest three wavelengths that are typical of each of the compounds.

4. Using the three wavelengths selected in question 3, obtain a 7 × 3 matrix of relative concentrations in each of the spectra and call this $\hat{C}$.

5. Calling the original data X, obtain the estimated spectra for each compound by $\hat{S} = (\hat{C}'.\hat{C})^{-1}.\hat{C}'.X$ and plot these graphically.
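The column scaling, the 10 % intensity filter and the regression for the spectra might be sketched as follows. The 7 × 93 matrix here is a random stand-in (not the table on page 406), and `C_hat` is simply taken to be the concentrations used to build it; in the problem it would come from the three wavelengths chosen in question 3.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in for the 7 x 93 data matrix of three compounds.
C_true = rng.uniform(0.1, 1.0, size=(7, 3))
S_true = rng.uniform(0.0, 1.0, size=(3, 93))
X = C_true @ S_true

# Question 1: scale each column (wavelength) so that it sums to 1.
X_scaled = X / X.sum(axis=0, keepdims=True)

# Question 2: wavelengths whose sum exceeds 10 % of the largest column sum.
col_sums = X.sum(axis=0)
keep = col_sums > 0.1 * col_sums.max()

# Question 5: S_hat = (C'.C)^-1.C'.X, the left pseudo-inverse of C applied to X.
C_hat = C_true        # assumption: relative concentrations already estimated
S_hat = np.linalg.inv(C_hat.T @ C_hat) @ C_hat.T @ X
```

In practice `np.linalg.lstsq(C_hat, X, rcond=None)` gives the same least-squares answer and is usually preferred numerically to forming the inverse explicitly.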

Chemometrics: Data Analysis for the Laboratory and Chemical Plant.

Richard G. Brereton

Copyright 2003 John Wiley & Sons, Ltd.

ISBNs: 0-471-48977-8 (HB); 0-471-48978-6 (PB)

Appendices

A.1 Vectors and Matrices

A.1.1 Notation and Definitions

A single number is often called a scalar, and is represented by italics, e.g. x.

A vector consists of a row or column of numbers and is represented by bold lower case italics, e.g. x. For example, $x = \begin{pmatrix} 3 & 11 & 9 & 0 \end{pmatrix}$ is a row vector and

$$y = \begin{pmatrix} 5.6 \\ 2.8 \\ 1.9 \end{pmatrix}$$

is a column vector.

A matrix is a two-dimensional array of numbers and is represented by bold upper case italics, e.g. X. For example,

$$X = \begin{pmatrix} 12 & 3 & 8 \\ 2 & 14 & 1 \end{pmatrix}$$

is a matrix.

The dimensions of a matrix are normally presented with the number of rows first and the number of columns second, and vectors can be considered as matrices with one dimension equal to 1, so that x above has dimensions 1 × 4 and X has dimensions 2 × 3.

A square matrix is one where the number of columns equals the number of rows. For example,

$$Y = \begin{pmatrix} 7 & 4 & 1 \\ 11 & 3 & 6 \\ 2 & 4 & 12 \end{pmatrix}$$

is a square matrix.

An identity matrix is a square matrix whose elements are equal to 1 in the diagonal and 0 elsewhere, and is often denoted by I. For example,

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

is an identity matrix.

The individual elements of a matrix are often referenced as scalars, with subscripts referring to the row and column; hence, in the square matrix Y above, $y_{21} = 11$, which is the element in row 2 and column 1. Optionally, a comma can be placed between the subscripts for clarity; this is useful if one of the dimensions exceeds 9.


 

 

A.1.2 Matrix and Vector Operations

A.1.2.1 Addition and Subtraction

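Addition and subtraction are the most straightforward operations. The matrices (or vectors) must have the same dimensions, and the operation is simply performed element by element. Hence

$$\begin{pmatrix} 9 & 7 \\ 8 & 4 \\ 2 & 4 \end{pmatrix} + \begin{pmatrix} 0 & -7 \\ 11 & 3 \\ -5 & 6 \end{pmatrix} = \begin{pmatrix} 9 & 0 \\ 19 & 7 \\ -3 & 10 \end{pmatrix}$$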

A.1.2.2 Transpose

Transposing a matrix involves swapping the columns and rows around, and may be denoted by a right-hand-side superscript prime ('). For example, if

$$Z = \begin{pmatrix} 3.1 & 0.2 & 6.1 & 4.8 \\ 9.2 & 3.8 & 2.0 & 5.1 \end{pmatrix}$$

then

$$Z' = \begin{pmatrix} 3.1 & 9.2 \\ 0.2 & 3.8 \\ 6.1 & 2.0 \\ 4.8 & 5.1 \end{pmatrix}$$

Some authors use a superscript T instead.

A.1.2.3 Multiplication

Matrix and vector multiplication using the ‘dot’ product is denoted by the symbol ‘.’ between matrices. It is only possible to multiply two matrices together if the number of columns of the first matrix equals the number of rows of the second matrix. The number of rows of the product will equal the number of rows of the first matrix, and the number of columns will equal the number of columns of the second matrix. Hence a 3 × 2 matrix when multiplied by a 2 × 4 matrix will give a 3 × 4 matrix.

Multiplication of matrices is not commutative, that is, generally $A.B \neq B.A$ even if the second product is allowable. Matrix multiplication can be expressed in the form of summations. For arrays with more than two dimensions (e.g. tensors), conventional symbolism can be awkward and it is probably easier to think in terms of summations.

If matrix A has dimensions I × J and matrix B has dimensions J × K, then the product C of dimensions I × K has elements defined by

$$c_{ik} = \sum_{j=1}^{J} a_{ij} b_{jk}$$

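Hence

$$\begin{pmatrix} 1 & 7 \\ 9 & 3 \\ 2 & 5 \end{pmatrix} . \begin{pmatrix} 6 & 10 & 11 & 3 \\ 0 & 1 & 8 & 5 \end{pmatrix} = \begin{pmatrix} 6 & 17 & 67 & 38 \\ 54 & 93 & 123 & 42 \\ 12 & 25 & 62 & 31 \end{pmatrix}$$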

To illustrate this, the element of the first row and second column of the product is given by 17 = 1 × 10 + 7 × 1.

When several matrices are multiplied together it is normal to take any two neighbouring matrices, multiply them together and then multiply this product with another neighbouring matrix. It does not matter in what order this is done, hence A.B.C = (A.B).C = A.(B.C). Hence matrix multiplication is associative. Matrix multiplication is also distributive, that is, A.(B + C) = A.B + A.C.
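These rules are easy to check numerically. The sketch below takes A (3 × 2) and B (2 × 4) to be the matrices of the worked example, whose first-row, second-column element is 17 = 1 × 10 + 7 × 1; the extra matrices D, E, M and N are hypothetical and introduced only to exercise the associative, distributive and non-commutative properties.

```python
import numpy as np

A = np.array([[1, 7], [9, 3], [2, 5]])           # 3 x 2
B = np.array([[6, 10, 11, 3], [0, 1, 8, 5]])     # 2 x 4
C = A @ B                                        # 3 x 4 product

# The same product written as the summation c_ik = sum_j a_ij * b_jk.
I, J = A.shape
K = B.shape[1]
C_sum = np.zeros((I, K), dtype=int)
for i in range(I):
    for k in range(K):
        C_sum[i, k] = sum(A[i, j] * B[j, k] for j in range(J))

# Associativity and distributivity, using hypothetical extra matrices.
D = np.array([[1, 0], [2, 1], [0, 3], [1, 1]])   # 4 x 2
E = np.array([[2, 0, 1, 1], [1, 1, 0, 4]])       # 2 x 4
assoc_ok = np.array_equal((A @ B) @ D, A @ (B @ D))
distrib_ok = np.array_equal(A @ (B + E), A @ B + A @ E)

# Non-commutativity: both products exist for square matrices, yet they differ.
M = np.array([[1, 2], [3, 4]])
N = np.array([[0, 1], [1, 0]])
commutes = np.array_equal(M @ N, N @ M)
```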

A.1.2.4 Inverse

Most square matrices have inverses, defined by the matrix which when multiplied with the original matrix gives the identity matrix, and is represented by −1 as a right-hand-side superscript, so that $D.D^{-1} = I$. Note that some square matrices do not have inverses: this is caused by there being correlations in the original matrix; such matrices are called singular matrices.
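As a small illustration, the following sketch inverts one matrix and shows a second, singular one whose rows are perfectly correlated. Both matrices are hypothetical examples, not taken from the text.

```python
import numpy as np

D = np.array([[4.0, 7.0], [2.0, 6.0]])
D_inv = np.linalg.inv(D)
identity_ok = np.allclose(D @ D_inv, np.eye(2))   # D.D^-1 = I

# A singular matrix: the second row is exactly twice the first,
# so the rows are perfectly correlated and no inverse exists.
S = np.array([[1.0, 2.0], [2.0, 4.0]])
rank_S = np.linalg.matrix_rank(S)                 # 1, i.e. less than 2
det_S = np.linalg.det(S)                          # 0 within rounding
```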

A.1.2.5 Pseudo-inverse

In several sections of this text we use the idea of a pseudo-inverse. If matrices are not square, it is not possible to calculate an inverse, but the concept of a pseudo-inverse exists and is employed in regression analysis.

If $A = B.C$ then $B'.A = B'.B.C$, so $(B'.B)^{-1}.B'.A = C$ and $(B'.B)^{-1}.B'$ is said to be the left pseudo-inverse of B.

Equivalently, $A.C' = B.C.C'$, so $A.C'.(C.C')^{-1} = B$ and $C'.(C.C')^{-1}$ is said to be the right pseudo-inverse of C.

In regression, the equation $A \approx B.C$ is an approximation; for example, A may represent a series of spectra that are approximately equal to the product of two matrices such as scores and loadings matrices, hence this approach is important to obtain the best fit model for C knowing A and B or for B knowing A and C.
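The two pseudo-inverses can be written out directly in NumPy. This is a sketch on random, noise-free matrices (the sizes 6 × 2 and 2 × 5 are arbitrary assumptions), so that A = B.C holds exactly and the original factors are recovered.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(6, 2))      # e.g. a scores-like matrix, 6 x 2
C = rng.normal(size=(2, 5))      # e.g. a loadings-like matrix, 2 x 5
A = B @ C                        # exact product, 6 x 5

left_pinv_B = np.linalg.inv(B.T @ B) @ B.T     # (B'.B)^-1.B'
right_pinv_C = C.T @ np.linalg.inv(C @ C.T)    # C'.(C.C')^-1

C_est = left_pinv_B @ A          # recovers C knowing A and B
B_est = A @ right_pinv_C         # recovers B knowing A and C
```

For matrices of full rank these expressions agree with `np.linalg.pinv`, which computes the pseudo-inverse through the singular value decomposition and is generally the safer choice when columns are nearly collinear.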

A.1.2.6 Trace and Determinant

Other properties of square matrices sometimes encountered are the trace, which is the sum of the diagonal elements, and the determinant, which relates to the size of the matrix. A determinant of 0 indicates a matrix without an inverse. A very small determinant often suggests that the data are fairly correlated or a poor experimental design resulting in fairly unreliable predictions. If the dimensions of matrices are large and the magnitudes of the measurements are small, e.g. $10^{-3}$, it is sometimes possible to obtain a determinant close to zero even though the matrix has an inverse; a solution to this problem is to multiply each measurement by a number such as $10^{3}$ and then remember to readjust the magnitude of the numbers in resultant calculations to take account of this later.
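The scaling effect follows from the identity det(c.A) = cⁿ.det(A) for an n × n matrix. The sketch below uses a hypothetical 8 × 8 matrix of random values of order 10⁻³ to show a determinant that is tiny even though the matrix is perfectly invertible.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
A = rng.uniform(1.0, 2.0, size=(n, n)) * 1e-3   # measurements of order 10^-3

trace_A = np.trace(A)                   # sum of the diagonal elements
det_small = np.linalg.det(A)            # very close to zero
det_rescaled = np.linalg.det(A * 1e3)   # determinant after multiplying by 10^3

# The matrix still has an inverse despite its tiny determinant.
inv_ok = np.allclose(A @ np.linalg.inv(A), np.eye(n))
```

Here `det_rescaled` equals `det_small` multiplied by (10³)⁸ = 10²⁴, which is why the raw determinant alone is a poor test of invertibility for small-valued data.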

A.1.2.7 Vector length

An interesting property that chemometricians sometimes use is that the product of the transpose of a column vector with itself equals the sum of squares of the elements of the vector, so that $x'.x = \sum x^2$. The length of a vector is given by $\sqrt{x'.x} = \sqrt{\sum x^2}$, or the square root of the sum of squares of its elements. This can be visualised in geometry as the length of the line from the origin to the point in space indicated by the vector.
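A short numerical check, using a hypothetical column vector with elements 3, 4 and 12 chosen so that the arithmetic is exact:

```python
import numpy as np

x = np.array([[3.0], [4.0], [12.0]])   # column vector, 3 x 1
sum_sq = (x.T @ x).item()              # x'.x = 9 + 16 + 144 = 169
length = np.sqrt(sum_sq)               # square root of the sum of squares = 13
```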

A.2 Algorithms

There are many different descriptions of the various algorithms in the literature. This Appendix describes one algorithm for each of four regression methods.

A.2.1 Principal components analysis

NIPALS is a common, iterative algorithm often used for PCA. Some authors use another method called SVD (singular value decomposition). The main difference is that NIPALS extracts components one at a time, and can be stopped after the desired number of PCs has been obtained. In the case of large datasets with, for example, 200 variables (e.g. in spectroscopy), this can be very useful and reduce the amount of effort required. The steps are as follows.

Initialisation

1. Take a matrix Z and, if required, preprocess (e.g. mean centre or standardise) to give the matrix X which is used for PCA.

New Principal Component

2. Take a column of this matrix (often the column with greatest sum of squares) as the first guess of the scores of the first principal component; call it ${}^{\text{initial}}\hat{t}$.

Iteration for each Principal Component

3. Calculate

$${}^{\text{unnorm}}\hat{p} = \frac{{}^{\text{initial}}\hat{t}'.X}{\sum \hat{t}^2}$$

4. Normalise the guess of the loadings, so

$$\hat{p} = \frac{{}^{\text{unnorm}}\hat{p}}{\sqrt{\sum {}^{\text{unnorm}}\hat{p}^2}}$$

5. Now calculate a new guess of the scores:

$${}^{\text{new}}\hat{t} = X.\hat{p}'$$

Check for Convergence

6. Check if this new guess differs from the first guess; a simple approach is to look at the size of the sum of square difference in the old and new scores, i.e. $\sum ({}^{\text{initial}}\hat{t} - {}^{\text{new}}\hat{t})^2$. If this is small the PC has been extracted; set the PC scores (t) and loadings (p) for the current PC to $\hat{t}$ and $\hat{p}$. Otherwise, return to step 3, substituting the initial scores by the new scores.
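Steps 2 to 6 might be sketched in NumPy as follows for a single principal component. The function name `nipals_first_pc`, the tolerance `tol` and the iteration cap `max_iter` are assumptions made for this illustration, and the test data are synthetic; extracting further components (which needs the residual matrix) is not shown.

```python
import numpy as np

def nipals_first_pc(X, tol=1e-12, max_iter=1000):
    """Extract the scores t and loadings p of the first PC of X
    (already preprocessed) by the iteration in steps 2 to 6."""
    X = np.asarray(X, dtype=float)
    # Step 2: initial scores = column with the greatest sum of squares.
    t = X[:, np.argmax((X ** 2).sum(axis=0))].copy()
    for _ in range(max_iter):
        p = (t @ X) / (t @ t)                 # step 3: unnormalised loadings
        p = p / np.sqrt(p @ p)                # step 4: normalise to unit length
        t_new = X @ p                         # step 5: new guess of the scores
        converged = np.sum((t - t_new) ** 2) < tol   # step 6
        t = t_new
        if converged:
            break
    return t, p

# Hypothetical data with one dominant component plus a little noise.
rng = np.random.default_rng(4)
X = 5.0 * np.outer(rng.normal(size=20), rng.normal(size=6))
X += 0.1 * rng.normal(size=(20, 6))
t, p = nipals_first_pc(X)

# Reference values from SVD (signs of scores and loadings are arbitrary).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
```

Up to an overall sign, `p` should match the first right singular vector `Vt[0]` and `t` should match `U[:, 0] * s[0]`, which is one way to confirm that the iteration has converged to the first principal component.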
