
Brereton Chemometrics


INDEX

483

 

 

half factorial designs

62–5

Hamming window

133

Hanning window

133

and convolution

141, 142

hard modelling

233, 243

hat matrix

47

 

 

hat notation

30, 128, 192

heteroscedastic noise

129

heuristic evolving latent projections (HELP) 376

homoscedastic noise

128, 129

 

 

identity matrix

409

 

 

 

 

 

Matlab command for

461

 

 

independent modelling of classes

243, 244, 266

see also SIMCA method

 

 

 

independent test sets

317–23

 

 

industrial process control

233

 

 

time series in

 

120

 

 

 

 

 

innovation, in Kalman filters

164

 

 

instrumentation error

128

 

 

 

instrumentation noise

128

 

 

 

interaction of factors

16, 31

 

 

interaction terms, in design matrix

32, 53

Internet resources

11–12

 

 

 

inverse calibration

279–80

 

 

 

compared with classical calibration

279–80,

280, 281

 

 

 

 

 

 

 

inverse Fourier transforms

151

 

 

inverse of matrix

 

411

 

 

 

 

in Excel

432, 432

 

 

 

 

 

K nearest neighbour (KNN) method

249–51

limitations

251

 

 

 

 

 

methodology

249–51

 

 

 

 

problem(s) on

 

257, 259–60

 

 

Kalman filters 122, 163–7 applicability 165, 167 calculation of 164–5

Kowalski, B. R. 9, 456

Krilov space 2

lack-of-fit 20

lack-of-fit sum-of-square error 27–8

leverage

47–53

 

 

calculation of

47, 48

 

definition

47

 

 

effects

53

 

 

 

equation form

49–50

 

graphical representation

51, 51, 53

properties

49

 

 

line graphs

 

 

 

Excel facility

447

 

Matlab facility

469–71

 

linear discriminant analysis

233, 237–40

problem(s) on

264–5

 

linear discriminant function

237

calculation of

239, 240

 

linear filters

120, 131–42

 

calculation of 133–4

 

 

convolution

138, 141

 

 

derivatives

138

 

 

 

 

smoothing functions

131–7

 

linear regression, Excel add-in for

436, 437

loadings (in PCA)

190, 192–5

 

loadings plots

207–9

 

 

 

after mean centring

214

 

 

after ranking of data

363

 

after row scaling

218, 353–5

 

after standardisation

190, 216, 357, 361

of raw data

208–9, 212, 344

 

superimposed on scores plots 219–20

three-dimensional plots

348, 349

Matlab facility

475, 477

 

Lorentzian peakshapes

123–4

 

compared with Gaussian

124

 

in NMR spectroscopy

148

 

time domain equivalent

149

 

magnetic resonance imaging (MRI)

121

magnitude spectrum, in Fourier transforms 153

Mahalanobis distance measure

227, 236–41

problem(s) on

261–3

 

 

 

Manhattan distance measure

226

 

matched filters

160

 

 

 

 

 

Matlab 7–8, 456–78

 

 

 

 

advantages

7–8, 456

 

 

 

basic arithmetic matrix operations

461–2

comments in

467

 

 

 

 

compared with Excel

8, 446

 

 

conceptual problem (not looking at raw

numerical data)

8

 

 

 

data preprocessing

464–5

 

 

directories

457–8

 

 

 

 

figure command

469

 

 

 

file types

458–9

 

 

 

 

 

diary files

459

 

 

 

 

 

m files

458–9, 468

 

 

 

mat files

458, 466

 

 

 

function files

468

 

 

 

 

graphics facility

469–78

 

 

 

creating figures

469

 

 

 

labelling of datapoints

471–3

 

line graphs

469–71

 

 

 

multiple plot facility 469, 471

 

three-dimensional graphics

473–8

two-variable plot

471

 

 

 

handling matrices/scalars/vectors

460–1

help facility

456, 470

 

 

 

loops

467

 

 

 

 

 

 

 

matrix functions

462–4

 

 

 

numerical data 466

 

 

 

 

plot command

469, 471

 

 

 

principal components analysis

465–6

starting

457

 

 

 

 

 

 

subplot command

469

 

 

 


 

 


 

 

 

 

user interface

8, 457

 

 

 

view command

474

 

 

 

matrices

 

 

 

 

 

 

 

addition of

410

 

 

 

 

definitions

 

409

 

 

 

 

dimensions

409

 

 

 

 

inverses

411

 

 

 

 

 

in Excel

432, 432

 

 

 

multiplication of

410–11

 

 

in Excel

431, 432

 

 

 

notation

32, 409

 

 

 

 

singular

411

 

 

 

 

 

subtraction of

410

 

 

 

transposing of

410

 

 

 

in Excel

431, 432

 

 

 

see also design matrices

 

 

matrix operations

410–11

 

 

in Excel

431–3

 

 

 

 

in Matlab

 

461–4

 

 

 

maximum entropy (maxent) techniques

121, 168,

169–73

 

 

 

 

 

 

 

problem(s) on

176–7

 

 

 

mean, meaning of term

417–18

 

 

mean centring

 

 

 

 

 

 

data scaling by

212–13, 283, 308, 356

in Matlab

464–5

 

 

 

loadings and scores plots after

214

 

mean square error

 

28

 

 

 

measurement noise

 

 

 

 

correlated noise

 

129–31

 

 

stationary noise

 

128–9

 

 

median smoothing

 

134–7

 

 

medical tomography

121

 

 

mixture

 

 

 

 

 

 

 

meaning of term

 

 

 

 

to chemists

84

 

 

 

to statisticians

84

 

 

 

mixture designs

84–96

 

 

 

constrained mixture designs 90–6

 

problem(s) on

110–11, 113

 

 

problem(s) on

103–4, 110–11, 113, 114–15,

116–17

 

 

 

 

 

simplex centroid designs 85–8

 

 

problem(s) on

110–11, 114–15, 116–17

simplex lattice designs

88–90

 

 

with process variables

96

 

 

mixture space

85

 

 

 

 

model validation, for calibration methods

313–23

modified simplex, optimisation using 100–1

moving average filters 131–2

 

 

calculation of

133–4

 

 

 

and convolution

 

141, 142

 

 

problem(s) on

173–4

 

 

 

tutorial article on

11

 

 

 

moving average noise distribution

130

 

multilevel partial factorial design

 

 

construction of 72–6

 

parameters for

76

 

cyclic permuter for

73, 76

difference vector for

73, 76

repeater for 73, 76

 

multimode data analysis

4, 309

multiple linear regression (MLR) 284–92

compared with principal components regression

392

 

 

disadvantage 292

 

Excel add-in for

7, 455–6

multidetector advantage

284

multivariate approaches

288–92

multiwavelength equations 284–8

and partial least squares

248

resolution using

388–90

 

problem(s) on

401, 403–4

multiplication of matrix 410–11

in Excel 431, 432

 

multivariate analysis, Excel add-in for 449,

451–6

 

 

 

multivariate calibration

271, 288–92

experimental design for

69–76

problem(s) on

324–7, 328–32, 334–8

reading recommendations

10

uses 272–3

 

 

 

multivariate correlograms

146–7

problem(s) on

177–8

 

multivariate curve resolution, reading

recommendations

10

 

multivariate data matrices

188–90

multivariate models, in discriminant analysis 234–6

multivariate patterns, comparing 219–23

multiwavelength equations, multiple linear regression 284–8

multiway partial least squares, unfolding approach

307–9

 

 

multiway pattern recognition 251–5

 

PARAFAC models

253–4

 

Tucker3 models 252–3

 

unfolding approach

254–5

 

multiway PLS methods

307–13

 

mutually orthogonal factorial designs

72

NATO Advanced Study School (1983)

9

near-infrared (NIR) spectroscopy 1, 237, 271

nearest neighbour clustering

228

example

229

 

 

NIPALS

194, 412, 449, 465

NMR spectroscopy

 

 

digitisation of data

125–6

Fourier transforms used

120–1, 147

free induction decay

148

frequency domains

148

 

time domains 147–8


 

 

noise 128–31 correlated 129–31

signal-to-noise ratio 131 stationary 128–9

nonlinear deconvolution methods 121, 173

normal distribution

419–21

 

Excel function for

435

 

 

and Gaussian peakshape

123

inverse, Excel function for

435

probability density function

419

standardised

420

 

 

 

 

normal probability plots

43–4

calculations

44–5

 

 

 

significance testing using

43–5

problem(s) on

104–5

 

 

normalisation

346

 

 

 

 

notation, vectors and matrices

32, 409

Nyquist frequency

155

 

 

 

optimal filters

160

 

 

 

 

optimisation

 

 

 

 

 

chemometrics used in

3, 15, 16, 97

see also simplex optimisation

organic chemists, interests

 

3, 5

orthogonality

 

 

 

 

 

in central composite designs

80–1, 83

in factorial designs 55, 56, 67

outliers

 

 

 

 

 

detection of

233

 

 

 

 

meaning of term

21, 235

 

overlapping classes

243, 244

 

PARAFAC models

253–4

 

 

parameters, sign affected by coding of data

38

 

 

 

 

partial least squares (PLS)

297–313

algorithms

413–17

 

and autopredictive errors

314–15

cross-validation in 316

 

problem(s) on

333–4

 

Excel add-in for

7, 454–5

and multiple linear regression 248

multiway

307–13

 

PLS1 approach

298–303

algorithm

413–14

 

Excel implementation

454, 455

principles

299

 

problem(s) on

332–4

 

PLS2 approach

303–6

 

algorithm

414–15

 

Excel implementation

455

principles

305

 

problem(s) on

323–4, 332–4

trilinear PLS1

309–13

 

algorithm

416–17

 

tutorial article on

11

 

uses 298

see also discriminant partial least squares

partial selectivity 392–6

pattern recognition 183–269

multiway 251–5

problem(s) on 255–69

reading recommendations 10

supervised 184, 230–51

unsupervised 183–4, 224–30

see also cluster analysis; discriminant analysis; factor analysis; principal components analysis

PCA see principal components analysis

peakshapes

122–5

 

 

 

 

asymmetrical

124, 125

 

 

 

in cluster of peaks

125, 126

 

 

embedded

366, 367, 371

 

 

 

fronting

124, 125

 

 

 

 

Gaussian

123, 366

 

 

 

information used

 

 

 

 

in curve fitting

124

 

 

 

in simulations 124–5

 

 

 

Lorentzian 123–4

 

 

 

 

parameters characterising

122–3

 

tailing 124, 125, 366, 367

 

 

 

phase errors, in Fourier transforms

153, 154

pigment analysis

284

 

 

 

Plackett–Burman (factorial) designs

67–9

generators for

68

 

 

 

 

problem(s) on

109–10

 

 

 

PLS1

298–303, 413–14

 

 

 

see also partial least squares

 

 

PLS2

303–6, 414–15

 

 

 

see also partial least squares

 

 

pooled variance–covariance matrix

237

population covariance

419

 

 

 

Excel function for calculating

435

population standard deviation

418

 

Excel function for calculating

434

population variance

418

 

 

 

Excel function for calculating

434

predicted residual error sum of squares (PRESS) errors 200

calculation of 201, 203 Excel implementation 452

preprocessing of data 210–18, 350–60 see also data preprocessing

principal component based plots 342–50 problem(s) on 398, 401, 404

principal components (PCs)

graphical representation of 205–10, 344–50 sign 8

principal components analysis (PCA) 184–223 aims 190–1

algorithms 412–13

applied to raw data 210–11

case studies 186, 187–90


 

 


chemical factors

191–2

 

 

 

compared with factor analysis

185, 204

comparison of multivariate patterns

219–23

cross-validation in

199–204

 

 

Excel implementation

 

452

 

 

data preprocessing for

210–18

 

Excel add-in for

7, 447, 449, 451–2

as form of variable reduction

194–5

history

185

 

 

 

 

 

 

 

Matlab implementation

465–6

 

method

191–223

 

 

 

 

 

multivariate data matrices

188–90

 

problem(s) on

111–13, 255–6, 263–4,

265–7

 

 

 

 

 

 

 

 

rank and eigenvalues

195–204

 

scores and loadings

192–5

 

 

graphical representation

205–10, 348, 349,

 

473–8

 

 

 

 

 

 

 

in SIMCA

244–5

 

 

 

 

 

tutorial article on

11

 

 

 

 

see also loadings plots; scores plots

 

principal components regression (PCR)

292–7

compared with multiple linear regression

392

 

 

 

 

 

 

 

 

cross-validation in

315–16

 

 

Excel implementation

 

454

 

 

Excel add-in for

7, 453–4

 

 

problem(s) on

327–8

 

 

 

 

quality of prediction

 

 

 

 

 

modelling the c (or y) block

295

 

modelling the x block

296–7

 

regression

292–5

 

 

 

 

 

resolution using

 

390–1

 

 

 

problem(s) on

401, 403–4

 

 

problems

 

 

 

 

 

 

 

 

 

on calibration

323–38

 

 

 

 

on experimental design

102–17

 

on pattern recognition

 

255–69

 

on signal processing

173–81

 

 

procrustes analysis

220–3

 

 

 

reflection (transformation) in

221

 

rotation (transformation) in 221

 

scaling/stretching (transformation) in

221

translation (transformation) in

221

 

uses 223

 

 

 

 

 

 

 

 

property relationships, testing of

17–18

pseudo-components, in constrained mixture designs 91

pseudo-inverse 33, 276, 292, 411

quadratic discriminant function 242

quality control, Taguchi’s method 69

quantitative modelling, chemometrics used in 15–16

quantitative structure–activity relationships (QSARs) 84, 188, 273

quantitative structure–property relationships

(QSPRs)

15, 188, 273

 

 

 

 

quarter factorial designs

65–6

 

 

 

random number generator, in Excel

437, 438

rank of matrix

195

 

 

 

 

 

 

 

ranking of variables

358–60, 362

 

 

reading recommendations

8–11

 

 

 

regression coefficients, calculating

34

 

regularised quadratic discriminant function

242

replicate sum of squares

 

26, 29

 

 

 

replication

20–1

 

 

 

 

 

 

 

 

in central composite design

77

 

 

reroughing

120, 137

 

 

 

 

 

 

 

residual sum of squares

196

 

 

 

 

residual sum of squares (RSS) errors

26, 200

calculation of

 

201, 203

 

 

 

 

Excel implementation

452

 

 

 

resolution

386–98

 

 

 

 

 

 

 

aims

386–7

 

 

 

 

 

 

 

 

 

and constraints

396, 398

 

 

 

 

partial selectivity

392–6

 

 

 

 

problem(s) on

401–7

 

 

 

 

 

selectivity for all components

387–91

 

using multiple linear regression

388–90

using principal components regression

 

390–1

 

 

 

 

 

 

 

 

using pure spectra and selective variables

 

387–8

 

 

 

 

 

 

 

 

response, meaning of term

19

 

 

 

response surface designs

 

76–84

 

 

 

see also central composite designs

 

 

root mean square error(s)

28

 

 

 

of calibration

 

313–14

 

 

 

 

 

in partial least squares

 

302, 303, 304, 321, 322

in principal components regression

295, 296,

297

 

 

 

 

 

 

 

 

 

 

rotatability, in central composite designs

80,

81–3

 

 

 

 

 

 

 

 

 

 

rotation

204, 205, 292

 

 

 

 

 

 

see also factor analysis

 

 

 

 

 

row scaling

 

 

 

 

 

 

 

 

 

 

data preprocessing by

 

215–17, 350–5

 

loadings and scores plots after

218, 353–5

scaling to a base peak

 

354–5

 

 

 

selective summation to a constant total

354

row vector

409

 

 

 

 

 

 

 

 

running median smoothing (RMS)

120, 134–7

sample standard deviation

418

 

 

 

Excel function for calculating

434

 

saturated factorial designs

56

 

 

 

Savitsky–Golay derivatives

138, 141, 381

problem(s) on

179–80

 

 

 

 

 

Savitsky–Golay filters

120, 133

 

 

 

calculation of

 

133–4

 

 

 

 

 


 

 

and convolution 141, 142

problem(s) on

173–4

scalar, meaning of term

409

scalar operations

 

 

 

in Excel

430–1

 

 

in Matlab

460

 

 

 

scaling

210–18, 350–60

column

356–60

 

 

row

215–17, 350–5

 

to base peaks

354–5

 

see also column scaling; data preprocessing;

 

mean centring; row scaling; standardisation

scores (in PCA)

190, 192–5

normalisation of

346

 

scores plots

205–6

 

 

after mean centring 214

after normalisation

350, 351, 352

after ranking of data

363

after row scaling

218, 353–5

after standardisation

190, 216, 357, 361

problem(s) on

258–9

for procrustes analysis

221, 224

of raw data 206–7, 212, 344

superimposed on loadings plots 219–20 three-dimensional plots 348, 349

Matlab facility 469, 476–7

screening experiments, chemometrics used in 15, 16–17, 231

sequential processes 131

sequential signals 119–22

Sheffé models 87

sign of parameters, and coding of data 38–9

sign of principal components 8

 

signal processing

 

119–81

 

basics

 

 

 

 

 

 

digitisation

125–8

 

 

noise 128–31

 

 

 

peakshapes

 

122–5

 

 

sequential processes

131

 

Bayes’ theorem

169

 

 

correlograms

142–7

 

 

auto-correlograms 142–5

 

cross-correlograms 145–6

 

multivariate correlograms

146–7

Fourier transform techniques

147–63

convolution theorem

161–3

Fourier filters

156–61

 

Fourier transforms

 

147–56

Kalman filters

 

163–7

 

 

linear filters

131–41

 

 

convolution

 

138, 141

 

derivatives

138

 

 

 

smoothing functions

131–7

maximum entropy techniques

169–73, 1618

modelling

172–3

 

 

 

time series analysis

142–7

 

wavelet transforms

167–8

 

signal-to-noise (S/N) ratio

131

 

 

significance testing

36–47

 

 

 

coding of data

37–9

 

 

 

 

 

dummy factors

46

 

 

 

 

 

F-test 42–3

 

 

 

 

 

 

 

 

limitations of statistical tests

46–7

normal probability plots

43–5

 

problem(s) on

104–5

 

 

 

 

size of coefficients

39–40

 

 

 

Student’s t-test

40–2

 

 

 

 

significant figures, effects

8

 

 

 

SIMCA method

243–8

 

 

 

 

 

methodology

244–8

 

 

 

 

 

class distance

245

 

 

 

 

 

discriminatory power calculated

247–8

modelling power calculated

245–6, 247

principal components analysis

244–5

principles

243–4

 

 

 

 

 

 

problem(s) on

260–1

 

 

 

 

validation for

248

 

 

 

 

 

 

similarity measures

 

 

 

 

 

 

 

in cluster analysis

 

224–7

 

 

 

composition determined by

372–6

correlation coefficient

 

225

 

 

 

Euclidean distance

225–6

 

 

 

Mahalanobis distance

227, 236–41

Manhattan distance

226

 

 

 

simplex

85

 

 

 

 

 

 

 

 

 

simplex centroid designs

 

85–8

 

 

design

85–6

 

 

 

 

 

 

 

 

design matrix for

 

87, 88

 

 

 

model

86–7

 

 

 

 

 

 

 

 

multifactor designs

88

 

 

 

 

problem(s) on

110–11, 114–15, 116–17

simplex lattice designs

88–90

 

 

simplex optimisation

 

97–102

 

 

checking for convergence

99

 

 

elaborations

99

 

 

 

 

 

 

 

fixed sized simplex

97–9

 

 

 

k + 1 rule

99

 

 

 

 

 

 

 

 

limitations

101–2

 

 

 

 

 

 

modified simplex

 

100–1

 

 

 

problem(s) on

107–8

 

 

 

 

stopping rules for

 

99

 

 

 

 

 

simulation, peakshape information used 124–5

singular matrices 411

singular value decomposition (SVD) method 194, 412

in Matlab 465–6 smoothing methods

MA compared with RMS filters 135–7 moving averages 131–2

problem(s) on 177 reroughing 137

running median smoothing 134–7 Savitsky–Golay filters 120, 133 wavelet transforms 168


 

 

soft independent modelling of class analogy

(SIMCA) method

243–8

 

 

see also SIMCA method

 

 

 

soft modelling

243, 244

 

 

 

software 6–8

 

 

 

 

 

 

see also Excel; Matlab

 

 

 

sparse data matrix

360, 364

 

 

 

spectra, signal processing for

120, 122

 

square matrix

409

 

 

 

 

 

determinant of 411

 

 

 

 

inverse of

411

 

 

 

 

 

Excel function for calculating 432

 

trace of

411

 

 

 

 

 

standard deviation

418

 

 

 

Excel function for calculating

434

 

standardisation

 

 

 

 

 

 

data preprocessing using

213–15, 309, 356

loadings and scores plots after

190, 216, 357,

361

 

 

 

 

 

 

 

standardised normal distribution

420

 

star design, in central composite design

77

stationary noise

128–9

 

 

 

statistical distance

237

 

 

 

see also Mahalanobis distance

 

 

statistical methods

 

 

 

 

 

Internet resources

11–12

 

 

 

reading recommendations

10–11

 

statistical significance tests, limitations

46–7

statisticians, interests

1–2, 5–6

 

 

Student’s t-test

40–2

 

 

 

 

see also t-distribution

supermodified simplex, optimisation using 101

supervised pattern recognition

184, 230–51

compared with cluster analysis

230

cross-validation and testing for

231–2, 248

discriminant analysis

233–42

 

discriminant partial least squares method

248–9

 

 

 

 

general principles

231–3

 

 

applying the model

233

 

 

cross-validation

232

 

 

improving the data

232–3

 

modelling the training set

231

test sets 231–2

 

 

 

KNN method 249–51

 

 

SIMCA method

243–8

 

 

t-distribution 425

two-tailed 424

see also Student’s t-test

Taguchi (factorial) designs

69

taste panels 219, 252

 

terminology

 

for calibration 273, 275

 

for experimental design

275

vectors and matrices 409

test sets 70, 231–2 independent 317–23

tilde notation 128

time-saving advantages of chemometrics

15

time domains, in NMR spectroscopy 147–8

time series

 

 

 

 

 

example

143

 

 

lag in 144

 

 

 

 

time series analysis 142–7

 

reading recommendations

11

trace (of square matrix)

411

 

training sets

 

70, 184, 231, 317

transformation

 

204, 205, 292

see also factor analysis

 

transposing of matrix

410

 

in Excel

431, 432

 

 

tree diagrams

229–30

 

 

trilinear PLS1

 

309–13

 

algorithm

 

416–17

 

 

calculation of components

312

compared with bilinear PLS1 311

matricisation

311–12

 

representation

310

 

 

Tucker3 (multiway pattern recognition) models

252–3

 

 

unfolding approach

 

 

in multiway partial least squares

307–9

in multiway pattern recognition

254–5

univariate calibration

276–84

 

classical calibration

276–9

 

inverse calibration

279–80

 

problem(s) on 324, 326–7

univariate classification, in discriminant analysis 233–4

unsupervised pattern recognition 183–4, 224–30

compared with exploratory data analysis 184 see also cluster analysis

UV/vis spectroscopy 272 problem(s) on 328–32

validation

in supervised pattern recognition 232, 248 see also cross-validation

variable selection

360–5

methods 364–5

 

optimum size for

364

problem(s) on 401

variance

 

meaning of term

20, 418

see also analysis of variance (ANOVA)

variance–covariance matrix 419

VBA see Visual Basic for Applications

vector length 411–12


 

 

vectors

 

 

 

 

addition of

410

 

 

definitions

409

 

 

handling in Matlab

460

 

multiplication of 410

 

notation 409

 

 

subtraction of

410

 

 

Visual Basic for Applications (VBA)

7, 437,

445–7

 

 

 

 

comments in

 

445

 

 

creating and editing Excel macros

440–5

editor screens

439, 443

 

functions in

 

445

 

 

loops 445–6

 

 

matrix operations in

446–7

 

subroutines

 

445

 

 

Index compiled by Paul Nash

wavelet transforms 4, 121, 167–8 principal uses

data compression 168 smoothing 168

websites 11–12

weights vectors 316, 334

window factor analysis (WFA) 376, 378–80 problem(s) on 400

windows

in smoothing of time series data 119, 132 see also Hamming window; Hanning window

Wold, Herman 119

Wold, S. 243, 271, 456

zero concentration window 393
