APPENDICES

473

 

 

Figure A.46

Scatterplot in Matlab

identical number of elements. There are various ways of telling Matlab that a variable is a string (or character) rather than a numeric variable. Any data surrounded by single quotes are treated as a string, so the array c = ['a'; 'b'; 'c'] will be treated by Matlab as a 3 × 1 character array. Figure A.47 illustrates the use of this method. To prevent the labels from overlapping with the points in the graph, it helps to leave one or two spaces before the actual text. The labels can be moved later in the graph editor if there is still some overlap.
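As a minimal sketch of the labelling approach described above, the following plots three points and labels them from a character array. The data values and variable names are hypothetical, chosen only for illustration; each row of the character array must have the same length, and the two leading spaces keep the labels clear of the symbols.

```matlab
% Hypothetical data: three points to be labelled a, b and c.
x = [1; 2; 3];
y = [2; 4; 3];

% Every row of a character array must have the same number of characters.
% The two leading spaces shift each label away from its symbol.
labels = ['  a'; '  b'; '  c'];   % 3 x 3 character array

plot(x, y, 'o');                  % symbols only, no joining line
text(x, y, labels);               % one row of labels per point
```

If the labels have different lengths, they would need padding with trailing spaces so that all rows match.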

Sometimes the labels are originally in a numerical format, for example points in time or wavelengths. For Matlab to treat them as labels, the numbers can be converted to strings using the num2str function. An example is given in Figure A.48, where the first column of the matrix consists of the numbers 10, 15 and 20, which may represent times, the aim being to plot the second against the third column and use the first for labelling. Of course, the labels need not come from the same matrix; any array can contain them.
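The conversion can be sketched as follows. The matrix M and its values are hypothetical (only the first column, 10, 15 and 20, follows the description above); num2str turns the numeric column into a character array, to which two leading spaces are added.

```matlab
% Hypothetical 3 x 3 matrix: column 1 holds times, columns 2 and 3 the data.
M = [10 0.5 1.2;
     15 0.9 2.1;
     20 1.4 2.8];

labels = num2str(M(:,1));                            % rows '10', '15', '20'
labels = [repmat('  ', size(labels,1), 1) labels];   % add two leading spaces

plot(M(:,2), M(:,3), 'o');      % second column against third column
text(M(:,2), M(:,3), labels);   % label each point with its time
```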

A.5.7.5 Three-dimensional Graphics

Matlab can be very useful for the representation of data in three dimensions, in contrast to Excel where there are no straightforward 3D functions. In Chapter 6 we used 3D scores and loadings plots.

Figure A.47

Use of text command in Matlab

Consider a scores matrix of dimensions 36 × 3 (T) and a loadings matrix of dimensions 3 × 25 (P). The command plot3(T(:,1),T(:,2),T(:,3)) produces a graph of all three columns against one another; see Figure A.49. Often the default orientation is not the most informative for our purposes, and we may wish to change this. There are a huge number of commands in Matlab to do this, which is a big bonus for the enthusiast, but for the first-time user the easiest is to select the right-hand rotation icon and interactively change the view; see Figure A.50. Once the desired view is reached, let go of the icon.
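A minimal sketch of a 3D scores plot is given below. The matrix T is a hypothetical stand-in (a helix) for real scores, and the axis labels and viewing angles are arbitrary choices; the call view(az,el) with azimuth and elevation in degrees is one command-line alternative to the rotation icon.

```matlab
% Hypothetical 36 x 3 scores matrix (a helix), standing in for real scores.
t = linspace(0, 4*pi, 36)';
T = [cos(t) sin(t) t];

figure(1)
plot3(T(:,1), T(:,2), T(:,3), 'o-');   % all three columns against one another
xlabel('PC1'); ylabel('PC2'); zlabel('PC3');
grid on

% Set the orientation by command: azimuth 40 degrees, elevation 25 degrees
% (arbitrary values, chosen only for illustration).
view(40, 25);
[az, el] = view;                        % read the orientation back
```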

Figure A.48

Using numerical to character conversion for labelling of graphs

Often we want to return to a particular view, and a way of keeping the same perspective is via the view command. Typing A = view will keep this information in a 4 × 4 matrix A. Enthusiasts will be able to interpret these in fundamental terms, but it is not necessary to understand this when first using 3D graphics in Matlab. However, in chemometrics we often wish to look simultaneously at 3D scores and loadings plots, and it is important that both have identical orientations. The way to do this is to apply the stored orientation of the scores plot to the loadings plot. The commands

figure(2)
plot3(P(1,:),P(2,:),P(3,:))
view(A)

should place a loadings plot with the same orientation in Figure 2. This does not always work the first time; the reasons are rather complicated and depend on the overall starting orientation, but it is usually easy to see when it has succeeded. If you are in a mess, start again from scratch. Scores and loadings plots with the same orientation are presented in Figure A.51.

Figure A.49

A 3D scores plot

Figure A.50

Using the rotation icon

Figure A.51

Scores and loadings plots with identical orientations
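The idea of giving two plots the same orientation can also be sketched using the azimuth and elevation returned by view, rather than the 4 × 4 matrix. This is an alternative to the commands above, not the method of the text, and it may behave more predictably across Matlab releases (an assumption worth checking against your own version). The matrices T and P are hypothetical, with P stored as 3 × 25 so that each row holds one loadings vector.

```matlab
% Hypothetical data: 36 x 3 scores and 3 x 25 loadings.
t = linspace(0, 4*pi, 36)';
T = [cos(t) sin(t) t];
u = linspace(0, 1, 25);
P = [u; u.^2; sin(2*pi*u)];

figure(1)
plot3(T(:,1), T(:,2), T(:,3), 'o-');
view(40, 25);             % stands in for an interactively chosen orientation
[az, el] = view;          % store azimuth and elevation of the scores plot

figure(2)
plot3(P(1,:), P(2,:), P(3,:), 'o-');
view(az, el);             % apply the stored orientation to the loadings plot
[az2, el2] = view;        % read back to confirm both plots agree
```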

The experienced user can improve these graphs just as with the 2D graphs, for example by labelling axes or individual points, or by using symbols in addition to, or instead of, a joining line. The scatter3 statement has similar properties to plot3.
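As an illustrative sketch of such improvements, the following uses scatter3 to draw symbols without a joining line, labels the axes, and numbers the individual points. The data, marker size, colouring and label text are all hypothetical choices.

```matlab
% Hypothetical 36 x 3 scores matrix, as in a 3D scores plot.
t = linspace(0, 4*pi, 36)';
T = [cos(t) sin(t) t];

% Symbols only: marker area 40, coloured by the third column, filled.
scatter3(T(:,1), T(:,2), T(:,3), 40, T(:,3), 'filled');
xlabel('PC1'); ylabel('PC2'); zlabel('PC3');

% Label each point with its row number, with two leading spaces.
labels = num2str((1:36)');
labels = [repmat('  ', 36, 1) labels];
text(T(:,1), T(:,2), T(:,3), labels);
```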

Chemometrics: Data Analysis for the Laboratory and Chemical Plant.

Richard G. Brereton

Copyright 2003 John Wiley & Sons, Ltd.

ISBNs: 0-471-48977-8 (HB); 0-471-48978-6 (PB)

Index

Note: Figures and tables are indicated by italic page numbers

agglomerative clustering 227
Alchemist (e-zine) 11
algorithms
    partial least squares 413–17
    principal components analysis 412–13
analogue-to-digital converter (ADC), and digital resolution 128
analysis of variance (ANOVA) 24–30
    with F-test 42
analytical chemists, interests 2–3, 5
analytical error 21
application scientists, interest in chemometrics 3, 4–5
auto-correlograms 142–5
automation, resolution needed due to 387
autoprediction error 200, 313–15
autoregressive moving average (ARMA) noise 129–31
    autoregressive component 130
    moving average component 130
autoscaling 356
average linkage clustering 228

backward expanding factor analysis 376
base peaks, scaling to 354–5
baseline correction 341, 342
Bayesian classification functions 242
Bayesian statistics 4, 169
biplots 219–20

C programming language, use of programs in Excel 446
calibration 271–338
    case study 273, 274–5
    history 271
    and model validation 313–23
    multivariate 271
    problems on 323–38
    terminology 273, 275
    univariate 276–84
    usage 271–3
calibration designs 69–76
    problem(s) on 113–14
    uses 76
canonical variates analysis 233
Cauchy distribution, and Lorentzian peakshape 123
central composite designs 76–84
    axial (or star) points in 77, 80–3
    degrees of freedom for 79–80
    and modelling 83
    orthogonality 80–1, 83
    problem(s) on 106–7, 115–16
    rotatability 80, 81–3
    setting up of 76–8
    and statistical factors 84
centring, data scaling by 212–13
chemical engineers, interests 2, 6
chemical factors, in PCA 191–2
chemists, interests 2
chemometricians, characteristics 5
chemometrics
    people interested in 1, 4–6
    reading recommendations 8–9
    relationship to other disciplines 3
Chemometrics and Intelligent Laboratory Systems (journal) 9
Chemometrics World (Internet resource) 11
chromatography
    digitisation of data 126
    principal components analysis applications
        column performance 186, 189, 190
        resolution of overlapping peaks 186, 187, 188
    signal processing for 120, 122
class distance plots 235–6, 239, 241
class distances 237, 239
    in SIMCA 245
class modelling 243–8
    problem(s) on 265–6
classical calibration 276–9
    compared with inverse calibration 279–80, 280, 281
classification
    chemist's need for 230
    see also supervised pattern recognition
closure, in row scaling 215
cluster analysis 183, 224–30
    compared with supervised pattern recognition 230
    graphical representation of results 229–30
    linkage methods 227–8
    next steps 229

cluster analysis (continued)
    problem(s) on 256–7
    similarity measures 224–7
coding of data, in significance testing 37–9
coefficients of model 19
    determining 33–4, 55
column scaling, data preprocessing by 356–60
column vector 409
composition
    determining 365–86
        by correlation based methods 372–5
        by derivatives 380–6
        by eigenvalue based methods 376–80
        by similarity based methods 372–6
        by univariate methods 367–71
    meaning of term 365–7
compositional mixture experiments 84
constrained mixture designs 90–6
    lower bounds specified 90–1, 91
    problem(s) on 110–11
    upper bounds specified 91–3, 91
    upper and lower bounds specified 91, 93
    with additional factor added as filler 91, 93
constraints
    experimental design affected by 90–6
    and resolution 396, 398
convolution 119, 138, 141, 162–3
convolution theorem 161–3
Cooley–Tukey algorithm 147
correlated noise 129–31
correlation coefficient(s) 419
    in cluster analysis 225
    composition determined by 372–5
        problem(s) on 398, 404
    in design matrix 56
    Excel function for calculating 434
correlograms 119, 142–7
    auto-correlograms 142–5
    cross-correlograms 145–6
    multivariate correlograms 146–7
    problem(s) on 175–6, 177–8
coupled chromatography
    amount of data generated 339
    matrix representation of data 188, 189
    principal components based plots 342–50
    scaling of data 350–60
    variable selection for 360–5
covariance, meaning of term 418–19
Cox models 87
cross-citation analysis 1
cross-correlograms 145–6
    problem(s) on 175–6
cross-validation
    limitations 317
    in partial least squares 316–17
        problem(s) on 333–4
    in principal components analysis 199–204
        Excel implementation 452
        problem(s) on 267, 269
    in principal components regression 315–16
    purposes 316–17
    in supervised pattern recognition 232, 248
cumulative standardised normal distribution 420, 421

data compression, by wavelet transforms 168
data preprocessing/scaling 210–18
    by column scaling 356–60
    by mean centring 212–13, 283, 307, 309, 356
    by row scaling 215–17, 350–5
    by standardisation 213–15, 309, 356
    in Excel 453
    in Matlab 464–5
datasets 342
degrees of freedom
    basic principles 19–23
    in central composite design 79–80
dendrograms 184, 229–30
derivatives 138
    composition determined by 380–6
        problem(s) on 398, 401, 403–4
    of Gaussian curve 139
    for overlapping peaks 138, 140
    problem(s) on 179–80
    Savitsky–Golay method for calculating 138, 141
descriptive statistics 417–19
    correlation coefficient 419
    covariance 418–19
    mean 417–18
    standard deviation 418
    variance 418
design matrices and modelling 30–6
    coding of data 37–9
    determining the model 33–5
    for factorial designs 55
    matrices 31–3
    models 30–1
    predictions 35–6
    problem(s) on 102
determinant (of square matrix) 411
digital signal processing (DSP), reading recommendations 11
digitisation of data 125–8
    effect on digital resolution 126–8
    problem(s) on 178–9
discrete Fourier transform (DFT) 147
    and sampling rates 154–5
discriminant analysis 233–42
    extension of method 242
    and Mahalanobis distance 236–41
    multivariate models 234–6
    univariate classification 233–4

discriminant partial least squares (DPLS) method 248–9
distance measures 225–7
    problem(s) on 257, 261–3
    see also Euclidean...; Mahalanobis...; Manhattan distance measure
dot product 410
double exponential (Fourier) filters 158, 160–1
dummy factors 46, 68

eigenvalue based methods, composition determined by 376–80
eigenvalues 196–9
eigenvectors 193
electronic absorption spectroscopy (EAS)
    calibration for 272, 284
    case study 273, 274–5
    experimental design 19–23
    see also UV/vis spectroscopy
embedded peaks 366, 367, 371
    determining profiles of 395
entropy
    definition 171
    see also maximum entropy techniques
environmental processes, time series data 119
error, meaning of term 20
error analysis 23–30
    problem(s) on 108–9
Euclidean distance measure 225–6, 237
    problem(s) on 257, 261–3
evolutionary signals 339–407
    problem(s) on 398–407
evolving factor analysis (EFA) 376–8
    problem(s) on 400
Excel 7, 425–56
    add-ins 7, 436–7
        for linear regression 436, 437
        for multiple linear regression 7, 455–6
        for multivariate analysis 7, 449, 451–6
        for partial least squares 7, 454–5
        for principal components analysis 7, 451–2
        for principal components regression 7, 453–4
        systems requirements 7, 449
    arithmetic functions of ranges and matrices 433–4
    arithmetic functions of scalars 433
    AVERAGE function 428
    cell addresses
        alphanumeric format 425
        invariant 425
        numeric format 426–7
    chart facility 447, 448, 449, 450
        labelling of datapoints 447
    compared with Matlab 8, 446
    copying cells or ranges 428, 429–30
    CORREL function 434
    equations and functions 430–6
    FDIST function 42, 435
    file referencing 427
    graphs produced by 447, 448, 449, 450
    logical functions 435
    macros
        creating and editing 440–5
        downloadable 7, 447–56
        running 437–40
    matrix operations 431–3
        MINVERSE function 432, 432
        MMULT function 431, 432
        TRANSPOSE function 431, 432
    names and addresses 425–30
    naming matrices or vectors 430, 431
    nesting and combining functions and equations 435–6
    NORMDIST function 435
    NORMINV function 45, 435
    ranges of cells 427–8
    scalar operations 430–1
    statistical functions 435
    STDEV/STDEVP functions 434
    TDIST function 42, 435
    VAR/VARP functions 434
    Visual Basic for Applications (VBA) 7, 437, 445–7
    worksheets
        maximum size 426
        naming 427
experimental design 15–117
    basic principles 19–53
        analysis of variance 23–30
        degrees of freedom 19–23
        design matrices and modelling 30–6
        leverage and confidence in models 47–53
        significance testing 36–47
    central composite/response surface designs 76–84
    factorial designs 53–76
        fractional factorial designs 60–6
        full factorial designs 54–60
        partial factorials at several levels 69–76
        Plackett–Burman designs 67–9
        Taguchi designs 69
    introduction 15–19
    mixture designs 84–96
        constrained mixture designs 90–6
        simplex centroid designs 85–8
        simplex lattice designs 88–90
        with process variables 96
    problems on 102–17
        calibration designs 113–14
        central composite designs 106–7, 115–16
        design matrix 102
        factorial designs 102–3, 105–6, 113–14
        mixture designs 103–4, 110–11, 113, 114–15, 116–17

experimental design (continued)
    problems on (continued)
        principal components analysis 111–13
        significance testing 104–5
        simplex optimisation 107–8
    reading recommendations 10
    simplex optimisation 97–102
        elaborations 99
        fixed sized simplex 97–9
        limitations 101–2
        modified simplex 100–1
    terminology 275
experimental error 21–2
    estimating 22–3, 77
exploratory data analysis (EDA) 183
    baseline correction 341, 342
    compared with unsupervised pattern recognition 184
    data preprocessing/scaling for 350–60
    principal component based plots 342–50
    variable selection 360–5
    see also factor analysis; principal components analysis
exponential (Fourier) filters 156, 157
    double 158, 160–1

F distribution 421–4
    one-tailed 422–3
F-ratio 30, 42, 43
F-test 42–3, 421
    with ANOVA 42
face centred cube design 77
factor, meaning of term 19
factor analysis (FA) 183, 204–5
    compared with PCA 185, 204
    see also evolving factor analysis; PARAFAC models; window factor analysis
factorial designs 53–76
    four-level 60
    fractional 60–6
        examples of construction 64–6
        matrix of effects 63–4
        problem(s) on 102–3
    full 54–60
        problem(s) on 105–6
    Plackett–Burman designs 67–9
        problem(s) on 109–10
    problems on 102–3, 105–6, 109–10
    Taguchi designs 69
    three-level 60
    two-level 54–9
        design matrices for 55, 62
        disadvantages 59, 60
        and normal probability plots 43
        problem(s) on 102, 102–3, 105–6
        reduction of number of experiments 61–3
        uses 76
    two-level fractional 61–6
        disadvantages 66
        half factorial designs 62–5
        quarter factorial designs 65–6
fast Fourier transform (FFT) 156
filler, in constrained mixture design 93
Fisher, R. A. 36, 237
Fisher discriminant analysis 233
fixed sized simplex, optimisation using 97–9
fixed sized window factor analysis 376, 378–80
flow injection analysis (FIA), problem(s) on 328
forgery, detection of 184, 211, 237, 251
forward expanding factor analysis 376
Fourier deconvolution 121, 156–61
Fourier filters 156–61
    exponential filters 156, 157
    influence of noise 157–61
Fourier pair 149
Fourier self-deconvolution 121, 161
Fourier transform algorithms 156
Fourier transform techniques 147–63
    convolution theorem 161–3
    Fourier filters 156–61
    Fourier transforms 147–56
    problem(s) on 174–5, 180–1
Fourier transforms 120–1, 147–56
    forward 150–1
    general principles 147–50
    inverse 151, 161
    methods 150–2
    numerical example 151–2
    reading recommendations 11
    real and imaginary pairs 152–4
        absorption lineshape 152, 153
        dispersion lineshape 152, 153
    and sampling rates 154–6
fractional factorial designs 60–6
    in central composite designs 77
    problem(s) on 102–3
freedom, degrees of see degrees of freedom
frequency domains, in NMR spectroscopy 148
full factorial designs 54–60
    in central composite designs 77
    problem(s) on 105–6
furthest neighbour clustering 228

gain vector 164
Gallois field theory 2
Gaussians 123
    compared with Lorentzians 124
    derivatives of 139
    in frequency and time domains 149
generators (in factorial designs) 67
geological processes, time series data 119
graphical representation
    cluster analysis results 229–30
    Excel facility 447, 448, 450
    Matlab facility 469–78
    principal components 205–10
