Brereton Chemometrics


INDEX	483

half factorial designs			62–5
Hamming window			133
Hanning window		133
and convolution			141, 142
hard modelling		233, 243
hat matrix	47
hat notation	30, 128, 192
heteroscedastic noise			129

heuristic evolving latent projections (HELP) 376


homoscedastic noise				128, 129
identity matrix		409
Matlab command for					461
independent modelling of classes							243, 244, 266
see also SIMCA method
independent test sets				317–23
industrial process control					233
time series in			120
innovation, in Kalman ﬁlters						164
instrumentation error				128
instrumentation noise				128
interaction of factors				16, 31
interaction terms, in design matrix							32, 53
Internet resources			11–12
inverse calibration			279–80
compared with classical calibration	279–80, 280, 281
inverse Fourier transforms						151
inverse of matrix			411
in Excel	432, 432
K nearest neighbour (KNN) method								249–51
limitations	251
methodology		249–51
problem(s) on			257, 259–60

Kalman ﬁlters	122, 163–7
applicability	165, 167
calculation of	164–5
Kowalski, B. R.	9, 456
Krilov space	2

lack-of-ﬁt 20

lack-of-ﬁt sum-of-square error 27–8

leverage	47–53
calculation of			47, 48
deﬁnition		47
effects	53
equation form			49–50
graphical representation				51, 51, 53
properties		49
line graphs
Excel facility			447
Matlab facility			469–71
linear discriminant analysis				233, 237–40
problem(s) on			264–5
linear discriminant function				237
calculation of			239, 240
linear ﬁlters		120, 131–42

calculation of 133–4
convolution	138, 141
derivatives	138
smoothing functions			131–7
linear regression, Excel add-in for					436, 437
loadings (in PCA)		190, 192–5
loadings plots	207–9
after mean centring			214
after ranking of data			363
after row scaling		218, 353–5
after standardisation			190, 216, 357, 361
of raw data	208–9, 212, 344
superimposed on scores plots 219–20
three-dimensional plots				348, 349
Matlab facility		475, 477
Lorentzian peakshapes			123–4
compared with Gaussian				124
in NMR spectroscopy			148
time domain equivalent				149
magnetic resonance imaging (MRI)					121

magnitude spectrum, in Fourier transforms 153

Mahalanobis distance measure							227, 236–41
problem(s) on			261–3
Manhattan distance measure						226
matched ﬁlters			160
Matlab 7–8, 456–78
advantages		7–8, 456
basic arithmetic matrix operations								461–2
comments in			467
compared with Excel					8, 446
conceptual problem (not looking at raw numerical data)	8
data preprocessing				464–5
directories		457–8
ﬁgure command				469
ﬁle types		458–9
diary ﬁles			459
m ﬁles		458–9, 468
mat ﬁles		458, 466
function ﬁles			468
graphics facility				469–78
creating ﬁgures				469
labelling of datapoints						471–3
line graphs			469–71
multiple plot facility 469, 471
three-dimensional graphics							473–8
two-variable plot					471
handling matrices/scalars/vectors								460–1
help facility			456, 470
loops	467
matrix functions				462–4
numerical data 466
plot command			469, 471
principal components analysis							465–6
starting	457
subplot command				469

user interface			8, 457
view command			474
matrices
addition of		410
deﬁnitions		409
dimensions		409
inverses	411
in Excel		432, 432
multiplication of				410–11
in Excel		431, 432
notation	32, 409
singular	411
subtraction of			410
transposing of			410
in Excel		431, 432
see also design matrices
matrix operations			410–11
in Excel	431–3
in Matlab		461–4
maximum entropy (maxent) techniques	121, 168, 169–73
problem(s) on			176–7
mean, meaning of term					417–18
mean centring
data scaling by			212–13, 283, 308, 356
in Matlab		464–5
loadings and scores plots after						214
mean square error				28
measurement noise
correlated noise				129–31
stationary noise				128–9
median smoothing				134–7
medical tomography				121
mixture
meaning of term
to chemists			84
to statisticians				84
mixture designs			84–96
constrained mixture designs 90–6
problem(s) on				110–11, 113
problem(s) on	103–4, 110–11, 113, 114–15, 116–17
simplex centroid designs 85–8
problem(s) on				110–11, 114–15, 116–17
simplex lattice designs					88–90
with process variables					96
mixture space		85
model validation, for calibration methods							313–23
modiﬁed simplex, optimisation using 100–1
moving average ﬁlters 131–2
calculation of			133–4
and convolution				141, 142
problem(s) on			173–4
tutorial article on				11
moving average noise distribution						130
multilevel partial factorial design

construction of 72–6
parameters for	76
cyclic permuter for	73, 76
difference vector for		73, 76
repeater for 73, 76
multimode data analysis		4, 309
multiple linear regression (MLR) 284–92

compared with principal components regression	392
disadvantage 292
Excel add-in for	7, 455–6
multidetector advantage		284
multivariate approaches		288–92
multiwavelength equations 284–8
and partial least squares		248
resolution using	388–90
problem(s) on	401, 403–4
multiplication of matrix 410–11
in Excel 431, 432

multivariate analysis, Excel add-in for	449, 451–6
multivariate calibration		271, 288–92
experimental design for			69–76
problem(s) on	324–7, 328–32, 334–8
reading recommendations			10
uses 272–3
multivariate correlograms			146–7
problem(s) on	177–8
multivariate curve resolution, reading recommendations	10
multivariate data matrices			188–90

multivariate models, in discriminant analysis 234–6

multivariate patterns, comparing	219–23
multiwavelength equations, multiple linear regression	284–8
multiway partial least squares, unfolding approach	307–9
multiway pattern recognition 251–5
PARAFAC models	253–4
Tucker3 models 252–3
unfolding approach	254–5
multiway PLS methods	307–13
mutually orthogonal factorial designs		72
NATO Advanced Study School (1983)		9

near-infrared (NIR) spectroscopy 1, 237, 271

nearest neighbour clustering			228
example	229
NIPALS	194, 412, 449, 465
NMR spectroscopy
digitisation of data		125–6
Fourier transforms used			120–1, 147
free induction decay		148
frequency domains		148

time domains 147–8

noise	128–31
correlated	129–31
signal-to-noise ratio	131
stationary	128–9

nonlinear deconvolution methods 121, 173

normal distribution		419–21
Excel function for		435
and Gaussian peakshape				123
inverse, Excel function for					435
probability density function					419
standardised	420
normal probability plots			43–4
calculations	44–5
signiﬁcance testing using				43–5
problem(s) on		104–5
normalisation	346
notation, vectors and matrices					32, 409
Nyquist frequency		155
optimal ﬁlters	160
optimisation
chemometrics used in			3, 15, 16, 97
see also simplex optimisation
organic chemists, interests				3, 5
orthogonality
in central composite designs					80–1, 83
in factorial designs 55, 56, 67
outliers
detection of	233
meaning of term		21, 235
overlapping classes		243, 244
PARAFAC models		253–4

parameters, sign affected by coding of data	38
partial least squares (PLS)				297–313
algorithms	413–17
and autopredictive errors				314–15
cross-validation in 316
problem(s) on			333–4
Excel add-in for			7, 454–5
and multiple linear regression 248
multiway	307–13
PLS1 approach			298–303
algorithm		413–14
Excel implementation				454, 455
principles		299
problem(s) on			332–4
PLS2 approach			303–6
algorithm		414–15
Excel implementation				455
principles		305
problem(s) on		323–4, 332–4
trilinear PLS1		309–13
algorithm		416–17
tutorial article on			11

uses 298

see also discriminant partial least squares
partial selectivity	392–6
pattern recognition	183–269
multiway	251–5
problem(s) on	255–69
reading recommendations	10
supervised	184, 230–51
unsupervised	183–4, 224–30

see also cluster analysis; discriminant analysis; factor analysis; principal components analysis

PCA see principal components analysis


peakshapes		122–5
asymmetrical			124, 125
in cluster of peaks				125, 126
embedded		366, 367, 371
fronting		124, 125
Gaussian		123, 366
information used
in curve ﬁtting				124
in simulations 124–5
Lorentzian 123–4
parameters characterising					122–3
tailing 124, 125, 366, 367
phase errors, in Fourier transforms							153, 154
pigment analysis			284
Plackett–Burman (factorial) designs							67–9
generators for			68
problem(s) on			109–10
PLS1	298–303, 413–14
see also partial least squares
PLS2	303–6, 414–15
see also partial least squares
pooled variance–covariance matrix							237
population covariance				419
Excel function for calculating						435
population standard deviation					418
Excel function for calculating						434
population variance				418
Excel function for calculating						434

predicted residual error sum of squares (PRESS) errors 200

calculation of	201, 203
Excel implementation	452
preprocessing of data	210–18, 350–60
see also data preprocessing
principal component based plots	342–50
problem(s) on	398, 401, 404
principal components (PCs)
graphical representation of	205–10, 344–50
sign	8
principal components analysis (PCA)	184–223
aims	190–1
algorithms	412–13
applied to raw data	210–11
case studies	186, 187–90

chemical factors				191–2
compared with factor analysis								185, 204
comparison of multivariate patterns									219–23
cross-validation in				199–204
Excel implementation							452
data preprocessing for						210–18
Excel add-in for				7, 447, 449, 451–2
as form of variable reduction								194–5
history	185
Matlab implementation						465–6
method	191–223
multivariate data matrices							188–90
problem(s) on	111–13, 255–6, 263–4, 265–7
rank and eigenvalues					195–204
scores and loadings					192–5
graphical representation	205–10, 348, 349, 473–8
in SIMCA		244–5
tutorial article on				11
see also loadings plots; scores plots
principal components regression (PCR)									292–7
compared with multiple linear regression	392
cross-validation in				315–16
Excel implementation							454
Excel add-in for				7, 453–4
problem(s) on			327–8
quality of prediction
modelling the c (or y) block								295
modelling the x block							296–7
regression		292–5
resolution using				390–1
problem(s) on				401, 403–4
problems
on calibration			323–38
on experimental design						102–17
on pattern recognition						255–69
on signal processing					173–81
procrustes analysis				220–3
reﬂection (transformation) in								221
rotation (transformation) in 221
scaling/stretching (transformation) in									221
translation (transformation) in								221
uses 223
property relationships, testing of								17–18

pseudo-components, in constrained mixture designs 91

pseudo-inverse 33, 276, 292, 411

quadratic discriminant function	242
quality control, Taguchi’s method	69

quantitative modelling, chemometrics used in 15–16

quantitative structure–activity relationships (QSARs)	84, 188, 273

quantitative structure–property relationships (QSPRs)	15, 188, 273
quarter factorial designs						65–6
random number generator, in Excel										437, 438
rank of matrix			195
ranking of variables					358–60, 362
reading recommendations							8–11
regression coefﬁcients, calculating									34
regularised quadratic discriminant function											242
replicate sum of squares							26, 29
replication		20–1
in central composite design								77
reroughing		120, 137
residual sum of squares						196
residual sum of squares (RSS) errors										26, 200
calculation of				201, 203
Excel implementation							452
resolution		386–98
aims	386–7
and constraints				396, 398
partial selectivity					392–6
problem(s) on				401–7
selectivity for all components									387–91
using multiple linear regression										388–90
using principal components regression	390–1
using pure spectra and selective variables	387–8
response, meaning of term							19
response surface designs							76–84
see also central composite designs
root mean square error(s)							28
of calibration				313–14
in partial least squares							302, 303, 304, 321, 322
in principal components regression	295, 296, 297
rotatability, in central composite designs	80, 81–3
rotation	204, 205, 292
see also factor analysis
row scaling
data preprocessing by							215–17, 350–5
loadings and scores plots after									218, 353–5
scaling to a base peak							354–5
selective summation to a constant total											354
row vector		409
running median smoothing (RMS)									120, 134–7
sample standard deviation							418
Excel function for calculating									434
saturated factorial designs							56
Savitsky–Golay derivatives								138, 141, 381
problem(s) on				179–80
Savitsky–Golay ﬁlters						120, 133
calculation of				133–4

and convolution 141, 142
problem(s) on			173–4
scalar, meaning of term					409
scalar operations
in Excel		430–1
in Matlab		460
scaling	210–18, 350–60
column		356–60
row	215–17, 350–5
to base peaks			354–5
see also column scaling; data preprocessing; mean centring; row scaling; standardisation
scores (in PCA)			190, 192–5
normalisation of				346
scores plots		205–6
after mean centring 214
after normalisation				350, 351, 352
after ranking of data					363
after row scaling				218, 353–5
after standardisation					190, 216, 357, 361
problem(s) on			258–9
for procrustes analysis					221, 224
of raw data 206–7, 212, 344

superimposed on loadings plots	219–20
three-dimensional plots	348, 349
Matlab facility	469, 476–7
screening experiments, chemometrics used in	15, 16–17, 231
sequential processes	131
sequential signals	119–22
Sheffé models	87

sign of parameters, and coding of data 38–9

sign of principal components 8
signal processing			119–81
basics
digitisation		125–8
noise 128–31
peakshapes			122–5
sequential processes					131
Bayes’ theorem			169
correlograms		142–7
auto-correlograms 142–5
cross-correlograms 145–6
multivariate correlograms						146–7
Fourier transform techniques						147–63
convolution theorem					161–3
Fourier ﬁlters			156–61
Fourier transforms					147–56
Kalman ﬁlters			163–7
linear ﬁlters	131–41
convolution			138, 141
derivatives		138
smoothing functions					131–7
maximum entropy techniques						169–73, 1618
modelling	172–3
time series analysis				142–7
wavelet transforms				167–8

signal-to-noise (S/N) ratio							131
signiﬁcance testing				36–47
coding of data			37–9
dummy factors			46
F-test 42–3
limitations of statistical tests									46–7
normal probability plots							43–5
problem(s) on			104–5
size of coefﬁcients					39–40
Student’s t-test			40–2
signiﬁcant ﬁgures, effects							8
SIMCA method			243–8
methodology			244–8
class distance				245
discriminatory power calculated										247–8
modelling power calculated									245–6, 247
principal components analysis										244–5
principles		243–4
problem(s) on			260–1
validation for			248
similarity measures
in cluster analysis					224–7
composition determined by								372–6
correlation coefﬁcient							225
Euclidean distance					225–6
Mahalanobis distance						227, 236–41
Manhattan distance					226
simplex	85
simplex centroid designs							85–8
design	85–6
design matrix for					87, 88
model	86–7
multifactor designs					88
problem(s) on			110–11, 114–15, 116–17
simplex lattice designs						88–90
simplex optimisation					97–102
checking for convergence								99
elaborations		99
ﬁxed sized simplex					97–9
k + 1 rule		99
limitations		101–2
modiﬁed simplex					100–1
problem(s) on			107–8
stopping rules for					99

simulation, peakshape information used	124–5
singular matrices	411
singular value decomposition (SVD) method	194, 412
in Matlab	465–6
smoothing methods
MA compared with RMS ﬁlters	135–7
moving averages	131–2
problem(s) on	177
reroughing	137
running median smoothing	134–7
Savitsky–Golay ﬁlters	120, 133
wavelet transforms	168

soft independent modelling of class analogy (SIMCA) method	243–8
see also SIMCA method
soft modelling		243, 244
software 6–8
see also Excel; Matlab
sparse data matrix			360, 364
spectra, signal processing for					120, 122
square matrix		409
determinant of 411
inverse of	411
Excel function for calculating 432
trace of	411
standard deviation			418
Excel function for calculating						434
standardisation
data preprocessing using					213–15, 309, 356
loadings and scores plots after	190, 216, 357, 361
standardised normal distribution						420
star design, in central composite design							77
stationary noise		128–9
statistical distance			237
see also Mahalanobis distance
statistical methods
Internet resources			11–12
reading recommendations					10–11
statistical signiﬁcance tests, limitations							46–7
statisticians, interests				1–2, 5–6
Student’s t-test		40–2

supervised pattern recognition			184, 230–51
compared with cluster analysis				230
cross-validation and testing for				231–2, 248
discriminant analysis		233–42
discriminant partial least squares method	248–9
general principles	231–3
applying the model		233
cross-validation	232
improving the data		232–3
modelling the training set			231
test sets 231–2
KNN method 249–51
SIMCA method	243–8

Taguchi (factorial) designs	69
taste panels	219, 252
terminology
for calibration	273, 275
for experimental design	275
time-saving advantages of chemometrics	15
time domains, in NMR spectroscopy	147–8
time series
example	143
lag in	144
time series analysis	142–7
reading recommendations	11
trace (of square matrix)	411
training sets	70, 184, 231, 317
transformation	204, 205, 292
see also factor analysis
transposing of matrix	410
in Excel	431, 432
tree diagrams	229–30
trilinear PLS1	309–13
algorithm	416–17
calculation of components	312
compared with bilinear PLS1	311
matricisation	311–12
representation	310
Tucker3 models	252–3
unfolding approach
in multiway partial least squares	307–9
in multiway pattern recognition	254–5
univariate calibration	276–84
classical calibration	276–9
inverse calibration	279–80
variable selection	360–5
methods	364–5
optimum size for	364
problem(s) on	401
variance
meaning of term	20, 418
see also analysis of variance (ANOVA)
vectors
addition of	410
deﬁnitions	409
handling in Matlab	460
multiplication of	410
notation	409
subtraction of	410
Visual Basic for Applications (VBA)	7, 437, 445–7
comments in	445
creating and editing Excel macros	440–5
editor screens	439, 443
functions in	445
loops	445–6
matrix operations in	446–7
subroutines	445
