
CHEMOMETRICS

Wavelength (nm)   A       B       C
308               0.606   0.551   0.480
312               0.477   0.461   0.433
316               0.342   0.359   0.372
320               0.207   0.248   0.295
324               0.113   0.161   0.226
328               0.072   0.107   0.170
332               0.058   0.070   0.122
336               0.053   0.044   0.082
340               0.051   0.026   0.056
344               0.051   0.016   0.041
348               0.051   0.010   0.033

1. Produce and superimpose the graphs of the raw spectra. Comment.

2. Calculate the five-point Savitzky–Golay quadratic first and second derivatives of A. Plot the graphs, and interpret them; compare both first and second derivatives and discuss the appearance in terms of the number and positions of the peaks.

3. Repeat this for spectrum C. Why is the pattern more complex? Interpret the graphs.

4. Calculate the five-point Savitzky–Golay quadratic second derivatives of all three spectra and superimpose the resultant graphs. Repeat for the seven-point derivatives. Which graph is clearer, five- or seven-point derivatives? Interpret the results for spectrum B. Do the derivatives show it is clearly a mixture? Comment on the appearance of the region between 270 and 310 nm, and compare with the original spectra.
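As a starting point for questions 2 to 4, the five-point quadratic filters can be sketched as follows. This is a minimal illustration, not the book's own worked solution: it uses the standard five-point quadratic convolution coefficients, (-2, -1, 0, 1, 2)/10 for the first derivative and (2, -1, -2, -1, 2)/7 for the second, and the function name and the `step` argument are assumptions made here. Only the last eleven readings of spectrum A (308 to 348 nm) are used, because those are the rows reproduced above.

```python
def savgol5_quadratic(y, step=1.0):
    """Five-point quadratic Savitzky-Golay first and second derivatives.

    Returns two lists covering the interior points only (indices 2 .. len(y)-3),
    because a five-point window cannot be centred on the two points at each end.
    `step` is the spacing of the x axis (here nm); use 1.0 for derivatives per point.
    """
    c1 = (-2, -1, 0, 1, 2)    # first-derivative weights, normalised by 10
    c2 = (2, -1, -2, -1, 2)   # second-derivative weights, normalised by 7
    d1, d2 = [], []
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]
        d1.append(sum(c * v for c, v in zip(c1, window)) / (10.0 * step))
        d2.append(sum(c * v for c, v in zip(c2, window)) / (7.0 * step ** 2))
    return d1, d2


# Spectrum A between 308 and 348 nm (4 nm spacing), taken from the table above
wavelengths = list(range(308, 349, 4))
spectrum_a = [0.606, 0.477, 0.342, 0.207, 0.113, 0.072,
              0.058, 0.053, 0.051, 0.051, 0.051]

first_a, second_a = savgol5_quadratic(spectrum_a, step=4.0)
for wl, g1, g2 in zip(wavelengths[2:-2], first_a, second_a):
    print(f"{wl} nm  d1 = {g1:+.5f}  d2 = {g2:+.6f}")
```

A seven-point version would follow the same pattern with a wider window and its own coefficient set; it is left out here rather than guessed at.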

Problem 3.9 Fourier Analysis of NMR Signals

Sections 3.5.1.4, 3.5.1.2 and 3.5.1.3

The data below consist of 72 sequential readings in time (organised in columns for clarity), which represent a raw time series (or FID) acquired over a region of an NMR spectrum. The first column represents the first 20 points in time, the second points 21 to 40, and so on.

Points 1–20   Points 21–40   Points 41–60   Points 61–72
 2732.61         35.90        1546.37         267.40
14083.58        845.21         213.23         121.18
 7571.03       1171.34        1203.41          11.60
 5041.98        148.79         267.88         230.14
 5042.45       2326.34         521.55         171.80
 2189.62        611.59          45.08         648.30
 1318.62       2884.74         249.54         258.94
   96.36       2828.83        1027.97         264.47
 2120.29        598.94          39.75          92.67
  409.82       1010.06        1068.85         199.36
 3007.13       2165.89         160.62         330.19
 5042.53       1827.65         872.29         991.12
 3438.08        786.26         382.11
 2854.03       2026.73         150.49
 9292.98        132.10         460.37
 6550.05        932.92         256.68
 3218.65        305.54         989.48
 7492.84        394.40         159.55
 1839.61        616.13        1373.90
 2210.89        306.17         725.96

1. The data were acquired at intervals of 0.008124 s. What is the spectral width of the Fourier transform, taking into account that only half the points are represented in the transform? What is the digital resolution of the transform?

2. Plot a graph of the original data, converting the horizontal axis to seconds.

3. In a simple form, the real transform can be expressed by

\[ RL(n) = \sum_{m=0}^{M-1} f(m) \cos(nm/M) \]

Define the parameters in the equation in terms of the dataset discussed in this problem. What is the equivalent equation for the imaginary transform?

4. Perform the real and imaginary transforms on this data (note you may have to write a small program to do this, but it can be laid out in a spreadsheet without a program). Notice that n and m should start at 0 rather than 1, and if angles are calculated in radians it is necessary to include a factor of 2π. Plot the real and imaginary transforms using a scale of hertz for the horizontal axis.

5. Comment on the phasing of the transform and produce a graph of the absolute value spectrum.

6. Phasing involves finding an angle ψ such that

ABS = cos(ψ)RL + sin(ψ)IM

A first approximation is that this angle is constant throughout a spectrum. By looking at the phase of the imaginary transform, obtained in question 4, can you produce a first guess of this angle? Produce the result of phasing using this angle and comment.

7. How might you overcome the remaining problem of phasing?
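The "small program" mentioned in question 4 might look like the sketch below. It is an illustration under stated assumptions rather than a model answer: the function names are invented here, the minus sign on the imaginary transform is one common convention (the opposite sign is also in use), and the spectral width is taken as 1/(2Δt) with a digital resolution of 1/(MΔt). A synthetic sine wave stands in for the FID so that the effect of phasing is easy to see.

```python
import math

DT = 0.008124   # sampling interval in seconds, from question 1
M = 72          # number of points in the time series

# One reading of question 1: only half the transformed points are distinct
spectral_width = 1.0 / (2.0 * DT)      # Hz
digital_resolution = 1.0 / (M * DT)    # Hz per point


def real_imag_transform(f):
    """Real and imaginary transforms with n and m starting at 0 and the 2*pi factor."""
    size = len(f)
    rl, im = [], []
    for n in range(size):
        rl.append(sum(f[m] * math.cos(2 * math.pi * n * m / size) for m in range(size)))
        # Sign convention assumed here: IM(n) = -sum f(m) sin(2*pi*n*m/M)
        im.append(-sum(f[m] * math.sin(2 * math.pi * n * m / size) for m in range(size)))
    return rl, im


def absolute_value_spectrum(rl, im):
    return [math.sqrt(r * r + i * i) for r, i in zip(rl, im)]


def phase(rl, im, psi):
    """Constant-angle phasing: cos(psi) * RL + sin(psi) * IM."""
    return [math.cos(psi) * r + math.sin(psi) * i for r, i in zip(rl, im)]


# Synthetic stand-in for the FID: a sine wave completing 5 cycles in 72 points
fid = [math.sin(2 * math.pi * 5 * m / M) for m in range(M)]
rl, im = real_imag_transform(fid)
magnitude = absolute_value_spectrum(rl, im)
phased = phase(rl, im, -math.pi / 2)

# Frequency axis in hertz for the first half of the transform
freq_hz = [n / (M * DT) for n in range(M // 2)]
print(f"spectral width {spectral_width:.2f} Hz, resolution {digital_resolution:.3f} Hz")
print(f"peak at {freq_hz[5]:.2f} Hz: RL={rl[5]:.2f} IM={im[5]:.2f} phased={phased[5]:.2f}")
```

For the sine wave the real transform is close to zero at the peak and all the intensity sits in the imaginary part; a constant angle of -π/2 brings it back into a single positive peak. To run the problem itself, replace `fid` with the 72 readings, read down the columns of the table.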

Chemometrics: Data Analysis for the Laboratory and Chemical Plant.

Richard G. Brereton

Copyright 2003 John Wiley & Sons, Ltd.

ISBNs: 0-471-48977-8 (HB); 0-471-48978-6 (PB)

4 Pattern Recognition

4.1 Introduction

One of the first and most publicised success stories in chemometrics is pattern recognition. Much chemistry involves using data to determine patterns. For example, can infrared spectra be used to classify compounds into ketones and esters? Is there a pattern in the spectra allowing physical information to be related to chemical knowledge? There have been many spectacular successes of chemical pattern recognition. Can a spectrum be used in forensic science, for example to determine the cause of a fire? Can a chromatogram be used to decide on the origin of a wine and, if so, what main features in the chromatogram distinguish different wines? And is it possible to determine the time of year the vine was grown? Is it possible to use measurements of heavy metals to discover the source of pollution in a river?

There are several groups of methods for chemical pattern recognition.

4.1.1 Exploratory Data Analysis

Exploratory data analysis (EDA) consists mainly of the techniques of principal components analysis (PCA) and factor analysis (FA). The statistical origins are in biology and psychology. Psychometricians have for many years had the need to translate numbers such as answers to questions in tests into relationships between individuals. How can verbal ability, numeracy and the ability to think in three dimensions be predicted from a test? Can different people be grouped by these abilities? And does this grouping reflect the backgrounds of the people taking the test? Are there differences according to educational background, age, sex or even linguistic group?

In chemistry, we too need to ask similar questions, but the raw data are often chromatographic or spectroscopic. An example is animal pheromones: animals recognise each other more by smell than by sight, and different animals often lay scent trails, sometimes in their urine. The chromatogram of a urine sample may contain several hundred compounds, and it is often not obvious to the untrained observer which are the most significant. Sometimes the most potent compounds are present in only minute quantities. Yet animals can often detect through scent marking whether there is one of the opposite sex in heat looking for a mate, or whether there is a dangerous intruder entering his or her territory. Exploratory data analysis of chromatograms of urine samples can highlight differences in chromatograms of different social groups or different sexes, and give a simple visual idea as to the main relationships between these samples. Sections 4.2 and 4.3 cover these approaches.

4.1.2 Unsupervised Pattern Recognition

A more formal method of treating samples is unsupervised pattern recognition, mainly consisting of cluster analysis. Many methods have their origins in numerical taxonomy. Biologists measure features in different organisms, for example various body length parameters. Using a couple of dozen features, it is possible to see which species are most similar and draw a picture of these similarities, called a dendrogram, in which more closely related species are closer to each other. The main branches of the dendrogram can represent bigger divisions, such as subspecies, species, genera and families.

These principles can be directly applied to chemistry. It is possible to determine similarities in amino acid sequences in myoglobin in a variety of species. The more similar the species, the closer is the relationship: chemical similarity mirrors biological similarity. Sometimes the amount of information is so huge, for example in large genomic or crystallographic databases, that cluster analysis is the only practicable way of searching for similarities.

Unsupervised pattern recognition differs from exploratory data analysis in that its aim is to detect similarities, whereas in EDA there is no particular prejudice as to whether groups will be found, or how many. Cluster analysis is described in more detail in Section 4.4.

4.1.3 Supervised Pattern Recognition

There are a large number of methods for supervised pattern recognition, mostly aimed at classification. Multivariate statisticians have developed many discriminant functions, some of direct relevance to chemists. A classical application is the detection of forgery of banknotes. Can physical measurements such as width and height of a series of banknotes be used to identify forgeries? Often one measurement is not enough, so several parameters are required before an adequate mathematical model is available.

Similar problems occur in chemistry. Consider using a chemical method such as IR spectroscopy to determine whether a sample of brain tissue is cancerous or not. A method can be set up in which the spectra of two groups, cancerous and noncancerous tissues, are recorded. Then some form of mathematical model is set up. Finally, the diagnosis of an unknown sample can be predicted.

Supervised pattern recognition requires a training set of known groupings to be available in advance, and tries to answer a precise question as to the class of an unknown sample. It is, of course, always necessary first to establish whether chemical measurements are actually good enough to fit into the predetermined groups. However, spectroscopic or chromatographic methods for diagnosis are often much cheaper than expensive medical tests, and provide a valuable first diagnosis. In many cases chemical pattern recognition can be performed as a type of screening, with doubtful samples being subjected to more sophisticated tests. In areas such as industrial process control, where batches of compounds might be produced at hourly intervals, a simple on-line spectroscopic test together with chemical data analysis is often an essential first step to determine the possible acceptability of a batch.

Section 4.5 describes a variety of such techniques and their applications.

4.2 The Concept and Need for Principal Components Analysis

PCA is probably the most widespread multivariate chemometric technique, and because of the importance of multivariate measurements in chemistry, it is regarded by many as the technique that most significantly changed the chemist’s view of data analysis.


4.2.1 History

There are numerous claims to the first use of PCA in the literature. Probably the most famous early paper was by Pearson in 1901. However, the fundamental ideas are based on approaches well known to physicists and mathematicians for much longer, namely those of eigenanalysis. In fact, some school mathematics syllabuses teach ideas about matrices which are relevant to modern chemistry. An early description of the method in physics was by Cauchy in 1829. It has been claimed that the earliest nonspecific reference to PCA in the chemical literature was in 1878, although the author of the paper almost certainly did not realise the potential, and was dealing mainly with a simple problem of linear calibration.

It is generally accepted that the revolution in the use of multivariate methods took place in psychometrics in the 1930s and 1940s, of which Hotelling’s work is regarded as a classic. Psychometrics is well understood by most students of psychology and one important area involves relating answers in tests to underlying factors, for example, verbal and numerical ability as illustrated in Figure 4.1. PCA relates a data matrix consisting of these answers to a number of psychological ‘factors’. In certain areas of statistics, ideas of factor analysis and PCA are intertwined, but in chemistry the two approaches have different implications: PCA involves using abstract functions of the data to look at patterns whereas FA involves obtaining information such as spectra that can be directly related to the chemistry.

Natural scientists of all disciplines, including biologists, geologists and chemists, have caught on to these approaches over the past few decades. Within the chemical community, the first major applications of PCA were reported in the 1970s, and form the foundation of many modern chemometric methods described in this chapter.

Figure 4.1

Factor analysis in psychology (a People × Answers to questions matrix related to a People × Factors matrix)

4.2.2 Case Studies

In order to illustrate the main concepts of PCA, we will introduce two case studies, both from chromatography (although there are many other applications in the problems at the end of the chapter). It is not necessary to understand the detailed chemical motivations behind the chromatographic technique. The first case study represents information sequentially related in time, and the second information where there is no such relationship but variables are on very different scales.

4.2.2.1 Case Study 1: Resolution of Overlapping Peaks

This case study involves a chromatogram obtained by high-performance liquid chromatography with diode array detection (HPLC–DAD) sampled at 30 points in time (each at 1 s intervals) and 28 wavelengths at intervals of approximately 4.8 nm, as presented in Table 4.1 (note that the wavelengths are rounded to the nearest nanometre for simplicity, but the original data were not collected at exact nanometre intervals). Absorbances are presented in AU (absorbance units). For readers not familiar with this application, the dataset can be considered to consist of a series of 30 spectra recorded sequentially in time, arising from a mixture of compounds each of which has its own characteristic underlying unimodal time profile (often called an ‘elution profile’).

The data can be represented by a 30 × 28 matrix, the rows corresponding to elution times and the columns to wavelengths. Calling this matrix X, and each element x_ij, the profile chromatogram

\[ X_i = \sum_{j=1}^{28} x_{ij} \]

is given in Figure 4.2, and consists of at least two co-eluting peaks.
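The profile is simply a row sum of the data matrix, which can be sketched as below. The function name `profile` is an assumption, and to keep the example short the matrix is a three-wavelength excerpt of Table 4.1 (time points 1 to 3 at 220, 225 and 230 nm), so the sums printed here are smaller than the values plotted in Figure 4.2, which add up all 28 wavelengths.

```python
def profile(X):
    """Sum each row (one elution time) over all its columns (wavelengths)."""
    return [sum(row) for row in X]


# Excerpt of Table 4.1: rows are time points 1-3, columns are 220, 225 and 230 nm
X_excerpt = [
    [0.006, 0.004, 0.003],
    [0.040, 0.029, 0.023],
    [0.159, 0.115, 0.091],
]

profile_excerpt = profile(X_excerpt)
for time_point, value in enumerate(profile_excerpt, start=1):
    print(f"time point {time_point}: summed absorbance {value:.3f} AU")
```

With the full 30 × 28 matrix the same function returns the 30 values of the profile chromatogram.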

4.2.2.2 Case Study 2: Chromatographic Column Performance

This case study is introduced in Table 4.2. The performances of eight commercial chromatographic columns are measured. In order to do this, eight compounds are tested, and the results are denoted by a letter (P, N, A, C, Q, B, D, R). Four peak characteristics are measured, namely, k (which relates to elution time), N (relating to peak width), N(df) (another peak width parameter) and As (asymmetry). Each measurement is denoted by a mnemonic of two halves, the first referring to the compound and the second to the nature of the test, k being used for the elution time parameter and As for asymmetry. Hence the measurement CN refers to a peak width measurement on compound C. The matrix is transposed in Table 4.2, for ease of presentation, but is traditionally represented by an 8 × 32 matrix, each of whose rows represents a chromatographic column and whose columns represent a measurement. Again, for readers not familiar with this type of case study, the aim is to ascertain the similarities between eight objects (chromatographic columns, not to be confused with columns of a matrix) as measured by 32 parameters (related to the quality of the chromatography).

One aim is to determine which columns behave in a similar fashion, and another which tests measure similar properties, so as to reduce the number of tests from the original 32.
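Because the 32 parameters are on very different scales (Pk is below 1 while PN runs into the thousands), the PCA of this dataset shown later in Figure 4.4 is carried out after standardisation. A minimal sketch of that scaling step for two of the parameters follows. The function name is hypothetical, and dividing by the population standard deviation (n rather than n − 1) is an assumption made here; Section 4.3.6.4 gives the definition the text actually uses.

```python
import math


def standardise(values):
    """Mean-centre the values and divide by their population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # assumption: divide by n
    return [(v - mean) / sd for v in values]


# Two of the 32 parameters from Table 4.2, one value per chromatographic column,
# in the order Inertsil ODS, ODS-2, ODS-3, Kromasil C18, C8, Symmetry C18,
# Supelco ABZ+, Purospher
pk = [0.25, 0.19, 0.26, 0.30, 0.28, 0.54, 0.03, 0.04]
pn = [10200, 6930, 7420, 2980, 2890, 4160, 6890, 6960]

pk_std = standardise(pk)
pn_std = standardise(pn)
for a, b in zip(pk_std, pn_std):
    print(f"{a:+.3f}  {b:+.3f}")
```

After this step both parameters have zero mean and unit variance, so neither dominates a comparison simply because of its units.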


Table 4.1 Case study 1: a chromatogram recorded at 30 points in time and 28 wavelengths (nm). The table is shown transposed: each row is a wavelength (nm) and each column a time point (1–30); entries are absorbances in AU.

nm       1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29    30
220  0.006 0.040 0.159 0.367 0.552 0.634 0.687 0.795 0.914 0.960 0.924 0.834 0.725 0.615 0.519 0.437 0.369 0.314 0.269 0.233 0.203 0.178 0.157 0.140 0.125 0.112 0.101 0.092 0.084 0.076
225  0.004 0.029 0.115 0.267 0.405 0.477 0.553 0.699 0.854 0.928 0.902 0.815 0.704 0.596 0.500 0.419 0.354 0.300 0.257 0.222 0.193 0.169 0.149 0.132 0.118 0.106 0.096 0.087 0.079 0.072
230  0.003 0.023 0.091 0.212 0.321 0.372 0.412 0.494 0.586 0.628 0.606 0.544 0.468 0.395 0.331 0.277 0.234 0.198 0.170 0.147 0.127 0.112 0.099 0.088 0.078 0.070 0.063 0.057 0.052 0.048
234  0.003 0.021 0.085 0.198 0.296 0.323 0.299 0.270 0.251 0.232 0.206 0.178 0.150 0.125 0.105 0.088 0.075 0.064 0.055 0.048 0.042 0.037 0.032 0.029 0.026 0.023 0.021 0.019 0.017 0.016
239  0.003 0.026 0.101 0.236 0.352 0.377 0.329 0.263 0.207 0.165 0.133 0.108 0.089 0.073 0.061 0.052 0.044 0.038 0.033 0.029 0.025 0.022 0.020 0.018 0.016 0.015 0.013 0.012 0.011 0.010
244  0.004 0.030 0.120 0.280 0.419 0.449 0.391 0.311 0.243 0.193 0.155 0.126 0.103 0.085 0.071 0.060 0.051 0.044 0.038 0.033 0.029 0.026 0.023 0.021 0.019 0.017 0.015 0.014 0.013 0.012
249  0.004 0.029 0.117 0.271 0.405 0.435 0.382 0.311 0.252 0.206 0.170 0.140 0.115 0.095 0.080 0.067 0.057 0.049 0.042 0.037 0.033 0.029 0.026 0.023 0.021 0.019 0.017 0.015 0.014 0.013
253  0.003 0.023 0.090 0.209 0.312 0.338 0.305 0.262 0.230 0.202 0.174 0.147 0.123 0.102 0.086 0.072 0.061 0.052 0.045 0.039 0.034 0.030 0.027 0.024 0.021 0.019 0.018 0.016 0.015 0.013
258  0.002 0.018 0.071 0.165 0.247 0.271 0.255 0.240 0.233 0.222 0.201 0.174 0.148 0.123 0.103 0.086 0.073 0.062 0.053 0.046 0.040 0.036 0.031 0.028 0.025 0.023 0.020 0.019 0.017 0.015
263  0.003 0.021 0.084 0.194 0.291 0.321 0.310 0.304 0.308 0.301 0.277 0.243 0.206 0.173 0.144 0.121 0.102 0.087 0.074 0.064 0.056 0.049 0.044 0.039 0.035 0.031 0.028 0.026 0.023 0.021
268  0.004 0.029 0.116 0.270 0.405 0.447 0.433 0.426 0.432 0.424 0.391 0.342 0.291 0.244 0.203 0.171 0.144 0.122 0.105 0.091 0.079 0.070 0.062 0.055 0.049 0.044 0.040 0.036 0.033 0.030
272  0.005 0.038 0.153 0.354 0.532 0.588 0.571 0.565 0.576 0.568 0.524 0.460 0.391 0.327 0.273 0.229 0.193 0.165 0.141 0.122 0.107 0.094 0.083 0.074 0.066 0.059 0.054 0.049 0.044 0.041
277  0.006 0.045 0.178 0.413 0.621 0.689 0.676 0.682 0.708 0.705 0.655 0.576 0.490 0.411 0.343 0.288 0.243 0.206 0.177 0.153 0.133 0.117 0.104 0.092 0.082 0.074 0.067 0.061 0.055 0.051
282  0.006 0.046 0.182 0.422 0.635 0.710 0.713 0.744 0.798 0.809 0.760 0.672 0.573 0.481 0.402 0.337 0.284 0.241 0.207 0.179 0.156 0.137 0.121 0.107 0.096 0.086 0.078 0.071 0.064 0.059
287  0.005 0.040 0.158 0.368 0.555 0.629 0.657 0.731 0.824 0.860 0.819 0.729 0.624 0.525 0.438 0.367 0.309 0.263 0.225 0.194 0.169 0.148 0.131 0.116 0.104 0.093 0.084 0.076 0.070 0.064
291  0.004 0.030 0.120 0.279 0.422 0.490 0.546 0.662 0.793 0.855 0.826 0.741 0.636 0.535 0.447 0.374 0.315 0.267 0.229 0.197 0.171 0.150 0.132 0.118 0.105 0.094 0.085 0.077 0.070 0.064
296  0.003 0.021 0.083 0.194 0.296 0.356 0.430 0.571 0.724 0.802 0.786 0.709 0.610 0.514 0.429 0.359 0.302 0.256 0.219 0.189 0.164 0.143 0.126 0.112 0.100 0.090 0.081 0.073 0.067 0.061
301  0.002 0.017 0.066 0.155 0.237 0.289 0.362 0.497 0.643 0.719 0.707 0.638 0.550 0.463 0.387 0.324 0.272 0.231 0.197 0.170 0.147 0.129 0.114 0.101 0.090 0.081 0.073 0.066 0.060 0.055
306  0.002 0.017 0.069 0.160 0.244 0.289 0.340 0.438 0.547 0.602 0.587 0.528 0.455 0.383 0.320 0.268 0.225 0.191 0.163 0.141 0.122 0.107 0.094 0.084 0.075 0.067 0.061 0.055 0.050 0.046
310  0.003 0.019 0.075 0.173 0.262 0.301 0.325 0.378 0.441 0.469 0.450 0.402 0.345 0.290 0.242 0.203 0.171 0.145 0.124 0.107 0.093 0.082 0.072 0.064 0.057 0.051 0.046 0.042 0.038 0.035
315  0.003 0.019 0.075 0.174 0.262 0.292 0.290 0.299 0.317 0.320 0.299 0.264 0.225 0.189 0.158 0.132 0.112 0.095 0.081 0.070 0.061 0.054 0.048 0.042 0.038 0.034 0.031 0.028 0.025 0.023
320  0.002 0.017 0.067 0.157 0.234 0.255 0.234 0.209 0.191 0.174 0.154 0.132 0.111 0.093 0.077 0.065 0.055 0.047 0.041 0.035 0.031 0.027 0.024 0.021 0.019 0.017 0.016 0.014 0.013 0.012
325  0.002 0.013 0.053 0.123 0.183 0.196 0.172 0.139 0.111 0.091 0.074 0.061 0.050 0.042 0.035 0.029 0.025 0.021 0.019 0.016 0.014 0.013 0.011 0.010 0.009 0.008 0.008 0.007 0.006 0.006
329  0.001 0.009 0.035 0.081 0.120 0.129 0.111 0.087 0.066 0.051 0.040 0.032 0.026 0.021 0.018 0.015 0.013 0.011 0.010 0.009 0.008 0.007 0.006 0.005 0.005 0.004 0.004 0.004 0.003 0.003
334  0.001 0.005 0.019 0.043 0.065 0.069 0.060 0.046 0.035 0.026 0.021 0.016 0.013 0.011 0.009 0.008 0.007 0.006 0.005 0.004 0.004 0.004 0.003 0.003 0.003 0.002 0.002 0.002 0.002 0.002
339  0.000 0.002 0.008 0.019 0.029 0.031 0.026 0.021 0.016 0.012 0.009 0.007 0.006 0.005 0.004 0.004 0.003 0.003 0.002 0.002 0.002 0.002 0.002 0.001 0.001 0.001 0.001 0.001 0.001 0.001
344  0.000 0.001 0.003 0.008 0.011 0.012 0.010 0.008 0.006 0.005 0.004 0.003 0.003 0.002 0.002 0.002 0.002 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.000 0.000 0.000 0.000
349  0.000 0.000 0.001 0.003 0.004 0.005 0.004 0.003 0.003 0.002 0.002 0.002 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Figure 4.2

Case study 1: chromatographic peak profiles (profile plotted against datapoint)

4.2.3 Multivariate Data Matrices

A key idea is that most chemical measurements are inherently multivariate. This means that more than one measurement can be made on a single sample. An obvious example is spectroscopy: we can record a spectrum at hundreds of wavelengths on a single sample. Many traditional chemical approaches are univariate, in which only one wavelength (or measurement) is used per sample, but this misses much information. Another important application is quantitative structure–property–activity relationships, in which many physical measurements are available on a number of candidate compounds (bond lengths, dipole moments, bond angles, etc.). Can we predict, statistically, the biological activity of a compound? Can this assist in pharmaceutical drug development? There are several pieces of information available. PCA is one of several multivariate methods that allow us to explore patterns in these data, similar to exploring patterns in psychometric data. Which compounds behave similarly? Which people belong to a similar group? How can this behaviour be predicted from available information?

As an example, consider a chromatogram in which a number of compounds are detected with different elution times, at the same time as their spectra (such as UV or mass spectra) are recorded. Coupled chromatography, such as high-performance liquid chromatography–diode array detection (HPLC–DAD) or liquid chromatography–mass spectrometry (LC–MS), is increasingly common in modern laboratories, and represents a rich source of multivariate data. These data can be represented as a matrix as in Figure 4.3.

What might we want to ask about the data? It would be useful to know how many compounds are in the chromatogram. Partially overlapping peaks and minor impurities are the bugbears of modern chromatography. What are the spectra of these compounds? Can we reliably determine these spectra, which may be useful for library searching? Finally, what are the quantities of each component? Some of this information could undoubtedly be obtained by better chromatography, but there is a limit, especially with modern trends towards recording more and more data, more and more rapidly. And in many cases the identities and amounts of unknowns may not be available in advance. PCA is one tool from multivariate statistics that can help sort out these data. We will discuss the main principles in this chapter but deal with this type of application in greater depth in Chapter 6.

Table 4.2 Case study 2: 32 performance parameters and eight chromatographic columns.

Parameter  Inertsil  Inertsil  Inertsil  Kromasil  Kromasil  Symmetry  Supelco   Purospher
           ODS       ODS-2     ODS-3     C18       C8        C18       ABZ+
Pk         0.25      0.19      0.26      0.3       0.28      0.54      0.03      0.04
PN         10200     6930      7420      2980      2890      4160      6890      6960
PN(df)     2650      2820      2320      293       229       944       3660      2780
PAs        2.27      2.11      2.53      5.35      6.46      3.13      1.96      2.08
Nk         0.25      0.12      0.24      0.22      0.21      0.45      0         0
NN         12000     8370      9460      13900     16800     4170      13800     8260
NN(df)     6160      4600      4880      5330      6500      490       6020      3450
NAs        1.73      1.82      1.91      2.12      1.78      5.61      2.03      2.05
Ak         2.6       1.69      2.82      2.76      2.57      2.38      0.67      0.29
AN         10700     14400     11200     10200     13800     11300     11700     7160
AN(df)     7790      9770      7150      4380      5910      6380      7000      2880
AAs        1.21      1.48      1.64      2.03      2.08      1.59      1.65      2.08
Ck         0.89      0.47      0.95      0.82      0.71      0.87      0.19      0.07
CN         10200     10100     8500      9540      12600     9690      10700     5300
CN(df)     7830      7280      6990      6840      8340      6790      7250      3070
CAs        1.18      1.42      1.28      1.37      1.58      1.38      1.49      1.66
Qk         12.3      5.22      10.57     8.08      8.43      6.6       1.83      2.17
QN         8800      13300     10400     10300     11900     9000      7610      2540
QN(df)     7820      11200     7810      7410      8630      5250      5560      941
QAs        1.07      1.27      1.51      1.44      1.48      1.77      1.36      2.27
Bk         0.79      0.46      0.8       0.77      0.74      0.87      0.18      0
BN         15900     12000     10200     11200     14300     10300     11300     4570
BN(df)     7370      6550      5930      4560      6000      3690      5320      2060
BAs        1.54      1.79      1.74      2.06      2.03      2.13      1.97      1.67
Dk         2.64      1.72      2.73      2.75      2.27      2.54      0.55      0.35
DN         9280      12100     9810      7070      13100     10000     10500     6630
DN(df)     5030      8960      6660      2270      7800      7060      7130      3990
DAs        1.71      1.39      1.6       2.64      1.79      1.39      1.49      1.57
Rk         8.62      5.02      9.1       9.25      6.67      7.9       1.8       1.45
RN         9660      13900     11600     7710      13500     11000     9680      5140
RN(df)     8410      10900     7770      3460      9640      8530      6980      3270
RAs        1.16      1.39      1.65      2.17      1.5       1.28      1.41      1.56

Figure 4.3

Matrix representation of coupled chromatographic data (a matrix with axes labelled Time and Wavelength)

4.2.4 Aims of PCA

There are two principal needs in chemistry. In the case of the example of case study 1, we would like to extract information from the two-way chromatogram.

The number of significant PCs is ideally equal to the number of significant components. If there are three components in the mixture, then we expect that there are only three PCs.

Each PC is characterised by two pieces of information, the scores, which, in the case of chromatography, relate to the elution profiles, and the loadings, which relate to the spectra.

Below we will look in more detail at how this information is obtained. However, the ultimate information has a physical meaning to chemists.
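The link between the number of compounds and the number of significant PCs can be illustrated with a small simulation. The sketch below is not the procedure developed later in the chapter: it builds a noise-free, hypothetical two-compound chromatogram (Gaussian elution profiles and made-up spectra, all assumptions of this example) and extracts PCs from the uncentred matrix with a plain NIPALS iteration. Each PC yields one score vector along the time direction and one loading vector along the wavelength direction.

```python
import math


def nipals(X, n_components, n_iter=500, tol=1e-14):
    """Uncentred PCA by NIPALS; returns (scores, loadings, residual matrix)."""
    R = [row[:] for row in X]
    rows, cols = len(R), len(R[0])
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the column with the largest sum of squares
        start = max(range(cols), key=lambda j: sum(R[i][j] ** 2 for i in range(rows)))
        t = [R[i][start] for i in range(rows)]
        for _ in range(n_iter):
            tt = sum(v * v for v in t)
            p = [sum(t[i] * R[i][j] for i in range(rows)) / tt for j in range(cols)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]                      # unit-length loadings
            t_new = [sum(R[i][j] * p[j] for j in range(cols)) for i in range(rows)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        scores.append(t)
        loadings.append(p)
        # Remove this component before extracting the next one
        R = [[R[i][j] - t[i] * p[j] for j in range(cols)] for i in range(rows)]
    return scores, loadings, R


def gaussian(centre, width, n):
    return [math.exp(-((i - centre) / width) ** 2) for i in range(n)]


# Hypothetical mixture: 12 time points x 5 wavelengths, two co-eluting compounds
c1, c2 = gaussian(4.0, 2.0, 12), gaussian(7.0, 2.0, 12)
s1 = [0.9, 0.7, 0.4, 0.2, 0.1]
s2 = [0.1, 0.3, 0.6, 0.8, 0.5]
X = [[c1[i] * s1[j] + c2[i] * s2[j] for j in range(5)] for i in range(12)]

scores, loadings, residual = nipals(X, 2)
total_ss = sum(v * v for row in X for v in row)
eigenvalues = [sum(v * v for v in t) for t in scores]
resid_ss = sum(v * v for row in residual for v in row)
print("fraction of sum of squares per PC:", [g / total_ss for g in eigenvalues])
print("fraction left after two PCs:", resid_ss / total_ss)
```

With two compounds and no noise, two PCs account for essentially all of the data, so a third PC would have nothing left to model. The scores and loadings are abstract combinations of the true profiles and spectra rather than the profiles and spectra themselves; with real, noisy data the residual would not vanish and judging how many PCs are significant becomes a separate question.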

Figure 4.4 represents the result of performing PCA (standardised as discussed in Section 4.3.6.4) on the data of case study 2. Whereas in case study 1 we can often relate PCs to chemical factors such as spectra of individual compounds, for the second example there is no obvious physical relationship and PCA is mainly employed to see the main trends in the data more clearly. One aim is to show which columns behave

Figure 4.4

Plot of scores of PC2 versus PC1 after standardisation for case study 2 (points labelled Inertsil ODS, Inertsil ODS-2, Inertsil ODS-3, Kromasil C18, Kromasil C8, Symmetry C18, Supelco ABZ+ and Purospher)
