

Figure 3.26

Multiplying the data in Figure 3.25 by a double exponential

The time series decays more slowly than the original, but there is not much increase in noise. The peakshape in the transform is almost as narrow as that obtained using a single exponential, but noise is dramatically reduced.

A large number of so-called matched or optimal filters have been proposed in the literature, many specific to a particular kind of data, but the general principle is to obtain increased resolution without introducing too much noise. In some cases pure noise reduction filters (e.g. negative exponentials) can be applied where noise is not a serious problem. It is important to recognise that these filters can distort peakshapes. Although there is a substantial literature on this subject, the best approach is to tackle the problem experimentally rather than rely on elaborate rules. Figure 3.27 shows the result of applying a simple double exponential function to a typical time series. Note the bell-shaped filter function, which is usual. The original spectrum suggests a cluster of peaks, but only two clear peaks are visible. Applying the filter function suggests that there are at least four underlying peaks in the spectrum, although there is some distortion of the data in the middle, probably the result of a function that is slightly too severe.

Figure 3.27

Use of a double exponential filter
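As an illustration, the following numpy sketch applies one common realisation of a double exponential filter of the kind described above (an increasing exponential to sharpen the lines combined with a Gaussian-type decay to control the amplified noise). All parameter values are invented for illustration and are not taken from the book's figures.

```python
import numpy as np

# Illustrative double exponential filter: exp(kappa*t) sharpens the lines,
# exp(-lam*t**2) suppresses the amplified noise, and their product is the
# bell-shaped window mentioned above. All values are invented.
n = 1024
t = np.arange(n) * 1e-3                          # assumed acquisition times (s)

fid = (np.exp(-5 * t) * np.cos(2 * np.pi * 50 * t)
       + 0.05 * np.random.normal(size=n))        # synthetic noisy decaying signal

kappa, lam = 5.0, 8.0                            # line narrowing vs noise control
window = np.exp(kappa * t - lam * t ** 2)        # bell-shaped filter function

spectrum = np.abs(np.fft.rfft(fid * window))     # transform of the filtered series
```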

3.5.2.3 Fourier Self-deconvolution

In many forms of spectroscopy such as NMR and IR, data are acquired directly as a time series, and must be Fourier transformed to obtain an interpretable spectrum. However, any spectrum or chromatogram can be processed using methods described in this section, even if not acquired as a time series. The secret is to inverse transform (see Section 3.5.1) back to a time series.

Normally, three steps are employed, as illustrated in Figure 3.28:

1. transform the spectrum into a time series: this time series does not physically exist but can be handled by a computer;

2. then apply a Fourier filter to the time series;

3. finally, transform the spectrum back, resulting in improved quality.

This procedure is called Fourier self-deconvolution, and is an alternative to the digital filters in Section 3.3.

Figure 3.28

Fourier self-deconvolution of a peak cluster (inverse transform, filter, forward transform)
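A minimal sketch of these three steps, assuming numpy; the choice of filter (here a decaying exponential, which smooths; an increasing exponential would sharpen instead) is illustrative, not the book's.

```python
import numpy as np

def fourier_self_deconvolution(spectrum, decay=5.0):
    """Minimal sketch of Fourier self-deconvolution with an exponential filter."""
    n = spectrum.size
    t = np.arange(n) / n
    time_series = np.fft.ifft(spectrum)          # 1. spectrum -> computational time series
    filtered = time_series * np.exp(-decay * t)  # 2. apply a Fourier filter
    return np.real(np.fft.fft(filtered))         # 3. transform back to a spectrum
```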

3.5.3 Convolution Theorem

Some people are confused by the difference between Fourier filters and linear smoothing and resolution functions. In fact, both methods are equivalent and are related by the convolution theorem, and both have similar aims: to improve the quality of spectroscopic, chromatographic or time series data.

The principles of convolution have been discussed in Section 3.3.3. Two functions, f and g, are said to be convoluted to give h if

 

$$h_i = \sum_{j=-p}^{p} f_j \, g_{i+j}$$

Convolution involves moving a window or digital filter function (such as a Savitsky–Golay or moving average) along a series of data such as a spectrum, multiplying the data by that function at each successive datapoint. A three point moving average involves multiplying each set of three points in a spectrum by a function containing the values (1/3, 1/3, 1/3), and the spectrum is said to be convoluted by the moving average filter function.

Filtering a time series, using Fourier time domain filters, however, involves multiplying the entire time series by a single function, so that

$$H_i = F_i \, G_i$$

The convolution theorem states that f, g and h are Fourier transforms of F, G and H. Hence linear filters as applied directly to spectroscopic data have their equivalence as Fourier filters in the time domain; in other words, convolution in one domain is equivalent to multiplication in the other domain. Which approach is best depends largely on computational complexity and convenience. For example, both moving averages and exponential Fourier filters are easy to apply, and so are simple approaches, one applied directly to the frequency spectrum and the other to the raw time series. Convoluting a spectrum with the Fourier transform of an exponential decay is a difficult procedure, so the choice of domain is made according to how easy the calculations are.
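To make the equivalence concrete, here is a small numpy check (illustrative code, not from the book) that a three point moving average applied by direct circular convolution matches multiplication in the Fourier domain:

```python
import numpy as np

# Convolution theorem demonstration: convolution in one domain equals
# multiplication in the other, here for a (1/3, 1/3, 1/3) moving average.
n = 64
spectrum = np.random.rand(n)

g = np.zeros(n)
g[[n - 1, 0, 1]] = 1 / 3                      # circular moving average filter

# Multiply in the Fourier domain, then transform back.
direct = np.real(np.fft.ifft(np.fft.fft(spectrum) * np.fft.fft(g)))

# Convolve directly in the original domain (with circular end points).
by_convolution = np.array([(spectrum[i - 1] + spectrum[i] + spectrum[(i + 1) % n]) / 3
                           for i in range(n)])

assert np.allclose(direct, by_convolution)    # identical within rounding error
```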

3.6 Topical Methods

A number of more sophisticated methods have been developed over the past two decades. In certain instances a more specialised approach is appropriate, and such approaches also generate much interest in the literature. There are particular situations, for example where data are very noisy or incomplete, or where rapid calculations are required, that call for particular solutions. The three methods described below are topical and are implemented within a number of common software packages. They do not represent a comprehensive review but are included for completeness, as they are regularly reported in the chemometrics literature.

3.6.1 Kalman Filters

The Kalman filter has its origin in the need for rapid on-line curve fitting. In some situations, such as chemical kinetics, it is desirable to calculate a model whilst the reaction is taking place rather than wait until the end. In on-line applications such as process control, it may be useful to see a smoothed curve as the process is taking place, in real time, rather than later. The general philosophy is that, as something evolves with time, more information becomes available so the model can be refined. As each successive sample is recorded, the model improves. It is possible to predict the response from information provided at previous sample times and see how this differs from the observed response, so changing the model.

Kalman filters are fairly complex to implement computationally, but the principles are as follows, and will be illustrated by the case where a single response (y) depends on a single factor (x). There are three main steps:

1. Model the current datapoint (i): for example, calculate $\hat{y}_{i|i-1} = x_i \cdot b_{i-1}$ using a polynomial in x and the methods introduced in Chapter 2. The parameters $b_{i-1}$ are initially guesses, which are refined with time. The | symbol means that the model of $y_i$ is based on the first $i-1$ datapoints; $x_i$ is a row vector consisting of the terms in a model (usually, but not exclusively, polynomial) and $b$ is a column vector. For example, if $x_i = 2$, then a three parameter quadratic model of the form $y_i = b_0 + b_1 x + b_2 x^2$ gives $x_i = (1, 2, 4)$.

2. The next step is to see how well this model predicts the current datapoint, and to calculate $d_i = y_i - \hat{y}_{i|i-1}$, which is called the innovation. The closer these values are, the better is the model.

3. Finally, refine the model by recalculating the coefficients

$$b_i = b_{i-1} + k_i \, d_i$$

If the estimated and observed values of y are identical, the value of b will be unchanged. If the observed value is more than the estimated value, it makes sense to increase the size of the coefficients to compensate. The column vector ki is called the gain vector. There are a number of ways of calculating this, but the larger it is the greater is the uncertainty in the data.

A common (but complicated) way of calculating the gain vector is as follows.

1. Start with a matrix $V_{i-1}$ which represents the variance of the coefficients. This is a square matrix, with the number of rows and columns equal to the number of coefficients in the model. Hence if there are five coefficients, there will be 25 elements in the matrix. The higher these numbers, the less certain is the prediction of the coefficients. Start with a diagonal matrix containing some high numbers.

2. Guess a number $r$ that represents the approximate error at each point. This could be the root mean square replicate error. This number is not too crucial, and it can be set as a constant throughout the calculation.

3. The vector $k_i$ is given by

$$k_i = \frac{V_{i-1} \, x_i'}{x_i \, V_{i-1} \, x_i' + r} = \frac{V_{i-1} \, x_i'}{q + r}$$

where $x_i'$ denotes the transpose of the row vector $x_i$ and $q = x_i \, V_{i-1} \, x_i'$.

4. The new matrix $V_i$ is given by

$$V_i = V_{i-1} - k_i \, x_i \, V_{i-1}$$

The magnitude of the elements of this matrix should reduce with time as the measurements become more certain, meaning a consequential reduction in $k_i$ and so convergence of the coefficients $b$ (see step 3 of the main algorithm).

While it is not always necessary to understand the computational details, it is important to appreciate the application of the method. Table 3.10 represents the progress of such a calculation.

A model of the form $y_i = b_0 + b_1 x + b_2 x^2$ is to be set up, there being three coefficients.

The initial guess of the three coefficients is 0.000. Therefore, the guess of the response when $x = 0$ is 0, and the innovation is $0.840 - 0.000$ (the observed minus the predicted using the initial model).

Table 3.10  Kalman filter calculation.

 x_i       y_i       b_0       b_1       b_2       ŷ_i              k
 0.000     0.840     0.841     0.000     0.000     0.000     1.001  0.000  0.000
 1.000     0.737     0.841     0.052     0.052     0.841     0.001  0.501  0.501
 2.000     0.498     0.841     0.036     0.068     0.530     0.001  0.505  0.502
 3.000     0.296     0.849     0.114     0.025     0.124     0.051  0.451  0.250
 4.000     0.393     0.883     0.259     0.031     0.003     0.086  0.372  0.143
 5.000     0.620     0.910     0.334     0.053     0.371     0.107  0.304  0.089
 6.000     0.260     0.842     0.192     0.020     0.829     0.119  0.250  0.060
 7.000     0.910     0.898     0.286     0.038     0.458     0.125  0.208  0.042
 8.000     0.124     0.778     0.120     0.010     1.068     0.127  0.176  0.030
 9.000     0.795     0.817     0.166     0.017     0.490     0.127  0.150  0.023
10.000     0.436     0.767     0.115     0.010     0.831     0.126  0.129  0.017
11.000     0.246     0.712     0.064     0.004     0.693     0.124  0.113  0.014
12.000     0.058     0.662     0.024     0.001     0.469     0.121  0.099  0.011
13.000     0.412     0.589     0.031     0.006     0.211     0.118  0.088  0.009
14.000     0.067     0.623     0.007     0.004     0.236     0.115  0.078  0.007
15.000     0.580     0.582     0.033     0.006     0.210     0.112  0.070  0.006
16.000     0.324     0.605     0.020     0.005     0.541     0.108  0.063  0.005
17.000     0.896     0.575     0.036     0.007     0.606     0.105  0.057  0.004
18.000     1.549     0.510     0.069     0.009     0.919     0.102  0.052  0.004
19.000     1.353     0.518     0.065     0.009     1.426     0.099  0.047  0.003
20.000     1.642     0.521     0.064     0.009     1.675     0.097  0.043  0.003
21.000     2.190     0.499     0.073     0.009     1.954     0.094  0.040  0.002
22.000     2.206     0.513     0.068     0.009     2.359     0.091  0.037  0.002

Start with a matrix

$$V_0 = \begin{pmatrix} 100 & 0 & 0 \\ 0 & 100 & 0 \\ 0 & 0 & 100 \end{pmatrix}$$

the diagonal numbers representing high uncertainty in measurements of the parameters, given the experimental numbers.

Use a value of r of 0.1. Again this is a guess, but given the scatter of the experimental points, it looks as if this is a reasonable number. In fact, values 10-fold greater or smaller do not have a major impact on the resultant model, although they do influence the first few estimates.
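The calculation can be reproduced in a few lines of code. Below is a minimal numpy sketch of the algorithm described above, assuming the settings just given (quadratic model, diagonal starting matrix of 100s, r = 0.1); the function name and variables are illustrative, not from the book.

```python
import numpy as np

def kalman_polyfit(x, y, n_coef=3, v0=100.0, r=0.1):
    """Sequential (Kalman) estimation of polynomial coefficients.

    x, y   : 1D arrays of factor values and responses
    n_coef : number of model terms (3 gives b0 + b1*x + b2*x^2)
    v0     : initial diagonal of the coefficient variance matrix V
    r      : guessed measurement error, kept constant throughout
    """
    b = np.zeros(n_coef)                 # initial coefficient guesses of 0.000
    V = v0 * np.eye(n_coef)              # high numbers: high initial uncertainty
    for xi, yi in zip(x, y):
        xv = xi ** np.arange(n_coef)     # row vector of model terms (1, x, x^2)
        y_hat = xv @ b                   # step 1: model the current datapoint
        d = yi - y_hat                   # step 2: the innovation
        k = V @ xv / (xv @ V @ xv + r)   # gain vector
        b = b + k * d                    # step 3: refine the coefficients
        V = V - np.outer(k, xv) @ V      # variance matrix shrinks with time
    return b
```

Run over the x and y columns of Table 3.10, the coefficients should converge over roughly 20 iterations, as described below.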

As more samples are obtained it can be seen that:

• the size of k decreases;
• the values of the coefficients converge;
• there is a better fit to the experimental data.

Figure 3.29 shows the progress of the filter. The earlier points are very noisy and deviate considerably from the experimental data, whereas the later points represent a fairly smooth curve. In Figure 3.30, the progress of the three coefficients is presented, the graphs being normalised to a common scale for clarity. Convergence takes about 20 iterations. A final answer of $y_i = 0.513 + 0.068x - 0.009x^2$ is obtained in this case.

It is important to recognise that Kalman filters are computationally elaborate and are not really suitable unless there is a special reason for performing on-line calculations.


Figure 3.29

Progress of the Kalman filter, showing the filtered and raw data


Figure 3.30

Change in the three coefficients predicted by the Kalman filter with time

It is possible to take the entire X and y data of Table 3.10 and perform multiple linear regression as discussed in Chapter 2, so that

$$y = X \cdot b$$

or

$$b = (X'X)^{-1} \, X' \, y$$


using the standard equation for the pseudo-inverse, giving an equation, $y_i = 0.512 + 0.068x - 0.009x^2$, only very slightly different to the prediction by Kalman filters. If all the data are available there is little point in using Kalman filters in this case; the method is mainly useful for on-line predictions. On the whole, with modern computers, speed is no longer a very important consideration for curve fitting if all the data are available in one block, so some of the major interest in this area is historical; nevertheless, for real time graphical applications, Kalman filters are still useful, especially if one wants to look at a process evolving.
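For comparison, the batch solution takes only a couple of lines; a sketch assuming numpy, with invented data standing in for the x and y columns of Table 3.10:

```python
import numpy as np

# Invented data standing in for the x and y columns of Table 3.10.
x = np.arange(23.0)
y = 0.513 + 0.068 * x - 0.009 * x ** 2 + np.random.normal(0, 0.3, x.size)

X = np.vander(x, 3, increasing=True)    # design matrix with columns 1, x, x^2
b = np.linalg.solve(X.T @ X, X.T @ y)   # b = (X'X)^(-1) X'y, the pseudo-inverse
```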

Kalman filters can be extended to more complex situations with many variables and many responses. The model does not need to be multilinear but, for example, may be exponential (e.g. in kinetics). Although the equations increase considerably in complexity, the basic ideas are the same.

3.6.2 Wavelet Transforms

Another topical method in chemical signal processing is the wavelet transform. The general principles are discussed below without detailed information about the algorithms, the aim being a general understanding of the approach; wavelets are implemented in several chemometric packages, so many people will have come across them.

Wavelet transforms are normally applied to datasets whose size is a power of two, for example consisting of 512 or 1024 datapoints. If a spectrum or chromatogram is longer, it is conventional simply to clip the data to a conveniently sized window.

A wavelet is a general function, usually, but by no means exclusively, of time, g(t), which can be modified by translation (b) or dilation (expansion/contraction) (a). The function should add up to 0, and can be symmetric around its mid-point. A very simple example is a function whose first half has the value +1 and whose second half −1. Consider a small spectrum eight datapoints in width. A very simple basic wavelet function consists of four +1s followed by four −1s. This covers the entire spectrum and is said to be a wavelet of level 0. It is completely expanded and there is no room to translate this function, as it covers the entire spectrum. The function can be halved in size (a = 2) to give a wavelet of level 1. This can now be translated (changing b), so there are two possible wavelets of level 1. The wavelets may be denoted by {n, m}, where n is the level and m the translation.

Seven wavelets for an eight point series are presented in Table 3.11. The smallest is a two point wavelet. It can be seen that, for a series consisting of $2^N$ points:

• there will be N levels, numbered from 0 to $N - 1$;
• there will be $2^n$ wavelets at level n;
• there will be $2^N - 1$ wavelets in total if all levels are employed.

Table 3.11  Wavelets.

Level 0      1    1    1    1   −1   −1   −1   −1   {0, 1}
Level 1      1    1   −1   −1                       {1, 1}
                            1    1   −1   −1        {1, 2}
Level 2      1   −1                                 {2, 1}
                       1   −1                       {2, 2}
                                 1   −1             {2, 3}
                                           1   −1   {2, 4}

(empty cells correspond to the value 0)


The key to the usefulness of wavelet transforms is that it is possible to express the data in terms of a sum of wavelets. For a spectrum 512 datapoints long, there will be 511 such wavelets, plus associated scaling factors. This transform is sometimes expressed by

$$h = W \cdot f$$

where:

• f represents the raw data (e.g. a spectrum of 512 points in time);
• W is a square matrix (with dimensions 512 × 512 in our example);
• h contains the coefficients of the wavelets, the calculation determining the best fit coefficients.

It is beyond the scope of this text to provide details as to how to obtain W, but many excellent papers exist on this topic.

Of course, a function such as that in Table 3.11 is not always ideal or particularly realistic in many cases, so much interest attaches to determining optimum wavelet functions, there being many proposed and often exotically named wavelets.

There are two principal uses for wavelets.

1. The first involves smoothing. If the original data consist of 512 datapoints, and are exactly fitted by 511 wavelets, choose the most significant wavelets (those with the highest coefficients), e.g. the top 50. In fact, if the nature of the wavelet function is selected with care, only a small number of such wavelets may be necessary to model a spectrum which, in itself, consists of only a small number of peaks. Replace the spectrum simply with that obtained using the most significant wavelets.

2. The second involves data compression. Instead of storing all the raw data, store simply the coefficients of the most significant wavelets. This is equivalent to saying that if a spectrum is recorded over 1024 datapoints but consists of only five overlapping Gaussians, it is more economical (and, in fact, useful to the chemist) to store the parameters for the Gaussians rather than the raw data. In certain areas such as LC–MS there is a huge redundancy of data, most mass numbers having no significance and many data matrices being extremely sparse. Hence it is useful to reduce the amount of information; both uses are sketched below.
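Both uses can be illustrated with the simple ±1 wavelet of Table 3.11 in its orthonormal (Haar) form. The sketch below assumes numpy and invented example data; it illustrates the idea rather than the book's implementation, and in practice smoother, often exotically named wavelets are generally preferred.

```python
import numpy as np

def haar_transform(f):
    """One-dimensional Haar wavelet transform (length must be a power of 2)."""
    h = np.asarray(f, dtype=float).copy()
    n = h.size
    while n > 1:
        half = n // 2
        sums = (h[:n:2] + h[1:n:2]) / np.sqrt(2)   # smooth (scaling) coefficients
        diffs = (h[:n:2] - h[1:n:2]) / np.sqrt(2)  # detail (wavelet) coefficients
        h[:half], h[half:n] = sums, diffs
        n = half
    return h

def inverse_haar(h):
    """Invert haar_transform."""
    f = np.asarray(h, dtype=float).copy()
    n = 1
    while n < f.size:
        sums, diffs = f[:n].copy(), f[n:2 * n].copy()
        f[0:2 * n:2] = (sums + diffs) / np.sqrt(2)
        f[1:2 * n:2] = (sums - diffs) / np.sqrt(2)
        n *= 2
    return f

# Smoothing / compression: keep only the most significant coefficients.
spectrum = (np.exp(-0.05 * (np.arange(512) - 256) ** 2)
            + np.random.normal(0, 0.05, 512))   # invented noisy single-peak spectrum
coeffs = haar_transform(spectrum)
top = np.argsort(np.abs(coeffs))[-50:]          # the 50 largest coefficients
kept = np.zeros_like(coeffs)
kept[top] = coeffs[top]                         # 50 numbers stored instead of 512
smoothed = inverse_haar(kept)                   # denoised reconstruction
```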

Wavelets are a computationally sophisticated method for achieving these two facilities and are an area of active research within the data analytical community.

3.6.3 Maximum Entropy (Maxent) and Bayesian Methods

Over the past two decades there has been substantial scientific interest in the application of maximum entropy techniques, with notable successes for the chemist in areas such as NMR spectroscopy and crystallography. Maxent has a long statistical vintage, one of the modern pioneers being Jaynes, but the first significant scientific applications were in the deblurring of infrared images of the sky, involving the development of the first modern computational algorithm in the early 1980s. Since then, there has been an explosion of interest, and several implementations are available within commercial instrumentation. The most spectacular successes have been in the area of image analysis, for example NMR tomography, as well as forensic applications such as obtaining clear car number plates from hazy police photographs. In addition, there is a very solid and large literature in the area of analytical chemistry.

3.6.3.1 Bayes’ Theorem

The fundamental application of maximum entropy techniques requires an understanding of Bayes’ theorem. Various definitions are necessary:

• Data are experimental observations, for example the measurement of a time series of free induction decay prior to Fourier transformation. Data space contains a dataset for each experiment.

• A map is the desired result, for example a clean and noise free spectrum, or the concentration of several compounds in a mixture. Map space exists in a similar fashion to data space.

• An operation or transformation links these two spaces, such as Fourier transformation or factor analysis.

The aim of the experimenter is to obtain as good an estimate of map space as possible, consistent with his or her knowledge of the system. Normally there are two types of knowledge:

1. Prior knowledge is available before the experiment. There is almost always some information available about chemical data. An example is that a true spectrum will always be positive: we can reject statistical solutions that result in negative intensities. Sometimes much more detailed information such as lineshapes or compound concentrations is known.

2. Experimental information, which refines the prior knowledge to give a posterior model of the system.

The theorem is often presented in the following form:

probability(answer given new information) ∝ probability(answer given prior information) × probability(new information given answer)

or

p(map | experiment) ∝ p(map | prior information) × p(experiment | map)

where the | symbol stands for ‘given by’ and p is probability.

Many scientists ignore the prior information, and for cases where data are fairly good, this can be perfectly acceptable. However, chemical data analysis is most useful where the answer is not so obvious, and the data are difficult to analyse. The Bayesian method allows prior information or measurements to be taken into account. It also allows continuing experimentation, improving a model all the time.
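As a toy numerical illustration of the theorem (all numbers below are invented), prior knowledge that a quantity must be positive can be combined with one noisy measurement to give a posterior distribution:

```python
import numpy as np

# Toy Bayesian update on a grid of candidate "maps" (all numbers invented).
value = np.linspace(-5, 10, 301)              # candidate values of the quantity
prior = (value > 0).astype(float)             # prior knowledge: must be positive
prior /= prior.sum()

observed, sigma = 6.2, 1.0                    # one noisy experimental observation
likelihood = np.exp(-0.5 * ((observed - value) / sigma) ** 2)  # p(experiment|map)

posterior = prior * likelihood                # p(map|experiment), up to a constant
posterior /= posterior.sum()                  # normalise: the posterior model
```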

3.6.3.2 Maximum Entropy

Maxent is one method for determining the probability of a model. A simple example is that of the toss of a six sided unbiassed die. What is the most likely underlying frequency distribution, and how can each possible distribution be measured? Figure 3.31 illustrates a flat distribution and Figure 3.32 a skew distribution (expressed
