

Figure 3.26

Multiplying the data in Figure 3.25 by a double exponential

The time series decays more slowly than the original, but there is not much increase in noise. The peakshape in the transform is almost as narrow as that obtained using a single exponential, but noise is dramatically reduced.

A large number of so-called matched or optimal filters have been proposed in the literature, many specific to a particular kind of data, but the general principle is to obtain increased resolution without introducing too much noise. In some cases pure noise reduction filters (e.g. negative exponentials) can be applied where noise is not a serious problem. It is important to recognise that these filters can distort peakshapes. Although there is a substantial literature on this subject, the best approach is to tackle the problem experimentally rather than rely on elaborate rules. Figure 3.27 shows the result of applying a simple double exponential function to a typical time series. Note the bell-shaped filter function, which is usual. The original spectrum suggests a cluster of peaks, but only two clear peaks are visible. Applying the filter function suggests that there are at least four underlying peaks in the spectrum, although there is some distortion of the data in the middle, probably the result of a function that is slightly too severe.

Figure 3.27

Use of a double exponential filter
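As an illustration, the following numpy sketch applies one common realisation of a double exponential filter of the kind described above (an increasing exponential to sharpen the lines combined with a Gaussian-type decay to control the amplified noise). All parameter values are invented for illustration and are not taken from the book's figures.

```python
import numpy as np

# Illustrative double exponential filter: exp(kappa*t) sharpens the lines,
# exp(-lam*t**2) suppresses the amplified noise, and their product is the
# bell-shaped window mentioned above. All values are invented.
n = 1024
t = np.arange(n) * 1e-3                          # assumed acquisition times (s)

fid = (np.exp(-5 * t) * np.cos(2 * np.pi * 50 * t)
       + 0.05 * np.random.normal(size=n))        # synthetic noisy decaying signal

kappa, lam = 5.0, 8.0                            # line narrowing vs noise control
window = np.exp(kappa * t - lam * t ** 2)        # bell-shaped filter function

spectrum = np.abs(np.fft.rfft(fid * window))     # transform of the filtered series
```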

3.5.2.3 Fourier Self-deconvolution

In many forms of spectroscopy such as NMR and IR, data are acquired directly as a time series, and must be Fourier transformed to obtain an interpretable spectrum. However, any spectrum or chromatogram can be processed using methods described in this section, even if not acquired as a time series. The secret is to inverse transform (see Section 3.5.1) back to a time series.

Normally, three steps are employed, as illustrated in Figure 3.28:

1. transform the spectrum into a time series: this time series does not physically exist but can be handled by a computer;

2. then apply a Fourier filter to the time series;

3. finally, transform the spectrum back, resulting in improved quality.

This procedure is called Fourier self-deconvolution, and is an alternative to the digital filters in Section 3.3.

Figure 3.28

Fourier self-deconvolution of a peak cluster (inverse transform, filter, forward transform)
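A minimal sketch of these three steps, assuming numpy; the choice of filter (here a decaying exponential, which smooths; an increasing exponential would sharpen instead) is illustrative, not the book's.

```python
import numpy as np

def fourier_self_deconvolution(spectrum, decay=5.0):
    """Minimal sketch of Fourier self-deconvolution with an exponential filter."""
    n = spectrum.size
    t = np.arange(n) / n
    time_series = np.fft.ifft(spectrum)          # 1. spectrum -> computational time series
    filtered = time_series * np.exp(-decay * t)  # 2. apply a Fourier filter
    return np.real(np.fft.fft(filtered))         # 3. transform back to a spectrum
```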

3.5.3 Convolution Theorem

Some people are confused by the difference between Fourier filters and linear smoothing and resolution functions. In fact, both methods are equivalent and are related by the convolution theorem, and both have similar aims: to improve the quality of spectroscopic, chromatographic or time series data.

The principles of convolution have been discussed in Section 3.3.3. Two functions, f and g, are said to be convoluted to give h if

 

$$h_i = \sum_{j=-p}^{p} f_j \, g_{i+j}$$

Convolution involves moving a window or digital filter function (such as a Savitsky–Golay or moving average) along a series of data such as a spectrum, multiplying the data by that function at each successive datapoint. A three point moving average involves multiplying each set of three points in a spectrum by a function containing the values (1/3, 1/3, 1/3), and the spectrum is said to be convoluted by the moving average filter function.

Filtering a time series, using Fourier time domain filters, however, involves multiplying the entire time series by a single function, so that

$$H_i = F_i \, G_i$$

The convolution theorem states that f, g and h are Fourier transforms of F, G and H. Hence linear filters as applied directly to spectroscopic data have their equivalence as Fourier filters in the time domain; in other words, convolution in one domain is equivalent to multiplication in the other domain. Which approach is best depends largely on computational complexity and convenience. For example, both moving averages and exponential Fourier filters are easy to apply, and so are simple approaches, one applied directly to the frequency spectrum and the other to the raw time series. Convoluting a spectrum with the Fourier transform of an exponential decay is a difficult procedure, so the choice of domain is made according to how easy the calculations are.
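To make the equivalence concrete, here is a small numpy check (illustrative code, not from the book) that a three point moving average applied by direct circular convolution matches multiplication in the Fourier domain:

```python
import numpy as np

# Convolution theorem demonstration: convolution in one domain equals
# multiplication in the other, here for a (1/3, 1/3, 1/3) moving average.
n = 64
spectrum = np.random.rand(n)

g = np.zeros(n)
g[[n - 1, 0, 1]] = 1 / 3                      # circular moving average filter

# Multiply in the Fourier domain, then transform back.
direct = np.real(np.fft.ifft(np.fft.fft(spectrum) * np.fft.fft(g)))

# Convolve directly in the original domain (with circular end points).
by_convolution = np.array([(spectrum[i - 1] + spectrum[i] + spectrum[(i + 1) % n]) / 3
                           for i in range(n)])

assert np.allclose(direct, by_convolution)    # identical within rounding error
```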

3.6 Topical Methods

A number of more sophisticated methods have been developed over the past two decades. In certain instances a more specialised approach is appropriate, and such approaches also generate much interest in the literature. There are particular situations, for example where data are very noisy or incomplete, or where rapid calculations are required, that call for particular solutions. The three methods described below are topical and are implemented within a number of common software packages. They do not represent a comprehensive review but are included for completeness, as they are regularly reported in the chemometrics literature.

3.6.1 Kalman Filters

The Kalman filter has its origin in the need for rapid on-line curve fitting. In some situations, such as chemical kinetics, it is desirable to calculate a model whilst the reaction is taking place rather than wait until the end. In on-line applications such as process control, it may be useful to see a smoothed curve as the process is taking place, in real time, rather than later. The general philosophy is that, as something evolves with time, more information becomes available so the model can be refined. As each successive sample is recorded, the model improves. It is possible to predict the response from information provided at previous sample times and see how this differs from the observed response, so changing the model.

Kalman filters are fairly complex to implement computationally, but the principles are as follows, and will be illustrated by the case where a single response (y) depends on a single factor (x). There are three main steps:

1. Model the current datapoint (i): for example, calculate $\hat{y}_{i|i-1} = x_i \cdot b_{i-1}$ using a polynomial in x and the methods introduced in Chapter 2. The parameters $b_{i-1}$ are initially guesses, which are refined with time. The | symbol means that the model of $y_i$ is based on the first $i-1$ datapoints; $x_i$ is a row vector consisting of the terms in a model (usually, but not exclusively, polynomial) and $b$ is a column vector. For example, if $x_i = 2$, then a three parameter quadratic model of the form $y_i = b_0 + b_1 x + b_2 x^2$ gives $x_i = (1, 2, 4)$.

2. The next step is to see how well this model predicts the current datapoint, and to calculate $d_i = y_i - \hat{y}_{i|i-1}$, which is called the innovation. The closer these values are, the better is the model.

3. Finally, refine the model by recalculating the coefficients

$$b_i = b_{i-1} + k_i \, d_i$$

If the estimated and observed values of y are identical, the value of b will be unchanged. If the observed value is more than the estimated value, it makes sense to increase the size of the coefficients to compensate. The column vector ki is called the gain vector. There are a number of ways of calculating this, but the larger it is the greater is the uncertainty in the data.

A common (but complicated) way of calculating the gain vector is as follows.

1. Start with a matrix $V_{i-1}$ which represents the variance of the coefficients. This is a square matrix, with the number of rows and columns equal to the number of coefficients in the model. Hence if there are five coefficients, there will be 25 elements in the matrix. The higher these numbers, the less certain is the prediction of the coefficients. Start with a diagonal matrix containing some high numbers.

2. Guess a number $r$ that represents the approximate error at each point. This could be the root mean square replicate error. This number is not too crucial, and it can be set as a constant throughout the calculation.

3. The vector $k_i$ is given by

$$k_i = \frac{V_{i-1} \, x_i'}{x_i \, V_{i-1} \, x_i' + r} = \frac{V_{i-1} \, x_i'}{q + r}$$

where $x_i'$ denotes the transpose of the row vector $x_i$ and $q = x_i \, V_{i-1} \, x_i'$.

4. The new matrix $V_i$ is given by

$$V_i = V_{i-1} - k_i \, x_i \, V_{i-1}$$

The magnitude of the elements of this matrix should reduce with time as the measurements become more certain, meaning a consequential reduction in $k_i$ and so convergence of the coefficients $b$ (see step 3 of the main algorithm).

While it is not always necessary to understand the computational details, it is important to appreciate the application of the method. Table 3.10 represents the progress of such a calculation.

A model of the form $y_i = b_0 + b_1 x + b_2 x^2$ is to be set up, there being three coefficients.

The initial guess of the three coefficients is 0.000. Therefore, the guess of the response when $x = 0$ is 0, and the innovation is $0.840 - 0.000$ (the observed minus the predicted using the initial model).

Table 3.10  Kalman filter calculation.

 x_i       y_i       b_0       b_1       b_2       ŷ_i              k
 0.000     0.840     0.841     0.000     0.000     0.000     1.001  0.000  0.000
 1.000     0.737     0.841     0.052     0.052     0.841     0.001  0.501  0.501
 2.000     0.498     0.841     0.036     0.068     0.530     0.001  0.505  0.502
 3.000     0.296     0.849     0.114     0.025     0.124     0.051  0.451  0.250
 4.000     0.393     0.883     0.259     0.031     0.003     0.086  0.372  0.143
 5.000     0.620     0.910     0.334     0.053     0.371     0.107  0.304  0.089
 6.000     0.260     0.842     0.192     0.020     0.829     0.119  0.250  0.060
 7.000     0.910     0.898     0.286     0.038     0.458     0.125  0.208  0.042
 8.000     0.124     0.778     0.120     0.010     1.068     0.127  0.176  0.030
 9.000     0.795     0.817     0.166     0.017     0.490     0.127  0.150  0.023
10.000     0.436     0.767     0.115     0.010     0.831     0.126  0.129  0.017
11.000     0.246     0.712     0.064     0.004     0.693     0.124  0.113  0.014
12.000     0.058     0.662     0.024     0.001     0.469     0.121  0.099  0.011
13.000     0.412     0.589     0.031     0.006     0.211     0.118  0.088  0.009
14.000     0.067     0.623     0.007     0.004     0.236     0.115  0.078  0.007
15.000     0.580     0.582     0.033     0.006     0.210     0.112  0.070  0.006
16.000     0.324     0.605     0.020     0.005     0.541     0.108  0.063  0.005
17.000     0.896     0.575     0.036     0.007     0.606     0.105  0.057  0.004
18.000     1.549     0.510     0.069     0.009     0.919     0.102  0.052  0.004
19.000     1.353     0.518     0.065     0.009     1.426     0.099  0.047  0.003
20.000     1.642     0.521     0.064     0.009     1.675     0.097  0.043  0.003
21.000     2.190     0.499     0.073     0.009     1.954     0.094  0.040  0.002
22.000     2.206     0.513     0.068     0.009     2.359     0.091  0.037  0.002

Start with a matrix

$$V_0 = \begin{pmatrix} 100 & 0 & 0 \\ 0 & 100 & 0 \\ 0 & 0 & 100 \end{pmatrix}$$

the diagonal numbers representing high uncertainty in measurements of the parameters, given the experimental numbers.

Use a value of r of 0.1. Again this is a guess, but given the scatter of the experimental points, it looks as if this is a reasonable number. In fact, values 10-fold greater or smaller do not have a major impact on the resultant model, although they do influence the first few estimates.
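The calculation can be reproduced in a few lines of code. Below is a minimal numpy sketch of the algorithm described above, assuming the settings just given (quadratic model, diagonal starting matrix of 100s, r = 0.1); the function name and variables are illustrative, not from the book.

```python
import numpy as np

def kalman_polyfit(x, y, n_coef=3, v0=100.0, r=0.1):
    """Sequential (Kalman) estimation of polynomial coefficients.

    x, y   : 1D arrays of factor values and responses
    n_coef : number of model terms (3 gives b0 + b1*x + b2*x^2)
    v0     : initial diagonal of the coefficient variance matrix V
    r      : guessed measurement error, kept constant throughout
    """
    b = np.zeros(n_coef)                 # initial coefficient guesses of 0.000
    V = v0 * np.eye(n_coef)              # high numbers: high initial uncertainty
    for xi, yi in zip(x, y):
        xv = xi ** np.arange(n_coef)     # row vector of model terms (1, x, x^2)
        y_hat = xv @ b                   # step 1: model the current datapoint
        d = yi - y_hat                   # step 2: the innovation
        k = V @ xv / (xv @ V @ xv + r)   # gain vector
        b = b + k * d                    # step 3: refine the coefficients
        V = V - np.outer(k, xv) @ V      # variance matrix shrinks with time
    return b
```

Run over the x and y columns of Table 3.10, the coefficients should converge over roughly 20 iterations, as described below.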

As more samples are obtained it can be seen that:

• the size of k decreases;
• the values of the coefficients converge;
• there is a better fit to the experimental data.

Figure 3.29 shows the progress of the filter. The earlier points are very noisy and deviate considerably from the experimental data, whereas the later points represent a fairly smooth curve. In Figure 3.30, the progress of the three coefficients is presented, the graphs being normalised to a common scale for clarity. Convergence takes about 20 iterations. A final answer of $y_i = 0.513 + 0.068x - 0.009x^2$ is obtained in this case.

It is important to recognise that Kalman filters are computationally elaborate and are not really suitable unless there is a special reason for performing on-line calculations.


Figure 3.29

Progress of the Kalman filter, showing the filtered and raw data


Figure 3.30

Change in the three coefficients predicted by the Kalman filter with time

It is possible to take the entire X and y data of Table 3.10 and perform multiple linear regression as discussed in Chapter 2, so that

$$y = X \cdot b$$

or

$$b = (X'X)^{-1} \, X' \, y$$


using the standard equation for the pseudo-inverse, giving an equation, $y_i = 0.512 + 0.068x - 0.009x^2$, only very slightly different to the prediction by Kalman filters. If all the data are available there is little point in using Kalman filters in this case; the method is mainly useful for on-line predictions. On the whole, with modern computers, speed is no longer a very important consideration for curve fitting if all the data are available in one block, so some of the major interest in this area is historical; nevertheless, for real time graphical applications, Kalman filters are still useful, especially if one wants to look at a process evolving.
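For comparison, the batch solution takes only a couple of lines; a sketch assuming numpy, with invented data standing in for the x and y columns of Table 3.10:

```python
import numpy as np

# Invented data standing in for the x and y columns of Table 3.10.
x = np.arange(23.0)
y = 0.513 + 0.068 * x - 0.009 * x ** 2 + np.random.normal(0, 0.3, x.size)

X = np.vander(x, 3, increasing=True)    # design matrix with columns 1, x, x^2
b = np.linalg.solve(X.T @ X, X.T @ y)   # b = (X'X)^(-1) X'y, the pseudo-inverse
```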

Kalman filters can be extended to more complex situations with many variables and many responses. The model does not need to be multilinear but, for example, may be exponential (e.g. in kinetics). Although the equations increase considerably in complexity, the basic ideas are the same.

3.6.2 Wavelet Transforms

Another topical method in chemical signal processing is the wavelet transform. The general principles are discussed below without detailed information about the algorithms, the aim being a general understanding of the approach; wavelets are implemented in several chemometric packages, so many people will have come across them.

Wavelet transforms are normally applied to datasets whose size is a power of two, for example consisting of 512 or 1024 datapoints. If a spectrum or chromatogram is longer, it is conventional simply to clip the data to a conveniently sized window.

A wavelet is a general function, usually, but by no means exclusively, of time, g(t), which can be modified by translation (b) or dilation (expansion/contraction) (a). The function should add up to 0, and can be symmetric around its mid-point. A very simple example is a function whose first half has the value +1 and whose second half −1. Consider a small spectrum eight datapoints in width. A very simple basic wavelet function consists of four +1s followed by four −1s. This covers the entire spectrum and is said to be a wavelet of level 0. It is completely expanded and there is no room to translate this function, as it covers the entire spectrum. The function can be halved in size (a = 2) to give a wavelet of level 1. This can now be translated (changing b), so there are two possible wavelets of level 1. The wavelets may be denoted by {n, m}, where n is the level and m the translation.

Seven wavelets for an eight point series are presented in Table 3.11. The smallest is a two point wavelet. It can be seen that, for a series consisting of $2^N$ points:

• there will be N levels, numbered from 0 to $N - 1$;
• there will be $2^n$ wavelets at level n;
• there will be $2^N - 1$ wavelets in total if all levels are employed.

Table 3.11  Wavelets.

Level 0      1    1    1    1   −1   −1   −1   −1   {0, 1}
Level 1      1    1   −1   −1                       {1, 1}
                            1    1   −1   −1        {1, 2}
Level 2      1   −1                                 {2, 1}
                       1   −1                       {2, 2}
                                 1   −1             {2, 3}
                                           1   −1   {2, 4}

(empty cells correspond to the value 0)


The key to the usefulness of wavelet transforms is that it is possible to express the data in terms of a sum of wavelets. For a spectrum 512 datapoints long, there will be 511 such wavelets, plus associated scaling factors. This transform is sometimes expressed by

$$h = W \cdot f$$

where:

• f represents the raw data (e.g. a spectrum of 512 points in time);
• W is a square matrix (with dimensions 512 × 512 in our example);
• h contains the coefficients of the wavelets, the calculation determining the best fit coefficients.

It is beyond the scope of this text to provide details as to how to obtain W, but many excellent papers exist on this topic.

Of course, a function such as that in Table 3.11 is not always ideal or particularly realistic in many cases, so much interest attaches to determining optimum wavelet functions, there being many proposed and often exotically named wavelets.

There are two principal uses for wavelets.

1. The first involves smoothing. If the original data consist of 512 datapoints, and are exactly fitted by 511 wavelets, choose the most significant wavelets (those with the highest coefficients), e.g. the top 50. In fact, if the nature of the wavelet function is selected with care, only a small number of such wavelets may be necessary to model a spectrum which, in itself, consists of only a small number of peaks. Replace the spectrum simply with that obtained using the most significant wavelets.

2. The second involves data compression. Instead of storing all the raw data, store simply the coefficients of the most significant wavelets. This is equivalent to saying that if a spectrum is recorded over 1024 datapoints but consists of only five overlapping Gaussians, it is more economical (and, in fact, useful to the chemist) to store the parameters for the Gaussians rather than the raw data. In certain areas such as LC–MS there is a huge redundancy of data, most mass numbers having no significance and many data matrices being extremely sparse. Hence it is useful to reduce the amount of information; both uses are sketched below.
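Both uses can be illustrated with the simple ±1 wavelet of Table 3.11 in its orthonormal (Haar) form. The sketch below assumes numpy and invented example data; it illustrates the idea rather than the book's implementation, and in practice smoother, often exotically named wavelets are generally preferred.

```python
import numpy as np

def haar_transform(f):
    """One-dimensional Haar wavelet transform (length must be a power of 2)."""
    h = np.asarray(f, dtype=float).copy()
    n = h.size
    while n > 1:
        half = n // 2
        sums = (h[:n:2] + h[1:n:2]) / np.sqrt(2)   # smooth (scaling) coefficients
        diffs = (h[:n:2] - h[1:n:2]) / np.sqrt(2)  # detail (wavelet) coefficients
        h[:half], h[half:n] = sums, diffs
        n = half
    return h

def inverse_haar(h):
    """Invert haar_transform."""
    f = np.asarray(h, dtype=float).copy()
    n = 1
    while n < f.size:
        sums, diffs = f[:n].copy(), f[n:2 * n].copy()
        f[0:2 * n:2] = (sums + diffs) / np.sqrt(2)
        f[1:2 * n:2] = (sums - diffs) / np.sqrt(2)
        n *= 2
    return f

# Smoothing / compression: keep only the most significant coefficients.
spectrum = (np.exp(-0.05 * (np.arange(512) - 256) ** 2)
            + np.random.normal(0, 0.05, 512))   # invented noisy single-peak spectrum
coeffs = haar_transform(spectrum)
top = np.argsort(np.abs(coeffs))[-50:]          # the 50 largest coefficients
kept = np.zeros_like(coeffs)
kept[top] = coeffs[top]                         # 50 numbers stored instead of 512
smoothed = inverse_haar(kept)                   # denoised reconstruction
```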

Wavelets are a computationally sophisticated method for achieving these two facilities and are an area of active research within the data analytical community.

3.6.3 Maximum Entropy (Maxent) and Bayesian Methods

Over the past two decades there has been substantial scientific interest in the application of maximum entropy techniques, with notable successes for the chemist in areas such as NMR spectroscopy and crystallography. Maxent has a long statistical vintage, one of the modern pioneers being Jaynes, but the first significant scientific applications were in the deblurring of infrared images of the sky, involving the development of the first modern computational algorithm in the early 1980s. Since then, there has been an explosion of interest, and several implementations are available within commercial instrumentation. The most spectacular successes have been in the area of image analysis, for example NMR tomography, as well as forensic applications such as obtaining clear car number plates from hazy police photographs. In addition, there is a very solid and large literature in the area of analytical chemistry.

3.6.3.1 Bayes’ Theorem

The fundamental application of maximum entropy techniques requires an understanding of Bayes’ theorem. Various definitions are necessary:

• Data are experimental observations, for example the measurement of a time series of free induction decay prior to Fourier transformation. Data space contains a dataset for each experiment.

• A map is the desired result, for example a clean and noise free spectrum, or the concentration of several compounds in a mixture. Map space exists in a similar fashion to data space.

• An operation or transformation links these two spaces, such as Fourier transformation or factor analysis.

The aim of the experimenter is to obtain as good an estimate of map space as possible, consistent with his or her knowledge of the system. Normally there are two types of knowledge:

1. Prior knowledge is available before the experiment. There is almost always some information available about chemical data. An example is that a true spectrum will always be positive: we can reject statistical solutions that result in negative intensities. Sometimes much more detailed information such as lineshapes or compound concentrations is known.

2. Experimental information, which refines the prior knowledge to give a posterior model of the system.

The theorem is often presented in the following form:

probability(answer given new information) ∝ probability(answer given prior information) × probability(new information given answer)

or

p(map | experiment) ∝ p(map | prior information) × p(experiment | map)

where the | symbol stands for ‘given by’ and p is probability.

Many scientists ignore the prior information, and for cases where data are fairly good, this can be perfectly acceptable. However, chemical data analysis is most useful where the answer is not so obvious, and the data are difficult to analyse. The Bayesian method allows prior information or measurements to be taken into account. It also allows continuing experimentation, improving a model all the time.
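As a toy numerical illustration of the theorem (all numbers below are invented), prior knowledge that a quantity must be positive can be combined with one noisy measurement to give a posterior distribution:

```python
import numpy as np

# Toy Bayesian update on a grid of candidate "maps" (all numbers invented).
value = np.linspace(-5, 10, 301)              # candidate values of the quantity
prior = (value > 0).astype(float)             # prior knowledge: must be positive
prior /= prior.sum()

observed, sigma = 6.2, 1.0                    # one noisy experimental observation
likelihood = np.exp(-0.5 * ((observed - value) / sigma) ** 2)  # p(experiment|map)

posterior = prior * likelihood                # p(map|experiment), up to a constant
posterior /= posterior.sum()                  # normalise: the posterior model
```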

3.6.3.2 Maximum Entropy

Maxent is one method for determining the probability of a model. A simple example is that of the toss of a six sided unbiassed die. What is the most likely underlying frequency distribution, and how can each possible distribution be measured? Figure 3.31 illustrates a flat distribution and Figure 3.32 a skew distribution (expressed
