Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Казанский национальный исследовательский технологический университет

Предмет:

Химия

Файл:

Brereton Chemometrics

.pdf

Скачиваний:

Добавлен:

15.08.2013

Размер:

4.3 Mб

Скачать

☆

<<< < Предыдущая 3 4 5 6 7 8 9 10 11 12 13 1415 / 5015 16 17 18 19 20 21 22 23 24 25 26 27 > Следующая >>>

130	CHEMOMETRICS

1.The moving average (MA) component relates the noise at time i to the values of the noise at previous times. A model of order p is given by

t=p

ei = ci−t ei−t t=0

where ei−t is the noise at time i − t and ci−t is a corresponding coefﬁcient. A simple approach for simulating or modelling this type of noise is to put p = 1 and set the coefﬁcient to 1. Under such circumstances

ei = gi + ei−1

where gi may be generated using a normal distribution. Table 3.2 illustrates a stationary noise distribution and an MA distribution generated simply by adding successive values of the noise, so that, for example, the noise at time = 4 is given by 0.00547 = −0.04527 + 0.05075.

2.The autoregressive component relates the noise to the observed value of the response at one or more previous times. A model of order p is given by

t=p

xi = ci−t xi−t + ei t=0

Note that in a full ARMA model, ei itself is dependent on past values of noise.

There is a huge literature on ARMA processes, which are particularly important in the analysis of long-term trends such as in economics: it is likely that an underlying factor

Table 3.2 Stationary and moving average noise.

Time	Stationary	Moving average

1	−0.12775
2	0.14249	0.01474
3	−0.06001	−0.04527
4	0.05075	0.00548
5	0.06168	0.06716
6	−0.14433	−0.07717
7	−0.10591	−0.18308
8	0.06473	−0.11835
9	0.05499	−0.06336
10	−0.00058	−0.06394
11	0.04383	−0.02011
12	−0.08401	−0.10412
13	0.21477	0.11065
14	−0.01069	0.09996
15	−0.08397	0.01599
16	−0.14516	−0.12917
17	0.11493	−0.01424
18	0.00830	−0.00595
19	0.13089	0.12495
20	0.03747	0.16241

SIGNAL PROCESSING	131

causing errors in estimates changes with time rather than ﬂuctuating completely randomly. A battery of specialised techniques to cope with such situations have been developed. The chemist must be aware of these noise models, especially when studying natural phenomena such as in environmental chemistry, but also to a lesser extent in instrumental analysis. However, there is rarely sufﬁcient experimental evidence to establish highly sophisticated noise models. It is well advised, however, when studying a process, to determine whether a stationary noise distribution is adequate, especially if the results of simulations are to be relied upon, so an appreciation of basic methods for modelling noise is important. Very elaborate models are unlikely to be easy to verify experimentally.

3.2.3.3 Signal to Noise Ratio

The signal to noise ratio is a useful parameter to measure. The higher this number, the more intense the signal is relative to the background. This measurement is essentially empirical, and involves dividing the height of a relevant signal (normally the most intense if there are several in a dataset) by the root mean square of the noise, measured in a region of the data where there is known to be no signal.

3.2.4 Sequential Processes

Not all chemometric data arise from spectroscopy or chromatography, some are from studying processes evolving over time, ranging from a few hours (e.g. a manufacturing process) to thousands of years (e.g. a geological process). Many techniques for studying such processes are common to those developed in analytical instrumentation.

In some cases cyclic events occur, dependent, for example, on time of day, season of the year or temperature ﬂuctuations. These can be modelled using sine functions, and are the basis of time series analysis (Section 3.4). In addition, cyclicity is also observed in Fourier spectroscopy, and Fourier transform techniques (Section 3.5) may on occasions be combined with methods for time series analysis.

3.3 Linear Filters

3.3.1 Smoothing Functions

A key need is to obtain as informative a signal as possible after removing the noise from a dataset. When data are sequentially obtained, such as in time or frequency, the underlying signals often arise from a sum of smooth, monotonic functions, such as are described in Section 3.2.1, whereas the underlying noise is a largely uncorrelated function. An important method for revealing the signals involves smoothing the data; the principle is that the noise will be smoothed away using mild methods, whilst the signal will remain. This approach depends on the peaks having a half-width of several datapoints: if digital resolution is very poor signals will appear as spikes and may be confused with noise.

It is important to determine the optimum ﬁlter for any particular application. Too much smoothing and the signal itself is reduced in intensity and resolution. Too little smoothing, and noise remains. The optimum smoothing function depends on peakwidths (in datapoints) as well as noise characteristics.

3.3.1.1 Moving Averages

Conceptually, the simplest methods are linear ﬁlters whereby the resultant smoothed data are a linear function of the raw data. Normally this involves using the surrounding

132 CHEMOMETRICS

datapoints, for example, using a function of the three points in Figure 3.7 to recalculate a value for point i. Algebraically, such functions are expressed by

	p
xi ,new =
xi ,new =	cj xi+j
	j =−p

One of the simplest is a three point moving average (MA). Each point is replaced by the average of itself and the points before and after, so in the equation above p = 1 and cj = 1/3 for all three points.

The ﬁlter can be extended to a ﬁve (p = 2, c = 1/5), seven, etc., point MA:

•the more the points in the ﬁlter, the greater is the reduction in noise, but the higher is the chance of blurring the signal;

•the number of points in the ﬁlter is often called a ‘window’.

The ﬁlter is moved along the time series or spectrum, each datapoint being replaced successively by the corresponding ﬁltered datapoint. The optimal ﬁlter depends on the noise distribution and signal width. It is best to experiment with a number of different ﬁlter widths.

3.3.1.2 Savitsky–Golay Filters, Hanning and Hamming Windows

MA ﬁlters have the disadvantage in that they use a linear approximation for the data. However, peaks are often best approximated by curves, e.g. a polynomial. This is particularly true at the centre of a peak, where a linear model will always underestimate the intensity. Quadratic, cubic or even quartic models provide better approximations. The principle of moving averages can be extended. A seven point cubic ﬁlter, for

example, is used to ﬁt

xˆi = b0 + b1i + b2i2 + b3i3

using a seven point window, replacing the centre point by its best ﬁt estimate. The window is moved along the data, point by point, the calculation being repeated each time.

However, regression is computationally intense and it would be time consuming to perform this calculation in full simply to improve the appearance of a spectrum or chromatogram, which may consist of hundreds of datapoints. The user wants to be

xi−1

xi+1

Figure 3.7

Selection of points to use in a three point moving average ﬁlter

SIGNAL PROCESSING	133

able to select a menu item or icon on a screen and almost instantaneously visualise an improved picture. Savitsky and Golay in 1964 presented an alternative, and simpliﬁed, method of determining the new value of xi simply by re-expressing the calculation as a sum of coefﬁcients. These Savitsky–Golay ﬁlters are normally represented in tabular form (see Table 3.3). Both quadratic and cubic models result in identical coefﬁcients, as do quartic and quintic models. To determine a coefﬁcient cj ,

1.decide on the order of the model (quadratic and cubic models give identical results as do quartic and quintic models);

2.decide on the window size;

3.determine cj by selecting the appropriate number from Table 3.3 and dividing by the normalisation constant.

Several other MA methods have been proposed in the literature, two of the best known being the Hanning window (named after Julius Von Hann) (which for 3 points has weights 0.25, 0.5 and 0.25), and the Hamming window (named after R. W. Hamming) (which for 5 points has weights 0.0357, 0.2411, 0.4464, 0.2411, 0.0357) – not to be confused in name but very similar in effects. These windows can be calculated for any size, but we recommend these two ﬁlter sizes.

Note that although quadratic, cubic or higher approximations of the data are employed, the ﬁlters are still called linear because each ﬁltered point is a linear combination of the original data.

3.3.1.3 Calculation of Linear Filters

The calculation of moving average and Savitsky–Golay ﬁlters is illustrated in Table 3.4.

• The ﬁrst point of the three point moving average (see column 2) is simply given by

−0.049 = (0.079 − 0.060 − 0.166)/3

•The ﬁrst point of the seven point Savitsky–Golay quadratic/cubic ﬁltered data can be calculated as follows:

—From Table 3.3, obtain the seven coefﬁcients, namely c−3 = c3 = −2/21 = −0.095, c−2 = c2 = 3/21 = 0.143, c−1 = c1 = 6/21 = 0.286 and c0 = 7/21 = 0.333.

Table 3.3 Savitsky–Golay coefﬁcients ci+j			for smoothing.
Window size j	5	7	9	7		9

		Quadratic/cubic			Quartic/quintic

−4		−2	−21			15
−3	−3	−2	14	5		−55
−2	−3	3	39	−30		30
−1	12	6	54	75		135
0	17	7	59	131		179
1	12	6	54	75		135
2	−3	3	39	−30		30
3		−2	14	5		−55
4			−21			15
Normalisation constant	35	21	231	231		429

134	CHEMOMETRICS

Table 3.4 Results of various ﬁlters on a dataset.

Raw data			Moving average			Quadratic/cubic Savitsky–Golay

		3 point	5 point	7 point	5 point		7 point	9 point

0.079	−0.049
−0.060	−0.049		−0.030		−0.156
−0.166	−0.113		−0.030		−0.156		−0.069
−0.113	−0.056		−0.017	0.030	−0.081		−0.069	−0.005
0.111	0.048		0.038	0.067	0.061		0.026	−0.005
0.145	0.156		0.140	0.168	0.161		0.128	0.093
0.212	0.233		0.291	0.338	0.206		0.231	0.288
0.343	0.400		0.474	0.477	0.360		0.433	0.504
0.644	0.670		0.617	0.541	0.689		0.692	0.649
1.024	0.844		0.686	0.597	0.937		0.829	0.754
0.863	0.814		0.724	0.635	0.859		0.829	0.765
0.555	0.651		0.692	0.672	0.620		0.682	0.722
0.536	0.524		0.607	0.650	0.491		0.539	0.628
0.482	0.538		0.533	0.553	0.533		0.520	0.540
0.597	0.525		0.490	0.438	0.550		0.545	0.474
0.495	0.478		0.395	0.381	0.516		0.445	0.421
0.342	0.299		0.330	0.318	0.292		0.326	0.335
0.061	0.186		0.229	0.242	0.150		0.194	0.219
0.156	0.102		0.120	0.157	0.103		0.089	0.081
0.090	0.065		0.053	0.118	0.074		0.016	0.041
−0.050	0.016		0.085	0.081	−0.023		0.051	0.046
0.007	0.059		0.070	0.080	0.047		0.055	0.070
0.220	0.103		0.063	0.071	0.136		0.083	0.072
0.081	0.120		0.091	0.063	0.126		0.122	0.102
0.058	0.076		0.096	0.054	0.065		0.114	0.097
0.089	0.060		0.031	0.051	0.077		0.033	0.054
0.033	0.005		0.011	0.015	0.006		0.007
−0.107	−0.030		−0.007		−0.051
−0.016	−0.052
−0.032

—Multiply these coefﬁcients by the raw data and sum to obtain the smoothed value of the data:

xi ,new = −0.095 × 0.079 + 0.143 × −0.060 + 0.286 × −0.166 + 0.333

× −0.113 + 0.286 × 0.111 + 0.143 × 0.145 − 0.095 × 0.212 = −0.069

Figure 3.8(a) is a representation of the raw data. The result of using MA ﬁlters is shown in Figure 3.8(b). A three point MA preserves the resolution (just), but a ﬁve point MA loses this and the cluster appears to be composed of only one peak. In contrast, the ﬁve and seven point quadratic/cubic Savitsky–Golay ﬁlters [Figure 3.8(c)] preserve resolution whilst reducing noise and only starts to lose resolution when using a nine point function.

3.3.1.4 Running Median Smoothing

Most conventional ﬁlters involve computing local multilinear models, but in certain areas, such as process analysis, there can be spikes (or outliers) in the data which are unlikely to be part of a continuous process. An alternative method involves using

SIGNAL PROCESSING	135

running median smoothing (RMS) functions which calculate the median rather than the mean over a window. An example of a process is given in Table 3.5. A ﬁve point MA and ﬁve point RMS smoothing function are compared. A check on the calculation of the two different ﬁlters is as follows.

•The ﬁve point MA ﬁlter at time 4 is −0.010, calculated by taking the mean values for times 2–6, i.e.

−0.010 = (0.010 − 0.087 − 0.028 + 0.021 + 0.035)/5

•The ﬁve point RMS ﬁlter at time 4 is 0.010. This is calculating by arranging the readings for times 2–6 in the order −0.087, −0.028, 0.010, 0.021, 0.035, and selecting the middle value.

	1.2
	1
	0.8
Intensity	0.6
	0.4
	0.2
	0.2
	0
	0	5	10	15	20	25	30
	−0.2
	−0.4

Time

(a) Raw data

	1
	0.8	3 Point			5 Point
	0.8				5 Point
					7 Point
	0.6
Intensity	0.4
Intensity
	0.2
	0
	0	5	10	15	20	25	30
	−0.2

Time

(b) Moving average filters

Figure 3.8

Filtering of data

136							CHEMOMETRICS
	1				5 Point
		7 Point			5 Point
		7 Point
	0.8
					9 Point
	0.6
Intensity	0.4
Intensity	0.2
	0
	0	5	10	15	20	25	30
	−0.2
	−0.4
				Time

Figure 3.8

(continued)

Table 3.5 A sequential process: illustration of moving average and median smoothing.

Time

Data

5 point MA

5 point RMS

10.133

20.010

3	−0.087	0.010	0.010
4	−0.028	−0.010	0.010
5	0.021	0.048	0.021
6	0.035	0.047	0.021
7	0.298	0.073	0.035
8	−0.092	0.067	0.035
9	0.104	0.109	0.104
10	−0.008	0.094	0.104
11	0.245	0.207	0.223
12	0.223	0.225	0.223
13	0.473	0.251	0.223
14	0.193	0.246	0.223
15	0.120	0.351	0.223
16	0.223	0.275	0.193
17	0.745	0.274	0.190
18	0.092	0.330	0.223
19	0.190	0.266	0.190
20	0.398	0.167	0.190
21	−0.095	0.190	0.207
22	0.250	0.200	0.239
23	0.207	0.152	0.207

240.239

250.160

The results are presented in Figure 3.9. Underlying trends are not obvious from inspection of the raw data. Of course, further mathematical analysis might reveal a systematic trend, but in most situations the ﬁrst inspection is graphical. In real time situations, such as process control, on-line graphical inspection is essential. The ﬁve point MA

SIGNAL PROCESSING	137

Intensity

0.8

Raw data

0.7

0.6

5 point MA

0.5

0.4

0.3

0.2

0.1

5 point RMS

0.0

−0.1

−0.2

Time

Figure 3.9

Comparison of moving average (MA) and running median smoothing (RMS)

does suggest a systematic process, but it is not at all clear whether the underlying process increases monotonically with time, or increases and then decreases. The ﬁve point RMS suggests an increasing process, and is much smoother than the result of a MA ﬁlter.

Each type of smoothing function removes different features in the data and often a combination of several approaches is recommended especially for real world problems. Dealing with outliers is an important issue: sometimes these points are due to measurement errors. Many processes take time to deviate from the expected value, and a sudden glitch in the system unlikely to be a real effect. Often a combination of ﬁlters is recommend, for example a ﬁve point median smoothing followed by a three point Hanning window. These methods are very easy to implement computationally and it is possible to view the results of different ﬁlters simultaneously.

3.3.1.5 Reroughing

Finally, brief mention will be made of the technique of reroughing. The ‘rough’ is given by

Rough = Original − Smooth

where the smoothed data value is obtained by one of the methods described above. The rough represents residuals but can in itself be smoothed. A new original data value is calculated by

Reroughed = Smooth + Smoothed rough

This is useful if there is suspected to be a number of sources of noise. One type of noise may genuinely reﬂect difﬁculties in the underlying data, the other may be due to outliers that do not reﬂect a long term trend. Smoothing the rough can remove one of these sources.

138	CHEMOMETRICS

3.3.2 Derivatives

The methods in Section 3.3.1 are concerned primarily with removing noise. Most methods leave peakwidths either unchanged or increased, equivalent to blurring. In signal analysis an important separate need is to increase resolution. In Section 3.5.2 we will discuss the use of ﬁlters combined with Fourier transformation. In Chapter 6 we will discuss how to improve resolution when there is an extra dimension to the data (multivariate curve resolution). However, a simple and frequently used approach is to calculate derivatives. The principle is that inﬂection points in close peaks become turning points in the derivatives. The ﬁrst and second derivatives of a pure Gaussian are presented in Figure 3.10.

•The ﬁrst derivative equals zero at the centre of the peak, and is a good way of accurately pinpointing the position of a broad peak. It exhibits two turning points.

•The second derivative is a minimum at the centre of the peak, crosses zero at the positions of the turning points for the ﬁrst derivative and exhibits two further turning points further apart than in the ﬁrst derivative.

•The apparent peak width is reduced using derivatives.

The properties are most useful when there are several closely overlapping peaks, and higher order derivatives are commonly employed, for example in electron spin resonance and electronic absorption spectroscopy, to improve resolution. Figure 3.11 illustrates the ﬁrst and second derivatives of two closely overlapping peaks. The second derivative clearly indicates two peaks and fairly accurately pinpoints their positions. The appearance of the ﬁrst derivative would suggest that the peak is not pure but, in this case, probably does not provide deﬁnitive evidence. It is, of course, possible to continue and calculate the third, fourth, etc., derivatives.

There are, however, two disadvantages of derivatives. First, they are computationally intense, as a fresh calculation is required for each datapoint in a spectrum or chromatogram. Second, and most importantly, they amplify noise substantially, and, therefore, require low signal to noise ratios. These limitations can be overcome by using Savitsky–Golay coefﬁcients similar to those described in Section 3.3.1.2, which involve rapid calculation of smoothed higher derivatives. The coefﬁcients for a number of window sizes and approximations are presented in Table 3.6. This is a common method for the determination of derivatives and is implemented in many software packages.

3.3.3 Convolution

Common principles occur in different areas of science, often under different names, and are introduced in conceptually radically different guises. In many cases the driving force is the expectations of the audience, who may be potential users of techniques, customers on courses or even funding bodies. Sometimes even the marketplace forces different approaches: students attend courses with varying levels of background knowledge and will not necessarily opt (or pay) for courses that are based on certain requirements. This is especially important in the interface between mathematical and experimental science.

Smoothing functions can be introduced in a variety of ways, for example, as sums of coefﬁcients or as a method for ﬁtting local polynomials. In the signal analysis literature, primarily dominated by engineers, linear ﬁlters are often reported as a form

SIGNAL PROCESSING	139

Figure 3.10

A Gaussian together with its ﬁrst and second derivatives

<<< < Предыдущая 3 4 5 6 7 8 9 10 11 12 13 1415 / 5015 16 17 18 19 20 21 22 23 24 25 26 27 > Следующая >>>

Соседние файлы в предмете Химия

#
15.08.20134.29 Mб16Baer M., Billing G.D. (eds.) - The role of degenerate states in chemistry (Adv.Chem.Phys. special issue, Wiley, 2002).pdf
#
15.08.20137.08 Mб55Basov N.I. i dr. Raschet i konstruirovanie formiruyushchego instrumenta dlya izgotovleniya izdelij (1991.pdf
#
15.08.20135.59 Mб68Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf
#
15.08.2013324.82 Кб32benzyne-cyclization.pdf
#
15.08.201314.48 Mб18Borowko M. 2000 Computational methods in surface and colloid science.djvu
#
15.08.20134.3 Mб48Brereton Chemometrics.pdf
#
15.08.20131.07 Mб30Burshtejn K.Ya., Shorygin P.P. Kvantovohimicheskie raschety v organicheskoj himii i molekulyarnoj.pdf
#
15.08.201321.36 Mб45Carey F.A. - Organic Chemistry (2004)(en).djvu
#
15.08.201321.36 Mб39Carey F.A. Advanced organic chemistry 5ed., MGH, 2004.djvu
#
15.08.201311.62 Mб23Carey F.A. Advanced organic chemistry. Part A structure and mechanisms 1938.djvu
#
15.08.20138.77 Mб17Carey F.A. Advanced organic chemistry. Part B reaction and synthesis 1938.djvu