Brereton Chemometrics
APPENDICES
Figure A.38
Obtaining the pseudoinverse in Matlab
• The pseudoinverse can be obtained simply by using the function pinv, without any further commands; see Figure A.38.
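As a minimal sketch of the pinv call, the following uses an assumed 2 × 3 matrix W (the values are illustrative; they are chosen to be consistent with the dimensions and the norm quoted in this appendix, and are not taken from Figure A.38).

```matlab
% Assumed example matrix (illustrative values, 2 rows and 3 columns)
W = [2 7 8; 0 1 6];

% Pseudoinverse: a 3 x 2 matrix
Wp = pinv(W);

% W has full row rank, so W*Wp should equal the 2 x 2 identity
% matrix to within rounding error
disp(W * Wp)

% For a full row rank matrix the same result can be built explicitly
Wp2 = W' / (W * W');
disp(max(max(abs(Wp - Wp2))))   % close to zero
```

The explicit formula only works when W*W' is invertible, whereas pinv also copes with rank deficient matrices.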
For a comprehensive list of facilities, see the manuals that come with Matlab, or the help files; however, a few that are useful to the reader of this book are as follows. The size function gives the dimensions of a matrix, so size(W) will return a 1 × 2 row vector with elements, in our example, of 2 and 3. It is possible to create a new vector, for example, s = size(W); in such a situation s(1) will equal 2, or the number of rows. The element W(s(1), s(2)) represents the last element in the matrix W. In addition, it is possible to use the functions size(W,1) and size(W,2), which provide the number of rows and columns directly. These functions are very useful when writing simple programs as discussed below.
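The size calls described above can be sketched as follows. The matrix W is an assumption (illustrative values with 2 rows and 3 columns, as in the running example).

```matlab
% Assumed example matrix (illustrative values)
W = [2 7 8; 0 1 6];

s = size(W);                   % row vector [2 3]
nRows = size(W, 1);            % number of rows, 2
nCols = size(W, 2);            % number of columns, 3
lastElement = W(s(1), s(2));   % bottom right element, 6 for this W

disp(s)
disp(lastElement)
```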
The mean function can be used in various ways. By default this function produces the mean of each column in a matrix, so that mean(W) results in a 1 × 3 row vector containing the means. It is possible to specify which dimension one wishes to take the mean over, the default being the first one; for example, mean(W,2) gives the mean of each row. The overall mean of an entire matrix is obtained using the mean function twice, i.e. mean(mean(W)). Note that the mean of a vector is always a single number whether the vector is a column or row vector. This function is illustrated in Figure A.39. Similar syntax applies to functions such as min, max and std, but note that the last function calculates the sample rather than the population standard deviation; if it is employed for scaling in chemometrics, you must convert to the population standard deviation, in the current case by typing std(W)/sqrt((s(1))/(s(1)-1)), where sqrt is a function that calculates the square root and s(1) is the number of rows in the matrix. Similar remarks apply to the var function, but it is not necessary to use a square root in the calculation.
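The following sketch illustrates the mean function and the conversion from the sample to the population standard deviation. W is an assumed example matrix. The call std(W,1) is a standard Matlab alternative that divides by the number of rows directly; it is not mentioned in the text and is shown only as a cross-check.

```matlab
% Assumed example matrix (illustrative values)
W = [2 7 8; 0 1 6];
s = size(W);

colMeans = mean(W);            % mean of each column: [1 4 7]
rowMeans = mean(W, 2);         % mean of each row, a 2 x 1 vector
overallMean = mean(mean(W));   % mean of all six elements: 4

sdSample = std(W);                            % divides by (rows - 1)
sdPop = std(W) / sqrt(s(1) / (s(1) - 1));     % conversion used in the text
sdPopCheck = std(W, 1);                       % divides by the number of rows

disp(sdPop)        % [1 3 1] for this W
disp(sdPopCheck)   % same values
```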
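The norm function is often useful. For a vector it consists of the square root of the sum of squares, which can be useful when scaling data: if Y is a row vector, then sqrt(Y*Y') is the same as norm(Y). For a matrix, norm(W) returns by default the 2-norm (the largest singular value), which in our example equals 12.0419; the square root of the sum of squares of all the elements of a matrix is obtained using norm(W,'fro').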
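A short sketch of the norm function follows. W is an assumed example matrix whose values are chosen to be consistent with the quoted result of 12.0419; Y is a hypothetical row vector. In Matlab, norm applied to a matrix returns the 2-norm (largest singular value) by default, while the 'fro' option gives the square root of the sum of squares of all elements.

```matlab
% Assumed example matrix (consistent with norm(W) = 12.0419)
W = [2 7 8; 0 1 6];

n2 = norm(W);            % 2-norm: largest singular value
nFro = norm(W, 'fro');   % square root of the sum of squares of all elements

% Hypothetical row vector: both expressions give the same length
Y = [3 4];
lenA = norm(Y);
lenB = sqrt(Y * Y');

fprintf('%.4f %.4f\n', n2, nFro);
fprintf('%.4f %.4f\n', lenA, lenB);
```

For vectors the two definitions coincide, which is why the distinction only matters when norm is applied to a matrix.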
It is useful to combine some of these functions, for example min(s) would be the minimum dimension of matrix W. Enthusiasts can increase the number of variables within a function, an example being min([s 2 4]), which finds the minimum of all the numbers in vector s together with 2 and 4. This facility can be useful if it is desired to limit the number of principal components or eigenvalues displayed. If Spec is a spectral matrix of variable dimensions, and we know that we will never have more than 10 significant components, then min([size(Spec) 10]) will choose a number that is the smaller of the two dimensions of Spec, or 10 if both dimensions are larger than this.
Figure A.39
Mean function in Matlab
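The idea of capping the number of components can be sketched as follows. Both matrices are hypothetical and filled with zeros, since only their dimensions matter here.

```matlab
% Hypothetical spectral matrices; only the dimensions are of interest
SpecSmall = zeros(25, 6);    % 25 spectra, 6 wavelengths
SpecLarge = zeros(40, 30);   % 40 spectra, 30 wavelengths

% Smallest dimension, but never more than 10
nCompSmall = min([size(SpecSmall) 10]);   % 6
nCompLarge = min([size(SpecLarge) 10]);   % 10

disp([nCompSmall nCompLarge])
```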
Some functions operate on individual elements rather than rows or columns. For example, sqrt(W) results in a new matrix of dimensions identical with W containing the square roots of all the elements. In most cases whether a function returns a matrix, vector or scalar is common sense, but there are certain linguistic features, a few rather historical, so if in doubt test out the function first.
A.5.4.4 Preprocessing
Preprocessing is slightly awkward in Matlab. One way is to write a small program with loops as described in Section A.5.6. If you think in terms of vectors and matrices, however, it is fairly easy to come up with a simple approach. If W is our original 2 × 3 matrix and we want to mean centre the columns, we can easily obtain a 1 × 3 vector w which corresponds to the means of each column, multiply this by a 2 × 1 vector 1 giving a 2 × 3 matrix consisting of the means, and so our new mean centred matrix V can be calculated as V = W − 1.w as illustrated in Figure A.40. There is a special function in Matlab called ones that creates vectors or matrices consisting solely of the number 1. There are several ways of using this: ones(5,3) would create a matrix of dimensions 5 × 3 solely of 1s, so a 2 × 1 vector could be specified using ones(2,1) as an alternative to the approach illustrated in the figure.
Figure A.40
Mean centring a matrix in Matlab
The experienced user of Matlab can build on this to perform other common methods for preprocessing, such as standardisation.
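A minimal sketch of mean centring with ones, extended to standardisation, is given below. W is an assumed example matrix, and the use of the population standard deviation via std(W,1) is one reasonable choice for standardisation rather than the only one; this is not necessarily the code shown in Figure A.40.

```matlab
% Assumed example matrix (illustrative values)
W = [2 7 8; 0 1 6];
nRows = size(W, 1);

% Mean centring: V = W - 1.w
w = mean(W);                 % 1 x 3 vector of column means
V = W - ones(nRows, 1) * w;  % 2 x 3 mean centred matrix

% Standardisation: divide each centred column by its
% population standard deviation (an assumption, see lead-in)
sd = std(W, 1);
Z = V ./ (ones(nRows, 1) * sd);

disp(V)   % [1 3 1; -1 -3 -1] for this W
disp(Z)   % [1 1 1; -1 -1 -1] for this W
```

With only two rows every standardised column is necessarily +1 and -1; a realistic data matrix would of course have many more samples.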
A.5.4.5 Principal Components Analysis
PCA is simple in Matlab. The singular value decomposition (SVD) algorithm is employed, but this should normally give equivalent results to NIPALS except that all the PCs are calculated at once. One difference is that the scores and loadings are both normalised, so that for SVD
X = U.S.V
where, using the notation elsewhere in the text,
T = U.S
and
V = P
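The relationship between SVD and the scores and loadings can be sketched as follows. The data matrix X is hypothetical and is not mean centred here. One point to watch, which is an observation about Matlab's svd function rather than something stated in the text: svd returns its third output untransposed, so that X = U*S*V' in Matlab syntax, and the loadings matrix P in the notation above (one row per component) corresponds to V'.

```matlab
% Hypothetical data matrix: 4 samples, 3 variables
X = [2 7 8; 0 1 6; 4 3 5; 1 9 2];

% Economy size SVD; Matlab returns V such that X = U*S*V'
[U, S, V] = svd(X, 'econ');

T = U * S;   % scores, one column per component
P = V';      % loadings, one row per component

% The scores and loadings reproduce the data
disp(max(max(abs(T * P - X))))   % close to zero

% The loadings are normalised: P*P' is the identity matrix
disp(P * P')
```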
A.5.6 Introduction to Programming and Structure
For the enthusiasts it is possible to write quite elaborate programs and develop very professional looking m files. The beginner is advised to have a basic idea of a few of the main features of Matlab as a programming environment.
First and foremost is the ability to make comments (statements that are not executed), by starting a line with the % sign. Anything after this is simply ignored by Matlab but helps make large m files comprehensible.
Loops commence with the for statement, which has a variety of different syntaxes, the simplest being for i = begin : end which increments the variable i from the number begin (which must be a scalar) to end. An increment (which can be negative and does not need to be an integer) can be specified using the syntax for i = begin : inc : end; notice how, unlike many programming languages, this is the middle value of the three variables. Loops finish with the end statement. As an example, the operation of mean centring (Section A.5.4.4) is written in the form of a loop; see Figure A.41. The interested reader should be able to interpret the commands using the information given above. Obviously for this small operation a loop is not strictly necessary, but for more elaborate programs it is important to be able to use loops, and there is a lot of flexibility about addressing matrices which makes this facility very useful.
If and while facilities are also useful to the programmer.
Figure A.41
A simple loop used for mean centring
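One possible way of writing mean centring as a loop is sketched below; it is not necessarily the code shown in Figure A.41. W is an assumed example matrix. The second loop illustrates a negative, non-integer increment.

```matlab
% Assumed example matrix (illustrative values)
W = [2 7 8; 0 1 6];

% Mean centre each column in turn
V = zeros(size(W));              % reserve space for the result
for j = 1 : size(W, 2)
    % subtract the mean of column j from every element of that column
    V(:, j) = W(:, j) - mean(W(:, j));
end
disp(V)

% Loop with an increment: the middle value is the step
steps = [];
for k = 10 : -2.5 : 0
    steps = [steps k];           % collects 10, 7.5, 5, 2.5, 0
end
disp(steps)
```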
Many programmers like to organise their work into functions. In this introductory text we will not delve too far into this, but a library of m files that consist of different functions can be easily set up. In order to illustrate this, we demonstrate a simple function called twoav that takes a matrix, calculates the average of each column and produces a vector consisting of two times the column averages. The function is stored in an m file called twoav in the current working directory. This is illustrated in Figure A.42. Note that the m file must start with the function statement, and the name of the function should correspond to the name of the m file. The arguments (in this case a matrix, which is called p within the function and can be called anything in the main program, W in this example) are placed in brackets after the function name. The array o contains the result of the expression that is passed back.
Figure A.42
A simple function and its result
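A sketch of what such a function might look like is given below; it is not necessarily identical to Figure A.42. The names twoav, p and o follow the description above, and W is an assumed example matrix. To keep the example in one piece the function is written as a local function at the end of a script, which requires Matlab R2016b or later; in the arrangement described in the text the function part would instead be saved on its own in the file twoav.m.

```matlab
% Main program: call the function on an assumed example matrix
W = [2 7 8; 0 1 6];
o = twoav(W);
disp(o)   % [2 8 14] for this W

% Function definition (in the text, the sole content of twoav.m)
function o = twoav(p)
% TWOAV Two times the column averages of the matrix p.
o = 2 * mean(p);
end
```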
A.5.7 Graphics
There are a large number of different types of graph available in Matlab. Below we discuss a few methods that can be used to produce diagrams of the type employed in this text. The enthusiast will soon discover further approaches. Matlab is a very powerful tool for data visualisation.
A.5.7.1 Creating Figures
There are several ways to create new graphs. The simplest is by a plotting command as discussed in the next sections. A new window consisting of a figure is created. Unless indicated otherwise, each time a graphics command is executed, the graph in the figure window is overwritten.
In order to organise the figures better, it is preferable to use the figure command. Each time this is typed in the Matlab command window, a new blank figure as illustrated in Figure A.43 is produced, so typing this three times in succession results in three blank figures, each of which is able to contain a graph. The figures are automatically numbered from 1 onwards. In order to return to the second figure (number 2), simply type figure(2). All plotting commands apply to the current figure. If you wish to produce a graph in the most recently opened window, it is not necessary to specify a number. Therefore, if you were to type the figure command three times, unless specified otherwise the next graph will be displayed in figure number 3. The figures can be accessed either as small icons or through the Window menu item. It is possible to skip figure numbers, so the command figure(10) will create a figure number 10, even if no other figures have been created.
If you want to produce several small graphs on one figure, use the subplot command. This has the syntax subplot(n,m,i). It divides the figure into n × m small graphs and puts the current plot into the ith position, where the first row is numbered from 1 to m, the second from m + 1 to 2m, and so on. Figure A.44 illustrates the case where the commands subplot(2,2,1) and subplot(2,2,3) have been used to divide the window into a 2 × 2 grid, capable of holding up to four graphs, and figures have been inserted into positions 1 (top left) and 3 (bottom left). Further figures can be inserted into the grid in the vacant positions, or the current figures can be replaced and overwritten.
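The figure and subplot commands described above can be sketched as follows. The data vector Y is hypothetical, and the choice of figure number 2 is arbitrary.

```matlab
% Hypothetical data
Y = [1 4 2 6 3];

% Create figure number 2, or make it current if it already exists
f = figure(2);

% Divide the window into a 2 x 2 grid and fill positions 1 and 3
subplot(2, 2, 1);   % top left
plot(Y);

subplot(2, 2, 3);   % bottom left
plot(cumsum(Y));
```

Positions 2 and 4 remain vacant and can be filled later with further subplot calls on the same figure.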
New figures can also be created using the File menu, and the New option, but it is not so easy to control the names and so probably best to use the figure command.
Once the figure is complete you can copy it using the Copy Figure menu item and then place it in documents. In this section we will illustrate the figures by screen snapshots showing the grey background of the Matlab screen. Alternatively, the figures can be saved in Matlab format, using the menu item under the current directory, as a fig file, which can then be opened and edited in Matlab in the future.
A.5.7.2 Line Graphs
The simplest type of graph is a line graph. If Y is a vector then plot(Y) will simply produce a graph of each element against row number. Often we want to plot a row or column of a matrix against element number, for example if each successive point corresponds to a point in time or a spectral wavelength. This is easy to do: the command plot(X(:,2)) plots the second column of X. Plotting a subset is also possible, for example plot(X(11:20,2)) produces a graph of rows 11–20, in practice allowing an expansion of the region of interest.
Figure A.43
Blank figure window
Once you have produced a line graph it is possible to change its appearance. This is easily done by first clicking the arrow tool in the graphics window, which allows editing of the properties, and then clicking on either the line to change the appearance of the data, or the axes. One useful facility is to make the lines thicker: the default line width of 0.5 is often too thin when intended for publication (although it is a good size for displaying on a screen), and it is recommended to increase this to around 2. In addition, one sometimes wishes to mark the points, using the marker facility. The result is presented in Figure A.45. If you do not wish to join up the points with a line you can select a line style 'none'. The appearance of the axes can also be altered. There are various commands to change the nature of these plots, and you are recommended to use the Matlab help facility for further information.
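The same changes can also be made from the command line rather than with the arrow tool. The sketch below uses a hypothetical matrix X and the standard line properties LineWidth, Marker and LineStyle; the particular values are illustrative.

```matlab
% Hypothetical data: 20 rows, second column is the signal of interest
t = (1:20)';
X = [t, sin(t / 3)];

figure;
h = plot(X(11:20, 2));       % rows 11 to 20 of the second column

set(h, 'LineWidth', 2);      % thicker line, suitable for publication
set(h, 'Marker', 'o');       % mark each point with a circle

% To show the points without a joining line, uncomment:
% set(h, 'LineStyle', 'none');
```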
Figure A.44
Use of multiple plot facility
It is possible to superimpose several line graphs, for example if X is a matrix with five columns, then the command plot(X) will superimpose five graphs in one picture.
Note that you can further refine the appearance of the plot using the tools to create labels, extra lines and arrows.
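Superimposing several line graphs can be sketched as follows, using a hypothetical matrix X with five columns. The plot command returns one line object per column.

```matlab
% Hypothetical matrix with 20 rows and five columns
t = (1:20)';
X = [sin(t / 3), cos(t / 3), sin(t / 5), cos(t / 5), t / 20];

figure;
h = plot(X);        % five superimposed lines, one per column
disp(numel(h))      % number of line objects created
```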
A.5.7.3 Two Variables Against Each Other
The plot command can also be used to plot two variables against each other. It is common to plot columns of matrices against each other, for example when producing a PC plot of the scores of one PC against another. The command plot(X(:,2), X(:,3)) produces a graph of the third column of X against the second column. If you do not want to join the points up with a line you can either use the graphics editor as in Section A.5.7.2, or else the scatter command, which has a similar syntax but by default simply presents each point as a symbol. This is illustrated in Figure A.46.
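The difference between plot and scatter for two variables can be sketched as follows. The matrix X is hypothetical (it stands in for, say, a scores matrix), and the axis labels are illustrative.

```matlab
% Hypothetical matrix with 15 rows and three columns
t = (1:15)';
X = [t, cos(t), sin(t)];

% Third column against second column, points joined by a line
figure;
hLine = plot(X(:, 2), X(:, 3));
xlabel('Column 2');
ylabel('Column 3');

% Same data, each point shown as a separate symbol
figure;
hScatter = scatter(X(:, 2), X(:, 3));
xlabel('Column 2');
ylabel('Column 3');
```

In both calls the first argument is plotted along the horizontal axis and the second along the vertical axis.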
A.5.7.4 Labelling Points
Points in a graph can be labelled using the text command. The basic syntax is text(A,B,name), where A and B are arrays with the same number of elements, and it is recommended that name is an array of names or characters likewise with the
Figure A.45
Changing the properties of a graph in Matlab