Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Brereton Chemometrics

.pdf
Скачиваний:
48
Добавлен:
15.08.2013
Размер:
4.3 Mб
Скачать

APPENDICES

443

 

 

Figure A.21

Use of Project Explorer

Figure A.22

Creating a new macro ‘Calc’

444

CHEMOMETRICS

 

 

Figure A.23

Result of multiplying the transpose of vector B34: B65 by C34: C65 to give a scalar in cell E34

The result is illustrated in Figure A.23. The problem is that the expression is a character expression, and it is not easy to substitute the numbers, for example 3, by a variable, e.g. j. There are ways round this, but they are awkward.

Perhaps the most useful feature of the ‘Record Macro’ facility is to obtain portions of code, and then edit these into a full program. It is a very convenient way of learning

APPENDICES

445

 

 

some of the statements in VBA, particularly graphics and changing the appearance of the spreadsheet, and then incorporating these into a macro.

A.4.4.3 VBA Language

It is not necessary to have an in-depth knowledge of VBA programming to understand this text, so we will describe only a small number of features that allow a user to understand simple programs. It is assumed that the reader has some general experience of programming, and this section mainly provides tips on how to get started in VBA.

A module is organised into a number of procedures, either subroutines of functions. A subroutine without arguments is declared as Sub new() and is recognised as a macro that can be called directly from Excel by any of the methods described in Section A.4.4.1. A subroutine with arguments is recognised only within VBA; an example is

Sub new(x() as integer, d as double)

where x() is an array of integers, and d is a double precision variable. A typical function is as follows:

Function size(dd() As Double, nsize As Integer) As Double

In this case the function returns a double precision number. Subroutines may be called using the Call statement and function names are simply used in a program; for example,

ss0 = size(t1(), n1)

At first it is easiest to save all subroutines and functions in a single module, but more experienced programmers can experiment with fairly sophisticated methods for organising and declaring procedures and variables.

All common variable types are supported. It is best to declare variables, using either the Dim or Redim statement, which can occur anywhere in a procedure. Arrays must be declared. There are some leftovers from older versions of the language whereby a variable that ends in % is automatically an integer, for example i%, and a variable ending in a $ character, for example n$. However, it is not necessary for these variable types to end in these characters providing they are specifically declared, e.g.

Dim count as integer

is a legitimate statement. Because Visual Basic has developed over the years from a fairly unstructured language to a powerful programming environment, facilities have to be compatible with earlier versions.

Comments can be made in two ways, either using a Rem statement, in which case the entire line is considered a comment, or by using a quotation mark, in which case all text to the right of the quotation mark also constitutes a comment.

Loops are started with a For statement, e.g.

For x = -2 to 8 step 0.5

446

CHEMOMETRICS

 

 

The default stepsize is +1, and under these circumstances it is possible to omit the step statement. Loops are concluded by a Next statement. If statements must be on one line ending with Then, and a typical syntax is as follows:

If (x > 10 And y <> -2) Then

All If blocks must conclude with End if, but it is possible to have an Else statement if desired. Naturally, several If statements can be nested.

A very important facility is to be able to pass information to and from the spreadsheet. For many purposes the Cells statement is adequate. Each cell is addressed with its row first and column second, so that the cell number (3,2) corresponds to B3 in alphanumeric format. It is then easy to pass data back and forth. The statement x = Cells(5,1) places the value of A5 into x. Equivalently, Cells(i%,k%) = scores(i%,k%) places the relevant value of scores into the cell in the ith row and kth column. Note that any type of information including character and logical information can be passed to and from the spreadsheet, so Cells(7,3) = "Result" places the word Result in C7, whereas Cells(7,3) = Result places the numerical value of a variable called Result (if it exists) in this cell.

For C programmers, it is possible to write programs in C, compile them into dynamic link libraries and use these from Excel, normally via VBA. This has advantages for numerically intensive calculations such as PCA on large datasets which are slow in VBA but which can be optimised in C. A good strategy is to employ Excel as the front end, then VBA to communicate with the user and also for control of dialog boxes, and finally a C DLL for the intensive numeric calculations.

Finally, many chemometricians use matrices. Matlab (see Section A.5) is better than Excel for developing sophisticated matrix based algorithms, but for simple applications the matrix operations of Excel can also be translated into VBA. As discussed in Section A.4.4.2, this is somewhat awkward, but for the specialist programmer there is a simple trick, which is to break down the matrix expression into a character string. Strings can be concatenated using the & sign. Therefore, the expression

n$ = "R[" & "-3]"

would give a character new variable n$, the characters representing R[-3]. The CStr function converts a number to its character representation. Hence

n$ = "R[" & CStr(-3) & "]"

would do the same trick. This allows the possibility of introducing variables into the matrix expression, so the following code is acceptable:

i% = -3

n1$ = CStr(i%)

n2$ = CStr(i% + 1)

m$ ="=MMULT(TRANSPOSE(RC[" & n1$ & "]:R[31]C[" & n1$ & "]),RC[" & n2$ & "]:R[31]C[" & n2$ & "])" Selection.FormulaArray = m$

APPENDICES

447

 

 

 

Note that the

 

concatenates lines into a single statement. By generalising, it is easy to

 

incorporate flexible matrix operations into VBA programs, and modifications of this principle could be employed to produce a matrix library.

There are a huge number of tricks that the experienced VBA programmer can use, but these are best learnt with practice.

A.4.5 Charts

All graphs in this text have been produced in Excel. The graphics facilities are fairly good except for 3D representations. This section will briefly outline some of the main features of the Chart Tool useful for applications in this text.

Graphs can be produced either by selecting the ‘Insert’ menu and the ‘Chart’ option, or by using the Chart Wizard symbol . Most graphs in this book are produced using an xy plot or scatterplot, allowing the value of one parameter (e.g. the score of PC2) to be plotted against another (e.g. the score of PC1).

It is often desirable to use different symbols for groups of parameters or classes of compounds. This can be done by superimposing several graphs, each being represented by a separate series. This is illustrated in Figure A.24 in which cells AR2 to AS9 represent Series 1 and AR10 to AS17 represent Series 2, each set of measurements having a different symbol. When opening the Chart Wizard, use the ‘Series’ rather than ‘Data Range’ option to achieve this.

The default graphics options are not necessarily the most appropriate, and are designed primarily for display on a screen. For printing out, or pasting into Word, it is best to remove the gridlines and the legend, and also to change the background to white. These can be done either by clicking on the appropriate parts of the graph when it has been completed or using the ‘Chart Options’ dialog box of the Chart Wizard. The final graph can be displayed either as an object on the current worksheet or, better, as in Figure A.25, on a separate sheet. Most aspects of the chart can be changed, such as symbol sizes and types, colours and axes, by clicking on the appropriate part of the chart and then following the relevant menus. It is possible to join lines up using the ‘Format Data Series’ dialog box, or by selecting an appropriate display option of the scatterplot. Using the Drawing toolbox, arrows and labelling can be added to charts.

One difficulty involves attaching a label to each point in a chart, such as the name of an object or a variable. With this text we produce a downloadable macro that can be edited to permit this facility as described in Section A.4.6.1.

A.4.6 Downloadable Macros

A.4.6.1 VBA Code

Most of the exercises in this book can be performed using simple spreadsheet functions, and this is a valuable exercise for the learner. However, it is not possible to perform PCA calculations in Excel without using VBA or an Add-in. In addition, a facility for attaching labels to points in a graph is useful.

A VBA macro to perform PCA is provided. The reader or instructor will need to edit the subroutine for appropriate applications. The code as supplied performs the following. The user is asked

448

CHEMOMETRICS

 

 

Figure A.24

Chart facility in Excel

Figure A.25

Placing the chart in a new sheet

APPENDICES

449

 

 

how many samples;

how many variables;

how many PCs are to be calculated;

whether the columns are to be centred or not.

The data must occupy a region of the spreadsheet starting at cell B2 so that the first row and column can be used for names of the objects or variables if necessary, or left blank. Of course, this limitation can easily be changed by instructors or readers wishing to modify the macro. A function size is also required. Each row must represent one sample, and each column a variable.

NIPALS, as described in Section A.2.1, is used to extract the components sequentially. The scores of each PC are printed out to the right of the data, one column per PC, and the loadings below the data as rows. The eigenvalues are calculated in the program and may be printed out if required, by adding statements, or simply calculated as the sum of squares of the scores of each PC.

The AddChartLabels subroutine allows labels to be added to points in a graph. First, produce a scatterplot in Excel using the Chart Wizard. Make sure that the column to the left of the first (‘x’) variable contains the names or labels you wish to attach to the chart; see Figure A.26(a). Then simply run the macro, and each point should be labelled as in Figure A.26(b). If you want to change the font size or colour, edit the program as appropriate. If some of the labels overlap after the macro has been run, for example if there are close points in a graph, you can select each label manually and move it around the graph, or even delete selective labels. This small segment of code can, of course, be incorporated into a more elaborate graphics package, but in this text we include a sample which is sufficient for many purposes.

A.4.6.2 Multivariate Analysis Add-in

Accompanying the text is also an Add-in to perform several methods for multivariate analysis. The reader is urged first to understand the methods by using matrix commands in Excel or editing macros, and several examples in this book guide the reader to setting up these methods from scratch. However, after doing this once, it is probably unnecessary to repeat the full calculations from scratch and convenient to have available Add-ins in Excel. Although the performance has been tested on computers of a variety of configurations, we recommend a minimum of Office 2000 and Windows 98, together with at least 64 Mbyte memory. There may be problems with lower configurations. The VBA software was written by Tom Thurston based on an original implementation from Les Erskine.

You need to download the Add-ins from the publisher’s Website. You will obtain a setup file, click this to obtain the screen in Figure A.27, and follow the instructions. If in doubt, contact whoever is responsible for maintaining computing facilities within your department or office. Note that sometimes there can be problems if you use networks, for example using NT, and under such circumstances you may be required to consult the systems manager. The setup program will install the Add-in (an XLA file) and support files on your computer.

Next, start Excel, and select ‘Tools’ then ‘Add-ins . . .’ from the menu. The ‘AddIns’ dialog box should now appear. If ‘Multivariate Analysis’ is not listed, then click

450

CHEMOMETRICS

 

 

3500

 

 

 

 

 

 

 

 

 

 

Fe

 

 

 

 

 

 

 

 

 

 

 

Be

 

 

 

 

 

 

 

 

 

 

 

Co

 

 

 

 

 

 

 

 

 

 

 

 

 

3000

 

 

 

 

 

 

 

 

Cu

 

Ni

 

 

 

 

 

 

 

 

 

 

 

 

 

2500

 

 

 

 

 

 

 

 

 

Mn

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2000

 

 

 

 

Pb

 

 

 

 

 

 

 

 

 

 

 

Bi

 

Sr Ca

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1500

 

 

 

Li

Tl

 

 

 

 

 

 

 

 

 

 

 

Mg

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

K

Na

 

Zn

 

 

 

 

 

 

1000

 

 

 

 

 

 

 

 

 

 

 

 

Cs Rb

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

500

Xe

 

Br

I

 

 

 

 

 

 

 

 

 

Cl

 

 

 

 

 

 

 

 

 

Ne F

Kr

 

 

 

 

 

 

 

 

 

 

0

Ar

 

 

 

 

 

 

 

 

 

 

 

He

 

 

400

600

800

1000

1200

1400

1600

1800

2000

0

200

 

(b)

Figure A.26

Labelling a graph in Excel

APPENDICES

451

 

 

Figure A.27

Setup screen for the Excel chemometrics Add-in

Figure A.28

Multivariate Analysis dialog box

the ‘Browse . . .’ button and find an XLA file which would have been created during the setup procedure. Make sure the box next to ‘Multivariate Analysis’ is ticked, then click ‘OK’ to close the Add-Ins dialog box. You should now have a new item in the ‘Tools’ menu in Excel, titled ‘Multivariate Analysis’, which gives access to the various chemometric methods. Once selected, the dialog box in Figure A.28 should appear, allowing four options which will be described below.

The PCA dialog box is illustrated in Figure A.29. It is first necessary to select the data range and the number of PCs to be calculated. By default the objects are along

452

CHEMOMETRICS

 

 

Figure A.29

PCA dialog box

the rows and the variables down the columns, but it is possible to transpose the data matrix, in PCA and all other options, which is useful, for example, when handling large spectra with more the 256 wavelengths, remembering that there is a limit to the number of columns in an Excel worksheet; use the ‘Transpose data’ option in this case. The data may be mean centred in the direction of variables, or standardised (this uses the population rather than sample standard deviation, as recommended in this book).

It is possible to cross-validate the PCs using a ‘leave one sample out at a time’ approach (see Chapter 4, Section 4.3.3.2); this option is useful if one wants guidance as to how many PCs are relevant to the model. You are also asked to select the number of PCs required.

An output range must be chosen; it is only necessary to select the top left-hand cell of this range, but be careful that it does not overwrite existing data. For normal PCA, choose which of eigenvalues, scores and loadings you wish to display. If you select eigenvalues you will also be given the total sum of squares of the preprocessed (rather than raw) data together with the percentage variance of each eigenvalue.

Although cross-validation is always performed on the preprocessed data, the RSS and PRESS values are always calculated on the ‘x’ block in the original units, as discussed in Chapter 4, Section 4.3.3.2. The reason for this relates to rather complex problems that occur when standardising a column after one sample has been removed. There are, of course, many other possible approaches. When performing cross-validation, the only output available involves error analysis.

Соседние файлы в предмете Химия