

Fig. 8.8 A two-class classification problem in which we wish to reduce the data dimension to one. The standard PCA result is given as the black axis passing through the pooled data mean, and the LDA result is given by the green axis

8.8 LDA-Based 3D Face Recognition

One reason that PCA-based approaches have been popular is that they can operate with only one training example per subject. This is because PCA takes no account of the per-subject (within-class) distribution of the training data. However, for the same reason, the projection axes computed by PCA may make class discrimination difficult. Indeed, in the worst case for some surface feature type, it could be that the very dimensions discarded by PCA are those that provide good discrimination between classes. With the advent of more sophisticated datasets containing several (preferably many) 3D scans per subject, more sophisticated subspaces can be used, which attempt to find the linear combination of features that best separates the subjects (classes). This is the aim of Linear Discriminant Analysis (LDA), which achieves this discrimination while simultaneously performing dimension reduction. Thus, while PCA finds the most expressive linear combinations of surface feature map dimensions (in the simplest case, depth map pixels), LDA finds the most discriminative linear combinations.
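To make this contrast concrete, the following is a minimal sketch, not taken from the chapter, that fits both PCA and two-class LDA to synthetic 2D data in the spirit of Fig. 8.8; the use of scikit-learn and the particular synthetic distribution are assumptions made purely for illustration.

```python
# Hypothetical illustration: compare the PCA axis (most expressive) with the
# LDA axis (most discriminative) on synthetic two-class 2D data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two elongated Gaussian clusters whose means differ along the direction of
# *small* variance, so PCA and LDA pick very different axes.
cov = np.array([[4.0, 0.0], [0.0, 0.2]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([0.0, 2.0], cov, size=200)
X = np.vstack([X1, X2])
y = np.array([0] * 200 + [1] * 200)

pca_axis = PCA(n_components=1).fit(X).components_[0]           # most expressive direction
lda_axis = LinearDiscriminantAnalysis(n_components=1).fit(X, y).scalings_[:, 0]
lda_axis /= np.linalg.norm(lda_axis)                           # normalise for comparison

print("PCA axis:", pca_axis)   # roughly [1, 0]: follows the pooled variance
print("LDA axis:", lda_axis)   # roughly [0, 1]: separates the class means
```

With this data the pooled variance is largest along the first coordinate, so PCA selects (approximately) that axis, whereas the class means differ along the second coordinate, so LDA selects a near-orthogonal axis.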

Although 3D face recognition is an inherently multi-class classification problem in a high dimensional space, it is easier to initially look at LDA for a two-class problem in a two-dimensional space and compare it to PCA. Subsequently we will look at the issues involved with high dimensional feature vectors and we will also generalize to multi-class problems.

8.8.1 Two-Class LDA

Suppose that we have the two-class, 2D problem shown in Fig. 8.8. Intuitively we want to project the data onto a direction for which there is the largest separation of the class means, relative to the within-class scatter in that same projection direction.



A scatter matrix is simply a scaled version of a covariance matrix, and for each set of training scans, C_c, belonging to class c ∈ {1, 2}, they are formed as:

S_c = X_{0c}^T X_{0c},    (8.26)

where X_{0c} is a zero-mean data matrix, as described in Sect. 8.7.1 (although it is now class-specific), and n_c is the number of training scans in the set C_c. We note that these scatter matrices are often expressed as a sum of outer products:

S_c = \sum_{i=1}^{n_c} (x_i - \bar{x}_c)(x_i - \bar{x}_c)^T, \quad x_i \in C_c,    (8.27)

where \bar{x}_c is the mean of the feature vectors in class C_c. Given that we have two classes, the within-class scatter matrix can be formed as:

S_W = S_1 + S_2.    (8.28)

The between-class scatter is formed as the outer product of the difference between the two class means:

S_B = (\bar{x}_1 - \bar{x}_2)(\bar{x}_1 - \bar{x}_2)^T.    (8.29)
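As a quick sanity check of Eqs. (8.26)-(8.29), the following sketch (an illustration on random data using NumPy, not part of the chapter) builds each class scatter matrix both from the zero-mean data matrix and as a sum of outer products, confirms that the two agree, and then forms S_W and S_B:

```python
# Illustrative check: Eq. (8.26) and Eq. (8.27) give the same class scatter,
# then S_W (8.28) and S_B (8.29) are formed for a two-class toy problem.
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(size=(30, 2))          # 30 "scans" of class 1, feature dimension 2
X2 = rng.normal(size=(40, 2)) + 3.0    # 40 "scans" of class 2, with a shifted mean

def scatter_matrix_form(Xc):
    X0c = Xc - Xc.mean(axis=0)         # zero-mean data matrix X_0c
    return X0c.T @ X0c                 # Eq. (8.26)

def scatter_outer_form(Xc):
    xbar = Xc.mean(axis=0)
    return sum(np.outer(x - xbar, x - xbar) for x in Xc)   # Eq. (8.27)

S1, S2 = scatter_matrix_form(X1), scatter_matrix_form(X2)
assert np.allclose(S1, scatter_outer_form(X1))   # both formulations agree

S_W = S1 + S2                                    # Eq. (8.28): within-class scatter
d = X1.mean(axis=0) - X2.mean(axis=0)
S_B = np.outer(d, d)                             # Eq. (8.29): between-class scatter
print(np.linalg.matrix_rank(S_B))                # 1: a single outer product has rank 1
```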

Fisher proposed to maximize the ratio of the between-class scatter to the within-class scatter along the projection direction [28], i.e. to solve

J(w) = \max_w \frac{w^T S_B w}{w^T S_W w}    (8.30)

with respect to the 2 × 1 column vector w. This is known as Fisher’s criterion. A solution to this optimization can be found by differentiating Eq. (8.30) with respect to w and equating to zero. This gives:

S_W^{-1} S_B w - J w = 0.    (8.31)

We recognize this as a generalized eigenvalue-eigenvector problem, where the eigenvector of S_W^{-1} S_B associated with its largest eigenvalue (J) is our desired optimal direction, w. (In fact, the other eigenvalue will be zero because the between-class scatter matrix, being a simple outer product, can only have rank 1.) Figure 8.8 shows the very different results of applying PCA to give the most expressive axis for the pooled data (both classes), and applying LDA, which gives a near orthogonal axis, relative to that of PCA, for the best class separation in terms of Fisher's criterion. Once data is projected into the new space, one can use various classifiers and distance metrics, as described earlier.
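Continuing in the same illustrative spirit (the toy data and NumPy calls are my own assumptions, not the authors' code), the optimal direction of Eq. (8.31) can be found numerically as the dominant eigenvector of S_W^{-1} S_B:

```python
# Two-class Fisher LDA direction from the eigen-decomposition of S_W^{-1} S_B.
import numpy as np

rng = np.random.default_rng(2)
cov = np.array([[4.0, 0.0], [0.0, 0.2]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=100)   # class 1
X2 = rng.multivariate_normal([0.0, 2.0], cov, size=100)   # class 2

def scatter(Xc):
    X0c = Xc - Xc.mean(axis=0)
    return X0c.T @ X0c                                     # Eq. (8.26)

S_W = scatter(X1) + scatter(X2)                            # Eq. (8.28)
d = X1.mean(axis=0) - X2.mean(axis=0)
S_B = np.outer(d, d)                                       # Eq. (8.29)

evals, evecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = np.real(evecs[:, np.argmax(np.real(evals))])           # Fisher direction w
print(np.sort(np.real(evals)))   # one eigenvalue is ~0 because S_B has rank 1

# Project onto w; a simple threshold or nearest-mean rule could then classify.
print((X1 @ w).mean(), (X2 @ w).mean())
```

For the two-class case the same direction can also be written in closed form as w ∝ S_W^{-1}(\bar{x}_1 - \bar{x}_2), which avoids the explicit eigen-decomposition.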

8.8.2 LDA with More than Two Classes

In order to extend the approach to a multiclass problem (K > 2 classes), we could train K(K − 1)/2 pairwise binary classifiers and classify a test 3D face according to the class that gets the most votes. This has the advantage of finding the projection that


best separates each pair of distributions, but often results in a very large number of classifiers in 3D face recognition problems, due to the large number of classes (one per subject) often encountered. Alternatively, one can train K one-versus-all classifiers, where the binary classifiers are of the form 'subject X' versus 'not subject X'. Although this results in fewer classifiers, the computed projections are usually less discriminative.
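For illustration only (the logistic-regression base classifier and the scikit-learn wrappers are assumptions, not the chapter's method), the difference in classifier count between the two schemes can be seen directly:

```python
# Hypothetical sketch: one-vs-one (K(K-1)/2 binary models) versus
# one-vs-rest (K binary models) multi-class schemes around a binary classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Toy stand-in for per-subject feature vectors (e.g. projected depth maps).
X, y = make_classification(n_samples=600, n_features=50, n_informative=20,
                           n_classes=6, n_clusters_per_class=1, random_state=0)

ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

K = len(np.unique(y))
print(len(ovo.estimators_), K * (K - 1) // 2)   # pairwise: K(K-1)/2 binary models
print(len(ovr.estimators_), K)                  # one-vs-all: K binary models
```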

Rather than applying multiple binary classifiers to a multi-class problem, we can generalize the binary LDA case to multiple classes. This requires the assumption that the number of classes is less than or equal to the dimension of the feature space (i.e. K ≤ m, where m is the dimension of our depth map space or surface feature map space). For high dimensional feature vectors, such as encountered in 3D face recognition, we also require a very large body of training data to prevent the scatter matrices from being singular. In general, the required number of 3D face scans is not available, but we address this point in the following subsection.

The multi-class LDA procedure is as follows: Firstly, we form the means of each class (i.e. each subject in the 3D face dataset). Using these means, we can compute the scatter matrix for each class using Eqs. (8.26) or (8.27). For the within-class scatter matrix, we simply sum the scatter matrices for each individual class:

S_W = \sum_{i=1}^{K} S_i.    (8.32)

This is an m × m matrix, where m is the dimension of the feature vector. We form the mean of all training faces, which is the weighted mean of the class means:

\bar{x} = \frac{1}{n} \sum_{i=1}^{K} n_i \bar{x}_i,    (8.33)

where n_i is the number of training face scans in each class and n is the total number of training scans. The between-class scatter matrix is then formed as:

S_B = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^T.    (8.34)

This scatter matrix is also m × m. Rather than finding a single m-dimensional vector to project onto, we now seek a reduced dimension subspace in which to project our data, so that a feature vector in the new subspace is given as:

 

 

\tilde{x} = W^T x.    (8.35)

We formulate Fisher’s criterion as:

 

 

 

J(W) = \max_W \frac{|W^T S_B W|}{|W^T S_W W|},    (8.36)

where the vertical lines indicate that the determinant is to be computed. Given that the determinant is equivalent to the product of the eigenvalues, it is a measure of the square of the scattering volume. The projection matrix, W, maps the original


m-dimensional space to, at most, K − 1 dimensions, and so its maximum size is m × (K − 1). This limit exists because S_B is the sum of K outer products and matrices formed from outer products are rank 1 or less. Additionally, only K − 1 of these are independent due to their dependence on the overall mean \bar{x} of the pooled training data. As with the two-class case, the optimal projection is found from the generalized eigenvalue-eigenvector problem:

S_W^{-1} S_B W - J W = 0,    (8.37)

and the optimal projection W is the one whose columns are the eigenvectors corresponding to the largest eigenvalues. At most, there will be K − 1 non-zero eigenvalues. In practice, we select the subset of eigenvectors with eigenvalues above some small threshold, as those close to zero provide little useful discrimination. Note that we can only form the solution in this way if S_W is non-singular, and we discuss this issue in the following subsection.
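The multi-class procedure of Eqs. (8.32)-(8.37) can be sketched as below. This is only an illustration on random, low-dimensional data; the use of scipy.linalg.eigh to solve the equivalent generalized eigenproblem S_B w = λ S_W w (valid when S_W is non-singular) is my own implementation choice.

```python
# Multi-class LDA (Eqs. 8.32-8.37) on toy data: class means, S_W, S_B, and the
# projection W from the generalized eigenproblem S_B w = lambda * S_W w.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
K, m, n_per_class = 5, 20, 40                      # classes, feature dim, scans per class
classes = [rng.normal(loc=rng.normal(size=m), scale=1.0, size=(n_per_class, m))
           for _ in range(K)]

means = [Xc.mean(axis=0) for Xc in classes]        # per-class means
n_i = [len(Xc) for Xc in classes]
xbar = sum(n * mu for n, mu in zip(n_i, means)) / sum(n_i)                  # Eq. (8.33)

S_W = sum((Xc - mu).T @ (Xc - mu) for Xc, mu in zip(classes, means))        # Eq. (8.32)
S_B = sum(n * np.outer(mu - xbar, mu - xbar) for n, mu in zip(n_i, means))  # Eq. (8.34)

# Generalized eigenproblem; keep the K-1 eigenvectors with the largest eigenvalues.
evals, evecs = eigh(S_B, S_W)
W = evecs[:, -(K - 1):]                            # m x (K-1) projection matrix
X_tilde = classes[0] @ W                           # Eq. (8.35): project class-0 scans
print(W.shape, X_tilde.shape)
```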

8.8.3 LDA in High Dimensional 3D Face Spaces

When applying LDA to 3D face recognition, we have to address the problems associated with working in high dimensional spaces. In [45], for example, range maps 60 pixels wide and 90 pixels high are employed to give feature vectors of length m = 5400. Thus the scatter matrices, both S_W and S_B, are extremely large (dimension m × m) and, worse still, due to the typically small number (10–100) of training images per class, they are singular (non-invertible). This is referred to as LDA's small sample size problem. The more traditional approach to dealing with this is to initially apply PCA as a dimension reduction stage, before LDA can proceed, as was done in Belhumeur et al.'s [8] seminal work in 2D face recognition and later in 3D face recognition work [39, 43, 45].

If the total number of pooled training vectors is n, then the rank of S_W is at most k = n − K. Thus PCA can be applied to the data and the k eigenvectors with the largest eigenvalues are selected to give the m × k projection matrix V_k. This is then reduced to K − 1 dimensions using the LDA-derived k × (K − 1) projection matrix, as described in the previous subsection.

Assuming that each 3D face can be represented as an m-dimensional vector, we can summarize this approach with the following training steps (a code sketch of the whole pipeline follows the list):

1. Determine the projection matrix of a PCA-derived subspace, of dimension n − K, as described in Sect. 8.7.

2. Project all of the 3D faces that constitute the training data into this subspace.

3. Form the within-class and between-class scatter matrices of the training data in this reduced dimension PCA-derived subspace, as described in Sect. 8.8.2.

4. Determine the smaller (maximum dimension K − 1) LDA-derived subspace of this PCA subspace by solving the generalized eigenvalue-eigenvector problem of the form of Eq. (8.37).

5. Project the training data from the PCA subspace into the LDA subspace.
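A minimal sketch of these five steps is given below. It is illustrative only: the synthetic data, the SVD route to the PCA subspace, the tiny ridge added to S_W, and the SciPy generalized eigensolver are assumptions consistent with the text rather than the authors' implementation.

```python
# PCA-then-LDA training pipeline for high-dimensional face feature vectors.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
K, m, n_per_class = 10, 500, 5                      # 10 subjects, 500-dim features, 5 scans each
X = np.vstack([rng.normal(loc=rng.normal(size=m), size=(n_per_class, m))
               for _ in range(K)])
y = np.repeat(np.arange(K), n_per_class)
n = len(y)

# Step 1: PCA subspace of dimension k = n - K, via the SVD of the pooled
# zero-mean data (equivalent to the eigenvectors of the pooled scatter matrix).
k = n - K
X0 = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X0, full_matrices=False)   # avoids forming the m x m scatter
Vk = Vt[:k].T                                       # m x k PCA projection matrix

# Step 2: project all training faces into the PCA subspace.
Z = X0 @ Vk                                         # n x k

# Step 3: within-class and between-class scatter in the PCA subspace (Sect. 8.8.2).
zbar = Z.mean(axis=0)
S_W = np.zeros((k, k)); S_B = np.zeros((k, k))
for c in range(K):
    Zc = Z[y == c]
    mu = Zc.mean(axis=0)
    S_W += (Zc - mu).T @ (Zc - mu)
    S_B += len(Zc) * np.outer(mu - zbar, mu - zbar)
S_W += 1e-8 * np.eye(k)                             # tiny ridge for numerical stability

# Step 4: LDA subspace (at most K - 1 dims) via the generalized eigenproblem (8.37).
evals, evecs = eigh(S_B, S_W)
W_lda = evecs[:, -(K - 1):]                         # k x (K-1)

# Step 5: project the training data from the PCA subspace into the LDA subspace.
Z_lda = Z @ W_lda                                   # n x (K-1) final feature vectors
print(Vk.shape, W_lda.shape, Z_lda.shape)
```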