
! "$# %'&)(.- * + 1'2 34/576 8 9$>@?$A/BDC EK F G H I

LNMPORQSTVUXWSMZY[MVU \ ]_^RTNO `

a MPbRTN[cdeMPOZc fNY gjZkkmM nM f [`PM opTNqf O rMP[qZlcU

tmTNl[YTZu v lOPlw opTVU/x yzyP{

A dissertation submitted in partial fulfillment of the requirements for the degree of Ph.D of Information Technology at George Mason University

Acknowledgements

I would like to thank my dissertation advisor, Dr. Harry Wechsler, for his support and encouragement over the past five years. I benefited from his teaching of computer vision. More importantly, however, by his example he taught me that to become a successful scientist one needs both an aggressive attitude toward research and the patience to carry it out.

I would also like to thank Professor Kenneth De Jong, Professor Anne Baraniecki, and Professor Edward Wegman, who were my committee members, as well as other faculty for their patient instruction throughout my academic study at George Mason University.

Finally, I thank Dr. Jonathon Phillips at NIST, Dr. Jerzy Bala, Dr. Abraham Schultz of the Naval Research Laboratory, and my closest colleague, Dr. Srinivas Gutta, for the brilliant ideas and interesting topics they contributed to my work. Their insights and advice enabled me to break through problems and led to the timely completion of this dissertation.

My research was partially sponsored by the Department of Defense Counterdrug Technology Development Program, with the U. S. Army Research Laboratory as Technical Agent, under contracts DAAL01-93-K-0099, DAAL01-94-R-9094 and DAAL01-97-K-0118.


Table of Contents

Abstract

Chapter 1 Introduction
1.1 Biometrics
1.2 Face Recognition
1.3 Literature Review
1.4 Thesis Outline

Chapter 2 Machine Intelligence
2.1 Behavior-Based AI and Artificial Life
2.2 Active and Selective Perception
2.3 Visual Routine Processor

Chapter 3 Adaptation
3.1 Learning
3.2 Decision Trees
3.3 Evolutionary Computation and Genetic Algorithms
3.4 Learning and Evolution: Lamarck vs. Baldwin

Chapter 4 Face Detection
4.1 Background
4.2 Methodology
4.3 Performance Evaluation and the FERET Data Base
4.4 Face Detection Using Still Imagery
4.4.1 Face Location
4.4.2 Face Cropping
4.4.3 Post Processing
4.4.4 Experiments
4.5 Face Detection Using Color Images
4.6 Face Detection Using Video Sequences
4.6.1 Tracking and Detection of Humans Using Optical Flow
4.6.2 Experiments

Chapter 5 Facial Landmark Classification
5.1 Base Representations Using Optimal Wavelet Packets
5.1.1 Eye Representation
5.1.2 Classification Using Radial Basis Functions (RBFs)
5.2 Feature Selection Using Genetic Algorithms
5.2.1 Genetic Algorithm (GA) – Decision Trees (DT) Hybrid Learning
5.2.2 Experiments

Chapter 6 Face Exploration for Eyes Detection
6.1 Animats
6.2 Methodology
6.2.1 Navigation and Homing
6.2.2 Feature Selection and Pattern Classification
6.3 Experiments
6.3.1 Exhaustive Search
6.3.2 Navigation
6.3.2.1 Saliency Map
6.3.2.2 Feature Selection and Pattern Classification

Chapter 7 Conclusions

References

Abstract

This thesis is concerned with Automated Face Recognition (AFR), a major challenge for applications related to biometrics, telecommunications, human-computer interaction, and medicine. The thesis describes novel strategies for both face and eye detection. Face detection is important because it restricts the field of view and thus reduces the amount of computation, while eye detection is important because it enables face normalization and leads to size-invariant face recognition. These detection strategies are adaptive, are based on learning and evolution, and are characteristic of Behavior-Based AI and Active and Selective Vision. The feasibility of the methodology for detection tasks related to face recognition has been demonstrated on FERET, a large and standard face image data base.

The contributions of this thesis are twofold. First, we have introduced an adaptive methodology for face detection tasks that should carry over to the more general area of behavior-based AI and artificial life. Furthermore, we have investigated the interactions between learning and evolution and have advanced a hybrid approach in which learning supports evolution by providing the fitness function. Second, we have demonstrated the feasibility of our approach on a real and very important technological challenge, that of face recognition using both still ('photography') and time-varying imagery ('video'). The robustness of our face detection approach applies to both grayscale and color images and has been demonstrated on a large data base consisting of 2,340 face images drawn from the FERET data base. The algorithm first decides whether a face is present and, if so, crops ('boxes') the face. Using grayscale imagery, the face and eye detection tasks yield accuracies of 96% and 90%, respectively, and as the approach does not require multiple (scale) face templates, the system thus displays scale invariance for face detection. Eye detection can be approached using an exhaustive search, or one can consider the possibility of navigating the facial landscape in search of the eyes. Towards that end we have evolved optimal navigational skills taking the form of Finite State Automata (FSA). Using such an approach, on a limited data set consisting of 20 images, we have achieved 95% accuracy. This approach is attractive because it reduces the search space, and it is also relevant for robot navigation and for speech recognition systems, as FSAs are quite similar in structure to the Hidden Markov Models (HMMs) used there.
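To make the learning-evolution hybrid concrete, the sketch below illustrates the general idea of a genetic algorithm whose fitness function is supplied by a learned classifier (here a leave-one-out 1-nearest-neighbour rule over the selected features). It is a minimal illustration of the principle only; the classifier, genetic operators, and parameters are placeholder choices, not the ones used in this thesis.

```python
# Minimal sketch (illustrative only) of "learning provides the fitness for
# evolution": a genetic algorithm evolves binary feature-selection masks, and a
# simple learned classifier (a 1-nearest-neighbour rule) scores each mask by
# its leave-one-out accuracy on the training data.
import random

def loo_accuracy(mask, data, labels):
    """Leave-one-out 1-NN accuracy using only the features selected by `mask`."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in idx)
    hits = 0
    for j, probe in enumerate(data):
        nearest = min((i for i in range(len(data)) if i != j),
                      key=lambda i: dist(probe, data[i]))
        hits += labels[nearest] == labels[j]
    return hits / len(data)

def evolve_masks(data, labels, pop_size=20, generations=40, p_mut=0.05):
    n = len(data[0])
    population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: loo_accuracy(m, data, labels), reverse=True)
        parents = population[: pop_size // 2]                 # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)                      # one-point crossover
            child = [1 - g if random.random() < p_mut else g  # bit-flip mutation
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        population = children
    return max(population, key=lambda m: loo_accuracy(m, data, labels))
```

The same loop applies when the individuals encode navigation FSAs rather than feature masks; only the decoding of the chromosome and the classifier-based scoring change.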


CHAPTER 1

Introduction

Biometrics are defined as the capture and use of physiological or behavioral characteristics for personal identification and/or individual verification purposes. Biometrics are widely used for forensics, access control, time and attendance recording, and banking and communication security. Face recognition is a natural and straightforward biometric method that human beings use to identify each other. Humans are able to detect and identify faces in a scene with little or no effort. Building automated systems to accomplish this task, however, is very difficult due to the significant variability encountered during the image formation process.

To design an Automated Face Recognition (AFR) system, one needs to address several related problems (sketched as a processing pipeline below):

(i) detection of an image pattern as a subject and then as a face against either a uniform or a complex background;

(ii) detection of facial landmarks for normalizing the face images to account for geometrical and illumination changes; and

(iii) identification and verification of face images using appropriate classification algorithms, possibly post-processing the results using model-based schemes and logistic feedback.
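Read in sequence, problems (i)-(iii) define a processing pipeline. The sketch below shows only that control flow; every stage function is a hypothetical placeholder rather than a module of the system described in this thesis.

```python
# Schematic AFR pipeline implied by problems (i)-(iii); all stage functions
# are hypothetical placeholders, only the control flow matters here.
def recognize(image, detect_face, locate_landmarks, normalize, classify, post_process):
    face = detect_face(image)               # (i)  subject / face vs. background
    if face is None:                        # no face present in the scene
        return None
    landmarks = locate_landmarks(face)      # (ii) e.g. eye coordinates
    canonical = normalize(face, landmarks)  # (ii) geometric / illumination correction
    identity, score = classify(canonical)   # (iii) identification or verification
    return post_process(identity, score)    # (iii) model-based check, logistic feedback
```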

Face recognition is a difficult task mostly because of the inherent variability of the image formation process in terms of image quality and photometry, geometry, occlusion, change, and disguise (Samal and Iyengar, 1992; Chellappa et al., 1995). All AFR systems available today can only perform on restricted data bases of images in terms of size, age, gender, and/or race, and they further assume well-controlled environments. There are additional degrees of variability, ranging from settings in which the position/cropping of the face and its environment (distance and illumination) are totally controlled, to those involving little or no control over the background and viewpoint, and finally to those allowing for major changes in facial appearance due to factors such as aging and disguise (hat and/or glasses).

An AFR system would find countless applications, e.g. criminal identification and retrieval of missing children, workstation and building security, credit card verification, and video-document retrieval. As an example, one could address tasks ranging from tagging video frames characterized by specific facial landmarks, like specific faces wearing glasses, to retrieving all the frames where the same person shows up.

1.1 Biometrics

Biometrics, the science of using individual personal characteristics to verify or recover identity, is set to become the successor to the Personal Identification Number. "The term biometrics refers to a range of authentication systems. Its definition is: a measurable physical characteristic or personal trait used to recognize the identity, or verify the claimed identity, of a person through automated means" (Cobb, 1996). Biometrics represent the most secure way to identify individuals because, instead of verifying identity and granting access based on the possession or knowledge of cards, passwords, or tokens, identity is established (i.e., access is granted) using a physical and unique biometric characteristic. Passwords or PINs used alone are responsible for fraud on corporate computer networks and the Internet because they can be guessed or stolen. Plastic cards, smart cards, or computer token cards used alone are also not secure because they can be forged, stolen, or lost, or become corrupted or unreadable. One can lose a card or forget a password, but one cannot lose or forget one's fingers, eyes, or face.

The technique of using biometric methods for identification can be widely applied to forensics, ATM banking, communication, time and attendance, and access control. Biometric technologies include:

- Face Recognition
- Finger Print (dactylogram) Identification
- Hand Geometry Identification
- Iris Identification
- Voice Recognition
- Signature Recognition
- Retina Identification
- DNA Sequence Matching

Face recognition offers multiple benefits over the other biometric methods. While the other biometrics require some voluntary action, face recognition can be used passively. This has advantages both for ease of use and for covert use such as police surveillance. Face images also allow easy audits and verification performed by human operators when logging biometric records. Regarding data acquisition, it is also easier to acquire good face images than good fingerprints. It turns out that about 5% of all people cannot provide a fingerprint of sufficient quality for a reader to use for verification. The reasons include cut skin, bandaged fingers, callused fingers, dry skin, low humidity, diseased skin, old skin, oriental skin, narrow fingers, and a smudged sensor on the reader. A similar disadvantage, caused by damage to the epidermis, affects hand geometry identification as well. Fingerprint scanners and palm readers can also transmit germs through the hand rest. In contrast, a face recognition system is totally hygienic and requires no maintenance because the face is measured from a distance.

Iris scans can provide very high accuracy rates for person identification. However, because the iris is so small, finding it requires two expensive high-resolution cameras with motion drives. As the camera view has to be narrow to capture the resolution of the iris, the whole process is highly sensitive to body motion, and as a consequence one has to hold somewhat steady in order not to get rejected. Retina readers sense the retinal vein patterns in the back of one's eye. This requires an individual to look into an eyepiece while some light is reflected off the back of the eye to capture the vein patterns. Although retina scanning yields very accurate identification rates, most people would still resist having an intrusive measurement made inside their eyes. Both iris and retina scanning fail to identify people who wear vanity contact lenses that cover the iris or retina, or people who blink while their picture is being taken. Glare from glasses can also prevent the scanners from finding the iris or the retina. In contrast, an automated face recognition system requires only one or two inexpensive cameras, and the camera(s) do not need to move because they capture a large enough field of view to cover the range of people's heights whether they are standing or sitting. A good face recognition algorithm works even with some glare reflected from the glasses or with the eyes closed.

Voice recognition also suffers for surveillance purposes, as it is not reliable in noisy environments such as public places or across phone lines with variable acoustic properties. Voice recognition systems are also sensitive to hoarse throats when people are sick with colds. A tape recording of the correct person's voice can fool voice recognition systems that do not have a challenge-response process. The signature is used for legally binding documents, but people usually vary their signatures greatly from time to time and from mood to mood. There are also concerns about pens and reading surfaces wearing poorly over time. This reduces the reliability of signature identification systems.

Face recognition is thus easier to operate both indoors and outdoors, by detecting and cropping the area containing a candidate face pattern from a complex background (Sung and Poggio, 1994). One can also consider combining other biometric techniques with face recognition in order to build multi-modal person authentication systems. As an example of personal authentication, Bigun (1998) presents a surveillance system that combines speech and face recognition and yields better performance than either voice or face recognition alone.
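A common way to build such a multi-modal system is to fuse the matching scores of the individual modalities. The weighted-sum rule below is a generic sketch of this idea and is not Bigun's actual method; the weights, score scale, and threshold are illustrative assumptions.

```python
# Generic score-level fusion of face and voice matchers (illustrative only).
# Both scores are assumed normalized to [0, 1], higher meaning a better match.
def fused_decision(face_score, voice_score, w_face=0.6, w_voice=0.4, threshold=0.5):
    combined = w_face * face_score + w_voice * voice_score
    return combined >= threshold           # accept or reject the claimed identity

# A weak face match backed by a strong voice match is still accepted:
# fused_decision(0.45, 0.80) -> 0.59 >= 0.5 -> True
```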

1.2 Face Recognition

An Automated Face Recognition (AFR) system can be utilized in several different application domains, and these domains impact many aspects of human life. In industry, AFR is applicable to photo-security systems, ATM banking, building access, and telecommunication workstation access. In government, an AFR system can meet needs in immigration control, border control, full-time monitoring, and airport/seaport security. AFR can improve criminal identification for forensic purposes and counter-terrorism techniques, which is of importance to intelligence agencies and police departments. Defense requirements, such as military troop entrance control, battlefield monitoring, and military personnel authentication, are applicable domains for this technique as well. In medicine, AFR can be useful in studies of the autonomic nervous system, the psychological reactions of patients, and intensive care monitoring, by detecting and analyzing facial expressions (Arad et al., 1994).

Researchers in computer vision and pattern recognition have worked on automatic techniques for recognizing human faces for the last two decades (Chellappa et al., 1995). Humans can detect and identify faces in a scene with little or no effort (Bruce, 1998). This skill is quite robust, despite large changes in the visual stimulus due to viewing conditions, expression, aging, and distractions such as glasses or changes in hair style. Understanding the mechanism of human vision for recognizing faces, and further building automated systems that accomplish this task under significant variability in the image formation process, is, however, very difficult.

There are several related (face recognition) subproblems: (i) detection of a pattern as a face (in the crowd), (ii) detection of facial landmarks, (iii) identification of the faces, and (iv) analysis of facial expressions (Samal and Iyengar, 1992). Face recognition starts with the detection of face patterns (Rowley et al., 1995) in sometimes cluttered scenes, proceeds by normalizing the face images to account for geometrical and illumination changes, possibly using information about the location and appearance of facial landmarks, identifies the faces using appropriate classification algorithms, and post-processes the results using model-based schemes and logistic feedback.


Automated face recognition, however, requires computer systems to search through many stored sets of characteristics ('the gallery') and pick the one that best matches the features of the unknown individual ('the probe'). In most practical scenarios there are two possible recognition tasks to be considered (DePersia and Phillips, 1995):

MATCH: An image of an unknown individual is collected ('probe') and the identity is found by searching a large set of images ('gallery'). Matching becomes especially difficult when the probe is a duplicate rather than the same (counterpart) image from the gallery. The duplicate image involves variability due both to the image acquisition process and to changes in physical appearance. Robust matching should allow for the possibility that there is no match for the probe in the existing gallery.

SURVEILLANCE: Rather than identifying a person, the system is now involved with verification and checks whether a given probe belongs to a relatively small gallery, sometimes labeled as a set of intruders. The probe can range from individual faces to known faces displaying specific characteristics. The surveillance system is usually flooded with a large set of face images (e.g., video frame retrieval and/or airport security), and most of the faces, if not all of them, correspond to false positives. Individual VERIFICATION is a particular case of surveillance in which both the gallery and the probe consist of just one image.
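In classifier terms, MATCH is an open-set nearest-neighbour search over the gallery with a rejection option, while VERIFICATION reduces to a single thresholded comparison. A minimal sketch follows, assuming feature vectors have already been extracted; the Euclidean distance and the thresholds are placeholder choices, not those of any particular system.

```python
# MATCH: return the closest gallery identity, or None if even the best match
# is too far away (the probe may have no counterpart in the gallery).
# VERIFY: compare the probe against the single template of a claimed identity.
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match(probe, gallery, reject_threshold):
    """gallery: dict mapping identity -> stored feature vector."""
    best = min(gallery, key=lambda ident: distance(probe, gallery[ident]))
    return best if distance(probe, gallery[best]) <= reject_threshold else None

def verify(probe, claimed_template, accept_threshold):
    return distance(probe, claimed_template) <= accept_threshold
```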

1.3 Literature Review

The basic question relevant for face classification is what form the structural code (for encoding the face) should take to achieve face recognition. Two major approaches are used for the automated identification of human faces. The first approach, the abstractive one, extracts (and measures) discrete local features ('indexes') for retrieving and identifying faces, and standard statistical pattern recognition techniques are then employed for matching faces using these measurements. The other approach, the holistic one, conceptually related to template matching, attempts to identify faces using global representations. Characteristic of this approach are connectionist methods such as backpropagation ('holons'), principal component analysis (PCA), and singular value decomposition (SVD) using eigenfaces (Turk and Pentland, 1991). Note also that both the abstractive and holistic approaches first require the early detection of facial landmarks, for feature measurements and normalization, respectively. This detection stage involves attention mechanisms similar to those used by the human visual system (HVS) to screen the visual field and to focus on salient input characteristics. As an example, the dynamic and multiresolution (DMA) scheme introduced by Takacs and Wechsler (1995) is mostly concerned with the aspects involved in selecting (information-loaded) fixation points and the early detection of salient facial landmarks needed for the (geometrical) normalization stage.

Brunelli and Poggio (1993) suggest that the optimal strategy for face recognition is holistic and corresponds to template matching. Although recognition by matching raw images has been successful under limited circumstances (Baron, 1981), it suffers from the usual shortcomings of straightforward correlation-based approaches, such as sensitivity to face orientation, size, variable lighting conditions, and noise. The reason for this vulnerability of direct matching methods lies in their attempt to carry out the required classification in a space of extremely high dimensionality. To overcome the curse of dimensionality, the connectionist equivalent of data compression methods is employed first. It has been successfully argued, however, that the resulting principal component (feature) dimensions do not necessarily retain the structure needed for classification, and that more general and powerful methods for feature extraction such as projection pursuit are required (Huber, 1981; Phillips, 1994). The basic idea behind projection pursuit is to pick "interesting" low-dimensional projections of a high-dimensional point cloud by maximizing an objective function such as the deviation from normality. In other words, one should seek features not only in terms of large variance but also whose probability density function (pdf) is multi-modal.
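As a toy illustration of this idea, one can score random unit directions by how far the projected data departs from a Gaussian, e.g. by the magnitude of the excess kurtosis, and keep the most "interesting" direction. The sketch below does only that; real projection pursuit algorithms optimize the pursuit index directly rather than sampling directions at random.

```python
# Toy projection pursuit: among random unit directions, keep the one whose 1-D
# projection deviates most from normality, scored here by |excess kurtosis|.
import math, random

def project(data, direction):
    return [sum(x * d for x, d in zip(row, direction)) for row in data]

def excess_kurtosis(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0.0:
        return 0.0
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / var ** 2 - 3.0              # zero for a Gaussian

def random_unit(dim):
    v = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_interesting_direction(data, trials=500):
    dim = len(data[0])
    return max((random_unit(dim) for _ in range(trials)),
               key=lambda d: abs(excess_kurtosis(project(data, d))))
```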

Sirovich and Kirby (1987) were the first to apply PCA to representing face images. They showed that any particular face can be economically represented along the eigenpictures coordinate space, and that any face can be approximately reconstructed using just a small collection of eigenpictures and the corresponding projections ('coefficients') along each eigenpicture. Since eigenpictures are good at representing face images, one can also consider using the projections along them as classification features to recognize faces. As accurate reconstruction of the image is not a requirement for face recognition, a smaller subset of the eigenpictures is sufficient. Turk and Pentland (1991) then developed a face recognition method, known as eigenfaces, which corresponds to the eigenvectors associated with the dominant eigenvalues of the face (patterns) covariance matrix. The eigenfaces define a feature space that drastically reduces the dimensionality of the original space, and face detection and identification are carried out in this smaller space. Further research has revealed that the leading principal components (PCs) can be effectively used for recognition only when the within-class and between-class variations have the same dominant directions. If this is not the case, other PCs corresponding to smaller eigenvalues may be more useful for recognition (Jolliffe, 1986). Swets and Weng (1996) have recently pointed out that the eigenfaces derived using PCA are only the most expressive features (MEF), which are not necessarily related to actual face recognition. To derive the most discriminating features (MDF), one needs a subsequent discriminant analysis projection. Their procedure, similar to Linear Discriminant Analysis (LDA), involves the simultaneous diagonalization of the within-class and between-class scatter matrices (Fukunaga, 1991).
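In code, the eigenface construction amounts to a PCA of the vectorized training faces followed by projection onto the leading eigenvectors and nearest-neighbour identification in that low-dimensional space. The NumPy sketch below is a generic illustration of this recipe, not the implementation of Turk and Pentland or of this thesis; alignment and photometric normalization are assumed to have been done already.

```python
# Minimal eigenfaces sketch: PCA on vectorized faces, projection onto the top
# k eigenvectors, nearest-neighbour identification in the reduced space.
import numpy as np

def train_eigenfaces(faces, k):
    """faces: (n_images, n_pixels) array of vectorized training faces."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # With n_images << n_pixels, eigen-decompose the small Gram matrix instead
    # of the full pixel covariance matrix (the classic Sirovich-Kirby trick).
    vals, vecs = np.linalg.eigh(centered @ centered.T)
    order = np.argsort(vals)[::-1][:k]
    eigenfaces = centered.T @ vecs[:, order]             # back to pixel space
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)     # unit-length columns
    return mean, eigenfaces                              # (n_pixels,), (n_pixels, k)

def project(face, mean, eigenfaces):
    return (face - mean) @ eigenfaces                    # k PCA coefficients

def identify(probe, gallery_coeffs, gallery_ids, mean, eigenfaces):
    dists = np.linalg.norm(gallery_coeffs - project(probe, mean, eigenfaces), axis=1)
    return gallery_ids[int(np.argmin(dists))]
```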

Buhmann, Lades, and Malsburg (1992; 1993) suggest the Dynamic Link Architecture (DLA) for face recognition. DLA starts by first extracting local (feature) information and then performs matching using global information describing the geometry connecting significant local features. The Gabor and/or wavelet defined local (feature) representations, labeled as jets, are augmented by their intrinsic global spatial structure. Face recognition is the result of graph matching employing optimization techniques based on diffusion processes. The corresponding cost (fitness) function consists of two terms: one measures the resemblance between the jets corresponding to an unknown face and those describing face candidates from the data base, while the other compares the corresponding spatial structures linking the jets. The generation of the object graphs, the initialization of the spatial image graphs to be matched against the face models, and the matching process itself are semi-automatic. Three major extensions to this system have been made in order to handle larger galleries and large variations in pose, and to increase the matching accuracy (Wiskott et al., 1997). The first extension uses the phase of the complex Gabor wavelet coefficients to achieve a more accurate location of the nodes and to disambiguate patterns. The second extension employs face-adapted graph models whose nodes, called fiducial points, refer to specific facial landmarks, so that the correspondences between two faces can be found across large viewpoint changes. The third extension introduces a new data structure, called the bunch graph, which serves as a generalized representation of faces by combining the jets of a small set of individual faces. Using this architecture, the recognition rate reaches 98% at first rank and 99% within the first 10 ranks on a gallery of 250 individuals.
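The two-term cost function described above can be written schematically as below, with one term rewarding jet similarity at corresponding nodes and the other penalizing distortion of the spatial structure linking them. The notation and the trade-off weight lambda are ours for illustration and do not reproduce the exact formulation of the cited papers.

```latex
% Schematic DLA/elastic graph matching cost: jet resemblance minus a penalty
% on distortion of the edge geometry; lambda balances the two terms.
S(G, G') = \frac{1}{N}\sum_{i=1}^{N} S_{\mathrm{jet}}\!\left(J_i, J'_i\right)
         - \frac{\lambda}{|\mathcal{E}|}\sum_{(i,j)\in\mathcal{E}}
           \bigl\| (\vec{x}_i - \vec{x}_j) - (\vec{x}'_i - \vec{x}'_j) \bigr\|^2
```

Here G and G' denote the image and model graphs, J_i and J'_i the jets at corresponding nodes, \vec{x}_i the node positions, and \mathcal{E} the edge set.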
