## Face recognition by independent component analysis (2002)

Venue: IEEE Transactions on Neural Networks

Citations: 220 (4 self)

### BibTeX

@ARTICLE{Bartlett02facerecognition,

author = {Marian Stewart Bartlett and Javier R. Movellan and Terrence J. Sejnowski},

title = {Face recognition by independent component analysis},

journal = {IEEE Transactions on Neural Networks},

year = {2002},

pages = {1450--1464}

}

### Abstract

A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the high-order relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics. Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons. ICA was performed on face images in the FERET database under two different architectures, one that treated the images as random variables and the pixels as outcomes, and a second that treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code. Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance.

Index Terms: Eigenfaces, face recognition, independent component analysis (ICA), principal component analysis (PCA), unsupervised learning.
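The two architectures in the abstract differ only in how the data matrix is oriented before ICA is run. The following is a minimal sketch of that idea, not the authors' Matlab code. It assumes a logistic-nonlinearity infomax update with the natural gradient, `ΔW ∝ (I + (1 − 2y) uᵀ) W`, applied to sphered data. The function names (`sphere`, `infomax_ica`, `architecture_one`, `architecture_two`), the learning rate, the batch size, and the toy Laplacian "basis images" are all illustrative assumptions.

```python
import numpy as np


def sphere(X):
    """Center each row of X, then whiten so the row covariance is the identity."""
    Xc = X - X.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    Wz = evecs @ np.diag(evals ** -0.5) @ evecs.T  # assumes a full-rank covariance
    return Wz @ Xc, Wz


def infomax_ica(X, epochs=100, lr=0.01, batch=100, seed=0):
    """Infomax ICA sketch: rows of X are variables, columns are outcomes.

    Returns the full unmixing matrix (learned matrix times sphering matrix).
    """
    rng = np.random.default_rng(seed)
    Xs, Wz = sphere(X)
    n, T = Xs.shape
    W = np.eye(n)
    for _ in range(epochs):
        order = rng.permutation(T)
        for start in range(0, T, batch):
            xb = Xs[:, order[start:start + batch]]
            u = W @ xb
            y = 1.0 / (1.0 + np.exp(-u))  # logistic transfer function
            # Natural-gradient step, averaged over the mini-batch.
            grad = np.eye(n) + (1.0 - 2.0 * y) @ u.T / xb.shape[1]
            W += lr * grad @ W
    return W @ Wz


def architecture_one(faces):
    """Images as variables (rows), pixels as outcomes (columns)."""
    X = faces
    W = infomax_ica(X)
    U = W @ (X - X.mean(axis=1, keepdims=True))
    # Rows of U: independent basis images. Row i of inv(W): coefficients of face i.
    return U, np.linalg.inv(W)


def architecture_two(faces):
    """Pixels as variables (rows), images as outcomes (columns)."""
    X = faces.T
    W = infomax_ica(X)
    U = W @ (X - X.mean(axis=1, keepdims=True))
    # Columns of U: independent code per face. Columns of inv(W): basis images.
    return U, np.linalg.inv(W)


# Toy data (hypothetical): 3 "faces" of 4000 pixels, mixed from 3 sparse sources.
rng = np.random.default_rng(1)
S = rng.laplace(size=(3, 4000))
A = np.array([[1.0, 0.5, 0.3], [0.4, 1.0, 0.2], [0.3, 0.6, 1.0]])
faces = A @ S
U1, coeffs1 = architecture_one(faces)

# Same numbers read the other way round: 4000 "faces" of 3 pixels each.
U2, basis2 = architecture_two(faces.T)
```

On real face images the number of variables far exceeds what this toy handles, and the covariance would be rank-deficient, so a dimensionality-reduction step before ICA would be needed. That step is omitted here to keep the sketch short.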

### Citations

3000 | Eigenfaces for recognition
- Turk, Pentland
- 1991
Citation Context: ...ndencies will still show in the joint distribution of PCA coefficients, and, thus, will not be properly separated. Some of the most successful representations for face recognition, such as eigenfaces [57], holons [15], and local feature analysis [50] are based on PCA. In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image ...

1436 | Independent component analysis, a new concept
- Comon
- 1994
Citation Context: ... is important to investigate whether generalizations of PCA which are sensitive to high-order relationships, not just second-order relationships, are advantageous. Independent component analysis (ICA) [14] is one such generalization. A number of algorithms for performing ICA have been proposed. See [29, 20] for reviews. Here we employ an algorithm developed by Bell and Sejnowski [11, 12] from the point...

1167 | An information-maximization approach to blind separation and blind deconvolution
- Bell, Sejnowski
- 1995
Citation Context: ...alysis (ICA) [14] is one such generalization. A number of algorithms for performing ICA have been proposed. See [20] and [29] for reviews. Here, we employ an algorithm developed by Bell and Sejnowski [11], [12] from the point of view of optimal information transfer in neural networks with sigmoidal transfer functions. This algorithm has proven successful for separating randomly mixed auditory signals ...

1083 | Learning the parts of objects by non-negative matrix factorization
- Lee, Seung
- 1999
Citation Context: ...ere not sparse, while Architecture II produced sparse face codes, but with holistic basis images. A representation that has recently appeared in the literature, nonnegative matrix factorization (NMF) [28], produced local basis images and sparse face codes. While this representation is interesting from a theoretical perspective, it has not yet proven useful for recognition. Another innovative face re...

991 | Emergence of simple-cell receptive field properties by learning a sparse code for natural images
- Olshausen, Field
- 1996
Citation Context: ...tatistically independent basis images, $S$, where $A$ is an unknown mixing matrix. The basis images were estimated as the learned ICA output $U$. Fig. 5. Image synthesis model for Architecture II, based on [43] and [44]. Each image in the dataset was considered to be a linear combination of underlying basis images in the matrix $A$. The basis images were each associated with a set of independent "causes," giv...

670 | View-based and modular eigenspace for face recognition
- Pentland, Moghaddam, et al.
- 1994
Citation Context: ...e FERET database [52]. Face recognition performances using the ICA representations were benchmarked by comparing them to performances using PCA, which is equivalent to the "eigenfaces" representation [51], [57]. The two ICA representations were then combined in a single classifier. II. ICA. There are a number of algorithms for performing ICA [11], [13], [14], [25]. We chose the infomax algorithm propos...

655 | Relations between the statistics of natural images and the response properties of cortical cells
- Field
- 1987
Citation Context: ...lationship between sparseness and independence [5], [12]. Conversely, it has also been shown that Gabor filters, which closely model the responses of V1 simple cells, separate high-order dependencies [18], [19], [54]. (See [6] for a more detailed discussion.) In support of the relationship between Gabor filters and ICA, the Gabor and ICA Architecture I representations significantly outperformed more t...

533 | A new learning algorithm for blind signal separation
- Amari, Cichocki, et al.
- 1996
Citation Context: ... of the entropy in matrix form, i.e., the cell in row $i$, column $j$ of this matrix is the derivative of $H(Y)$ with respect to $w_{ij}$. Computation of the matrix inverse can be avoided by employing the natural gradient [1], which amounts to multiplying the absolute gradient by $W^T W$, resulting in the following learning rule [12]: $\Delta W \propto (I + Y' U^T) W$, where $I$ is the identity matrix. The logistic transfer function (1) gives $y'_i = 1 - 2 y_i$. When there are multipl...

403 | What is the goal of sensory coding
- Field
- 1994
Citation Context: ... eigenvectors of the covariance matrix of the data. Second-order statistics capture the amplitude spectrum of images but not their phase spectrum. The high-order statistics capture the phase spectrum [19, 12]. For a given sample of natural images we can scramble their phase spectrum while maintaining their power spectrum. This will dramatically alter the appearance of the images but will not change their ...

397 | The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing 16(5)
- Phillips, Wechsler, et al.
- 1998
Citation Context: ... as random variables and the images as outcomes. Matlab code for the ICA representations is available at http://inc.ucsd.edu/~marni. Face recognition performance was tested using the FERET database [52]. Face recognition performances using the ICA representations were benchmarked by comparing them to performances using PCA, which is equivalent to the "eigenfaces" representation [51], [57]. The two I...

339 | Kernel independent component analysis
- Bach, Jordan
Citation Context: ...ogress has been made by assuming a linear mixing process followed by parametric nonlinear functions [31], [59]. An algorithm for nonlinear ICA based on kernel methods has also recently been presented [4]. Kernel methods have already been shown to improve face recognition performance with PCA and Fisherfaces [60]. (Running page header: p. 1462, IEEE Transactions on Neural Networks, vol. 13, no. 6, November 2002.) Another future direc...

291 | Normalization of cell responses in cat striate cortex
- Heeger
- 1992
Citation Context: ...to length-normalizing the vectors prior to measuring Euclidean distance when doing nearest neighbor. Thus, if $\|x\| = \|y\| = 1$, then $\min_y d^2(x, y) \Leftrightarrow \max_y \cos(x, y)$ (13). Such contrast normalization is consistent with neural models of primary visual cortex [23]. Cosine similarity measures were previously found to be effective for computational models of language [24] and face processing [46]. Fig. 10 gives face recognition performance with both the ICA and ...

284 | Learning overcomplete representations
- Lewicki, Sejnowski
- 2000
Citation Context: ... perception of facial similarity than both PCA and nonnegative matrix factorization [22]. Desirable filters may be those that are adapted to the patterns of interest and capture interesting structure [33]. The more the dependencies that are encoded, the more structure that is learned. Information theory provides a means for capturing interesting structure. Information maximization leads to an efficien...

276 | Classifying Facial Action
- Bartlett, Viola, et al.
- 1996
Citation Context: ...on. The LFA kernels are not sensitive to the high-order dependencies in the face image ensemble, and in tests to date, recognition performance with LFA kernels has not significantly improved upon PCA [16]. Interestingly, downsampling methods based on sequential information maximization significantly improve performance with LFA [49]. ICA outputs using Architecture I were sparse in space (within image ...

248 | Could information theory provide an ecological theory of sensory processing
- Atick
- 1992
Citation Context: ...ally independent, i.e., a factorial face code. Barlow and Atick have discussed advantages of factorial codes for encoding complex objects that are characterized by high-order combinations of features [2], [5]. These include the fact that the probability of any combination of features can be obtained from their marginal probabilities. To achieve this goal, we organize the data matrix so that rows represen...

241 | Local Feature Analysis: A general statistical theory for object representation
- Penev, Atick
- 1996
Citation Context: ...tion of PCA coefficients, and, thus, will not be properly separated. Some of the most successful representations for face recognition, such as eigenfaces [57], holons [15], and local feature analysis [50] are based on PCA. In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels, and thus, it is important to investiga...

240 | Unsupervised learning
- Barlow
- 1989
Citation Context: ...analysis (PCA), unsupervised learning. I. INTRODUCTION. Redundancy in the sensory input contains structural information about the environment. Barlow has argued that such redundancy provides knowledge [5] and that the role of the sensory system is to develop factorial representations in which these dependencies are separated into independent components Manuscript received May 21, 2001; revised May 8, ...

230 | Independent component analysis using an extended infomax algorithm for mixed subgaussian and supergaussian sources
- Lee, Girolami, et al.
- 1999
Citation Context: ...d, ICA can be seen as doing something akin to nonorthogonal PCA and to cluster analysis; however, when the source models are sub-Gaussian, the relationship between these techniques is less clear. See [30] for a discussion of ICA in the context of sub-Gaussian sources. B. Two Architectures for Performing ICA on Images. Let $X$ be a data matrix with $n_r$ rows and $n_c$ columns. We can think of each column of $X$ as outcome...

205 | Independent component analysis of electroencephalographic data
- Makeig, Bell, et al.
- 1996
Citation Context: ...h sigmoidal transfer functions. This algorithm has proven successful for separating randomly mixed auditory signals (the cocktail party problem), and for separating electroencephalogram (EEG) signals [37] and functional magnetic resonance imaging (fMRI) signals [39]. We performed ICA on the image set under two architectures. Architecture I treated the images as random variables and the pixels as outco...

174 | The importance of phase in signals
- Oppenheim, Lim
- 1981
Citation Context: ...drives human perception. For example, as illustrated in Fig. 1, a face image synthesized from the amplitude spectrum of face A and the phase spectrum of face B will be perceived as an image of face B [45], [53]. The fact that PCA is only sensitive to the power spectrum of images suggests that it might not be particularly well suited for representing natural images. The assumption of Gaussian sources i...

173 | What does the retina know about natural scenes
- Atick, Redlich
- 1992
Citation Context: ...ntageous for encoding complex objects that are characterized by high-order dependencies. Atick and Redlich have also argued for such representations as a general coding strategy for the visual system [3]. Principal component analysis (PCA) is a popular unsupervised statistical method to find useful image representations. Consider a set of $n$ basis images each of which has $n$ pixels. A standard basis set co...

154 | Independent Component Analysis: Theory and Applications
- Lee
- 1998
Citation Context: ...not just second-order relationships, are advantageous. Independent component analysis (ICA) [14] is one such generalization. A number of algorithms for performing ICA have been proposed. See [20] and [29] for reviews. Here, we employ an algorithm developed by Bell and Sejnowski [11], [12] from the point of view of optimal information transfer in neural networks with sigmoidal transfer functions. This ...

149 | Nonlinear neurons in the low-noise limit - a factorial code maximizes information-transfer
- Nadal, Parga
- 1994
Citation Context: ...of the underlying ICs (up to scaling and translation) it can be shown that maximizing the joint entropy of the outputs in $Y$ also minimizes the mutual information between the individual outputs in $U$ [12], [42]. In practice, the logistic transfer function has been found sufficient to separate mixtures of natural signals with sparse distributions including sound sources [11]. The algorithm is speeded up by i...

143 | Independent component analysis, a new concept? Signal Processing
- Comon
- 1994
Citation Context: ...is important to investigate whether generalizations of PCA which are sensitive to high-order relationships, not just second-order relationships, are advantageous. Independent component analysis (ICA) [14] is one such generalization. A number of algorithms for performing ICA have been proposed. See [20] and [29] for reviews. Here, we employ an algorithm developed by Bell and Sejnowski [11], [12] from t...

141 | Lesioning an attractor network: Investigations of acquired dyslexia
- Hinton, Shallice
- 1991
Citation Context: ... (13) Such contrast normalization is consistent with neural models of primary visual cortex [23]. Cosine similarity measures were previously found to be effective for computational models of language [24] and face processing [46]. Fig. 10 gives face recognition performance with both the ICA and the PCA-based representations. Recognition performance is also shown for the PCA-based representation using ...

141 | Statistical Models for Images: Compression, Restoration and Synthesis
- Simoncelli
- 1997
Citation Context: ...etween sparseness and independence [5], [12]. Conversely, it has also been shown that Gabor filters, which closely model the responses of V1 simple cells, separate high-order dependencies [18], [19], [54]. (See [6] for a more detailed discussion.) In support of the relationship between Gabor filters and ICA, the Gabor and ICA Architecture I representations significantly outperformed more than eight ot...

137 | A simple coding procedure enhances a neuron's information capacity. Zeitschrift für Naturforschung
- Laughlin
- 1981
Citation Context: ...[13], [14], [25]. We chose the infomax algorithm proposed by Bell and Sejnowski [11], which was derived from the principle of optimal information transfer in neurons with sigmoidal transfer functions [27]. The algorithm is motivated as follows: Let $X$ be an $n$-dimensional ($n$-D) random vector representing a distribution of inputs in the environment. (Here, boldface capitals denote random variables, whereas ...

123 | Probabilistic framework for the adaptation and comparison of image codes
- Lewicki, Olshausen
- 1999
Citation Context: ...here PCA uses Gaussian sources, and ICA typically uses sparse sources. It has been shown that for many natural signals, ICA is a better model in that it assigns higher likelihood to the data than PCA [32]. The ICA basis dimensions presented here may have captured more likelihood of the face images than PCA, which provides a possible explanation for the superior performance of ICA for face recognition ...

111 | Maximum likelihood and covariant algorithms for independent component analysis
- MacKay
- 1996
Citation Context: ...es are set to zero and the variances are equalized. When the inputs to ICA are the "sphered" data, the full transform matrix is the product of the sphering matrix and the matrix learned by ICA. MacKay [36] and Pearlmutter [48] showed that the ICA algorithm converges to the maximum likelihood estimate of $W^{-1}$ for the following generative model of the data: $X = W^{-1} S$, where $S$ is a vector of independent random variables, c...

110 | Independent component representations for face recognition
- Bartlett, Lades, et al.
- 1998
Citation Context: ... and outputs, maximizing the joint entropy of the output encourages the individual outputs to move toward statistical independence. [Footnote 1: Preliminary versions of this work appear in [7] and [9]. A longer discussion of unsupervised learning for face recognition appears in [6].] When the form of the nonlinear transfer function is the same as the cumulative density functions of the underlying ICs...

100 | A context-sensitive generalization of ICA
- Pearlmutter, Parra
- 1996
Citation Context: ...d the variances are equalized. When the inputs to ICA are the "sphered" data, the full transform matrix is the product of the sphering matrix and the matrix learned by ICA. MacKay [36] and Pearlmutter [48] showed that the ICA algorithm converges to the maximum likelihood estimate of $W^{-1}$ for the following generative model of the data: $X = W^{-1} S$, where $S$ is a vector of independent random variables, called the sources, wi...

92 | On associative memory
- Palm
- 1980
Citation Context: ...parse coding for face representations is storage in associative memory systems. Networks with sparse inputs can store more memories and provide more effective retrieval with partial information [10], [47]. The probability densities for the values of the coefficients of the two ICA representations and the PCA representation are shown in Fig. 17. The sparseness of the face representations was examined ...

82 | Robust learning algorithm for blind separation of signals
- Cichocki, Unbehauen, et al.
- 1994
Citation Context: ...h is equivalent to the "eigenfaces" representation [51], [57]. The two ICA representations were then combined in a single classifier. II. ICA. There are a number of algorithms for performing ICA [11], [13], [14], [25]. We chose the infomax algorithm proposed by Bell and Sejnowski [11], which was derived from the principle of optimal information transfer in neurons with sigmoidal transfer functions [27]...

81 | Face image analysis by unsupervised learning and redundancy reduction. Ph.D. thesis
- Bartlett
- 1998
Citation Context: ...seness and independence [5], [12]. Conversely, it has also been shown that Gabor filters, which closely model the responses of V1 simple cells, separate high-order dependencies [18], [19], [54]. (See [6] for a more detailed discussion.) In support of the relationship between Gabor filters and ICA, the Gabor and ICA Architecture I representations significantly outperformed more than eight other image ...

63 | Ensemble learning
- Lappalainen, Miskin
- 2000
Citation Context: ...s have been proposed for separating sources on projection planes without discarding any ICs of the data [55]. Techniques for estimating the number of ICs in a dataset have also recently been proposed [26], [40]. The information maximization algorithm employed to perform ICA in this paper assumed that the underlying "causes" of the pixel gray-levels in face images had a super-Gaussian (peaky) response ...

57 | Blind separation of sources I. An adaptive algorithm based on neuromimetic architecture
- Jutten, Herault
- 1991
Citation Context: ...ent to the "eigenfaces" representation [51], [57]. The two ICA representations were then combined in a single classifier. II. ICA. There are a number of algorithms for performing ICA [11], [13], [14], [25]. We chose the infomax algorithm proposed by Bell and Sejnowski [11], which was derived from the principle of optimal information transfer in neurons with sigmoidal transfer functions [27]. The algori...

56 | aspects of face recognition and the other-race effect
- O'Toole, Deffenbacher, et al.
- 1994
Citation Context: ...lization is consistent with neural models of primary visual cortex [23]. Cosine similarity measures were previously found to be effective for computational models of language [24] and face processing [46]. Fig. 10 gives face recognition performance with both the ICA and the PCA-based representations. Recognition performance is also shown for the PCA-based representation using the first 20 PC vectors, ...

48 | Comparative Assessment of Independent Component Analysis (ICA) for Face Recognition
- Liu, Wechsler
Citation Context: ...would be interactions between the type of enhancement and the representation. A number of research groups have independently tested the ICA representations presented here and in [9]. Liu and Wechsler [35], and Yuen and Lai [61] both supported our findings that ICA outperformed PCA. Moghaddam [41] employed Euclidean distance as the similarity measure instead of cosines. Consistent with our findings, th...

45 | Principal manifolds and Bayesian subspaces for visual recognition
- Moghaddam
- 1999
Citation Context: ...parison in this paper was to examine ICA and PCA-based representations under identical conditions. A number of methods have been presented for enhancing recognition performance with eigenfaces (e.g., [41] and [51]). ICA representations can be used in place of eigenfaces in these techniques. It is an open question as to whether these techniques would enhance performance with PCA and ICA equally, or whe...

44 | Chromatic structure of natural scenes
- Wachtler, Lee, et al.
- 2001
Citation Context: ...ing interesting structure. Information maximization leads to an efficient code of the environment, resulting in more learned structure. Such mechanisms predict neural codes in both vision [12], [43], [58] and audition [32]. The research presented here found that face representations in which high-order dependencies are separated into individual coefficients gave superior recognition performance to rep...

35 | A Deformable Model for Face Recognition Under Arbitrary Lighting Conditions
- Hallinan
- 1995
Citation Context: ...his work also assumed that the pixel values in face images were generated from a linear mixing process. This linear approximation has been shown to hold true for the effect of lighting on face images [21]. Other influences, such as changes in pose and expression may be linearly approximated only to a limited extent. Nonlinear ICA in the absence of prior constraints is an ill-conditioned problem, but s...

31 | Blind source separation of nonlinear mixing model
- Lee, Koehler, et al.
- 1997
Citation Context: ...extent. Nonlinear ICA in the absence of prior constraints is an ill-conditioned problem, but some progress has been made by assuming a linear mixing process followed by parametric nonlinear functions [31], [59]. An algorithm for nonlinear ICA based on kernel methods has also recently been presented [4]. Kernel methods have already been shown to improve face recognition performance...

30 | Rate-coded restricted boltzmann machines for face recognition
- Teh, Hinton
- 2001
Citation Context: ...ther innovative face representation employs products of experts in restricted Boltzmann machines (RBMs). This representation also finds local features when nonnegative weight constraints are employed [56]. In experiments to date, RBMs outperformed PCA for recognizing faces across changes in expression or addition/removal of glasses, but performed more poorly for recognizing faces across different days...

28 | A demonstration of the visual importance and flexibility of spatial-frequency amplitude and phase
- Piotrowski, Campbell
- 1982
Citation Context: ... human perception. For example, as illustrated in Fig. 1, a face image synthesized from the amplitude spectrum of face A and the phase spectrum of face B will be perceived as an image of face B [45], [53]. The fact that PCA is only sensitive to the power spectrum of images suggests that it might not be particularly well suited for representing natural images. The assumption of Gaussian sources implici...

25 | Emergence of simple-cell receptive properties by learning sparse code for natural images. Nature
- Olshausen, Field
- 1996
Citation Context: ...ski [12] that image bases that produce independent outputs from natural scenes are local, oriented, spatially opponent filters similar to the response properties of V1 simple cells. Olshausen and Field [44, 43] obtained a similar result with a sparseness objective, where there is a close information theoretic relationship between sparseness and independence [5, 12]. Conversely, it has also been shown that G...

22 | Face Recognition Using Kernel Methods
- Yang
- 2002
Citation Context: ...ecently been presented [4]. Kernel methods have already been shown to improve face recognition performance with PCA and Fisherfaces [60]. Another future direction of this research is to examine nonlinear ICA representations of faces. Unlike PCA, the ICA using Architecture I found a spatially local face representation. Local feature an...

21 | gender and emotion recognition using Holons
- Cottrell, Metcalfe, et al.
Citation Context: ... still show in the joint distribution of PCA coefficients, and, thus, will not be properly separated. Some of the most successful representations for face recognition, such as eigenfaces [57], holons [15], and local feature analysis [50] are based on PCA. In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels, and t...

19 | The "independent components" of natural scenes are edge filters. Vision Research
- Bell, Sejnowski
Citation Context: ...t the value taken by pixel i based on the corresponding value taken by pixel j on the same image. This approach was inspired by Bell & Sejnowski's work on the independent components of natural images [12]. [Figure: Architecture I (inputs Face 1 ... Face n, weights w11 ... w1n, outputs U) and Architecture II (inputs Pixel 1 ... Pixel n, weights w11 ... w1n, outputs U).] ...

16 | Image representations for facial expression coding
- Bartlett, Donato, et al.
- 2000
Citation Context: ...abor and ICA Architecture I representations significantly outperformed more than eight other image representations on a task of facial expression recognition, and performed equally well to each other [8], [16]. There is also psychophysical support for the relevance of independence to face representations in the brain. The ICA Architecture I representation gave better correspondence with human percept...

15 | Recognizing Faces with PCA
- Draper, Baek, et al.
- 2003
Citation Context: ... ICA using Euclidean distance as the similarity measure. Cosines were not tested in that paper. A thorough comparison of ICA and PCA using a large set of similarity measures was recently conducted in [17], and supported the advantage of ICA for face recognition. In Section V, ICA provided a set of statistically independent coefficients for coding the images. It has been argued that such a factorial co...