## Face Recognition by Regularized Discriminant Analysis

Citations: 5 (0 self)

### BibTeX

@MISC{Dai_facerecognition,
  author = {Dao-qing Dai and Pong C. Yuen},
  title = {Face Recognition by Regularized Discriminant Analysis},
  year = {}
}

### Abstract

When the feature dimension is larger than the number of samples, the small sample-size problem occurs; it is a major concern within the face recognition community. We point out that optimizing the Fisher index in linear discriminant analysis does not necessarily give the best performance for a face recognition system, and we propose a new regularization scheme. The proposed method is evaluated using the Olivetti Research Laboratory database, the Yale database, and the FERET database.

Index Terms: Face recognition, optimization, regularized discriminant analysis (RDA), small sample-size problem.
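The difficulty the abstract describes is that the within-class scatter matrix becomes singular once the feature dimension exceeds the number of samples, so it cannot be inverted for the usual LDA solution; regularizing it before inversion restores invertibility. A minimal sketch in NumPy, where the `(alpha, beta)` shrinkage is purely illustrative and not the paper's exact two-parameter scheme:

```python
import numpy as np

def regularized_lda(X, y, alpha=0.5, beta=0.5):
    """Sketch of LDA with a regularized within-class scatter.

    The (alpha, beta) shrinkage used here is illustrative only; the
    paper's actual two-parameter regularization scheme differs.
    """
    classes = np.unique(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Cw = np.zeros((d, d))  # within-class scatter
    Cb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Cw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all).reshape(-1, 1)
        Cb += Xc.shape[0] * (diff @ diff.T)
    # Shrink Cw toward the identity so it is invertible even when the
    # number of samples is smaller than the feature dimension d.
    Cw_reg = alpha * Cw + beta * np.eye(d)
    # Discriminant directions: eigenvectors of Cw_reg^{-1} Cb,
    # sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Cw_reg, Cb))
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order].real
```

Without the `beta * np.eye(d)` term, `np.linalg.solve` would fail here whenever the samples per class are too few to make `Cw` full rank.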

### Citations

9946 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...II. REGULARIZED DISCRIMINANT ANALYSIS (RDA) The regularization method was originally proposed by Tikhonov to solve ill-posed operator equations; its application in machine learning is addressed in [35]. We employ the idea of regularization [5], [25], [28] but do not restrict ourselves to small perturbations. Along this line, we propose a two-parameter regularization scheme which will be reported in Sec... |

2928 | Introduction to Statistical Pattern Recognition, Electrical Science Series
- Fukunaga
- 1972
Citation Context: ...Identification of persons with automatic computer interfaces has aroused increasing interest in the computer science community in recent years [2], [16], [44]. Linear discriminant analysis (LDA) [8], [9], [30] is a well-known and popular statistical method in pattern recognition and classification. Its basic idea is to optimize the Fisher discriminant index F [8], [9], [30], defined by F = max_W ... |

1654 | Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection
- Belhumeur, Hespanha, et al.
Citation Context: ...f the feature vector. To avoid this difficulty, techniques are employed to reduce the dimension from d to d′, where d′ < d, so that in R^{d′} the within-class scatter matrix is not singular. The FisherFace [1], [19], the most discriminant features [32], QR-decomposition [42], the subspace methods [31], [44], and recursive LDA [39] have been developed. These approaches are straightforward, but some of them s... |

1093 | The use of multiple measurements in taxonomic problems
- Fisher
- 1936
Citation Context: ...TION Identification of persons with automatic computer interfaces has aroused increasing interest in the computer science community in recent years [2], [16], [44]. Linear discriminant analysis (LDA) [8], [9], [30] is a well-known and popular statistical method in pattern recognition and classification. Its basic idea is to optimize the Fisher discriminant index F [8], [9], [30], defined by F = max_W ... |

818 | The FERET Evaluation Methodology for Face-Recognition Algorithms
- Phillips, Moon, et al.
- 2000
Citation Context: ...in Cambridge University, U.K. The Yale face database contains 165 gray-scale images of 15 individuals. This set has considerable variations in facial expression and illumination. The FERET database [27] is more challenging. We shall use 432 front-view images of 72 subjects, each having six images. Fig. 2 shows images of two subjects. The images are manually aligned according to the positions of the eyes... |

588 | Human and machine recognition of faces: A survey - Chellappa, Wilson, et al. - 1995 |

432 | Using discriminant eigenfeatures for image retrieval
- Swets, Weng
- 1996
Citation Context: ...ulty, techniques are employed to reduce the dimension from d to d′, where d′ < d, so that in R^{d′} the within-class scatter matrix is not singular. The FisherFace [1], [19], the most discriminant features [32], QR-decomposition [42], the subspace methods [31], [44], and recursive LDA [39] have been developed. These approaches are straightforward, but some of them suffer from two limitations. First, the Fish... |

336 | An Introduction to Biometric Recognition
- Jain, Ross, et al.
- 2004
Citation Context: ...s (RDA), small sample-size problem. I. INTRODUCTION Identification of persons with automatic computer interfaces has aroused increasing interest in the computer science community in recent years [2], [16], [44]. Linear discriminant analysis (LDA) [8], [9], [30] is a well-known and popular statistical method in pattern recognition and classification. Its basic idea is to optimize the Fisher discriminan... |

198 | A direct lda algorithm for highdimensional data - with application to face recognition
- Yu, Yang
Citation Context: ...ral. Chen et al. [3] and Belhumeur et al. [1] further proposed to maximize tr(W^T C_b W) in the null space of C_w. The final transformation matrix makes the Fisher index infinite. Yu and Yang [43] pointed out that the vectors outside the null space still have discriminative power, and therefore they proposed direct LDA. Lu et al. [25] applied regularized LDA in the range of C_b; see also [40]. ... |

176 | Face recognition: A convolutional neural-network approach
- LAWRENCE, GILES, et al.
- 1997
Citation Context: ...The training images differ from one algorithm to another. Also, some algorithms reported only their best result. In our RDA system, we run the program 50 times. The convolutional neural network in [22] achieved a recognition rate of 96.2% for one run. Using pseudo-2-D hidden Markov models and the discrete cosine transform (HMM+DCT) [6], the recognition rate reached 100% for one run. The recognition rate ... |

155 | Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners
- Raudys, Jain
- 1991
Citation Context: ...ectors of the matrix C_w^{-1} C_b. Many algorithms based on LDA have been employed in face recognition technology. However, LDA suffers from a well-known small sample-size problem [11], [20], [24], [28], [29], [34], [36], [38], i.e., the number of samples is small compared with the size of the feature vector. To avoid this difficulty, techniques are employed to reduce the dimension from d to d′... |

79 | Face recognition using line edge map
- Gao, Leung
- 2002
Citation Context: ...ce method is 75.6%. If the first three eigenvectors are removed, it increases to 84.7%. The correlation method and linear subspace method give 76.1% and 79.4%, respectively. The FisherFace method [1], [12], [41], which uses PCA for dimension reduction and then applies LDA, gives 93.7%. The kernel Fisher method [41] gives 93.94%. Independent component analysis (ICA) gives 70.91%; if a feature extraction proce... |

74 | KPCA Plus LDA: A Complete Kernel Fisher Discriminant Framework for Feature Extraction and Recognition
- Yang, Frangi, et al.
- 2005
Citation Context: ...irst, the FisherFace method might fail [45]. Selection of features is an important issue also [26], [37]. Second, feature vectors in the null space of C_w can still have discriminative power, as shown in [40]. Another direction is to modify the optimization criterion. The Fisher index is a combination of two measures, C_w and C_b. We need to maximize... |

66 | A Unified Framework for Subspace Face Recognition
- Wang, Tang
- 2004
Citation Context: ...ix C_w^{-1} C_b. Many algorithms based on LDA have been employed in face recognition technology. However, LDA suffers from a well-known small sample-size problem [11], [20], [24], [28], [29], [34], [36], [38], i.e., the number of samples is small compared with the size of the feature vector. To avoid this difficulty, techniques are employed to reduce the dimension from d to d′, where d′ < d, so that in R^{d′} t... |

61 | Recognition of JPEG compressed face images based on statistical methods, Image Vision Comput.
- Eickler, Wuller, et al.
- 2000
Citation Context: ...tem, we run the program 50 times. The convolutional neural network in [22] achieved a recognition rate of 96.2% for one run. Using pseudo-2-D hidden Markov models and the discrete cosine transform (HMM+DCT) [6], the recognition rate reached 100% for one run. The recognition rate of the DCT-based system [14] is 91% for one run. The average recognition accuracy of direct LDA [43] over more than ten runs is 90.... |

53 | Face recognition using the discrete cosine transform, International Journal of Computer Vision
- Hafed, Levine
- 2001
Citation Context: ...rate of 96.2% for one run. Using pseudo-2-D hidden Markov models and the discrete cosine transform (HMM+DCT) [6], the recognition rate reached 100% for one run. The recognition rate of the DCT-based system [14] is 91% for one run. The average recognition accuracy of direct LDA [43] over more than ten runs is 90.8%. The uncorrelated discriminant feature [17] with one run is 97.5%. For the radial basis functi... |

47 | Discriminant analysis with singular covariance matrices: Methods and application to spectroscopic data, Applied Statistics 44
- Krzanowski, Jonathan, et al.
- 1995
Citation Context: ...rmined from eigenvectors of the matrix C_w^{-1} C_b. Many algorithms based on LDA have been employed in face recognition technology. However, LDA suffers from a well-known small sample-size problem [11], [20], [24], [28], [29], [34], [36], [38], i.e., the number of samples is small compared with the size of the feature vector. To avoid this difficulty, techniques are employed to reduce the dimension from ... |

47 | Multivariate Statistical Inference and Applications
- Rencher
- 1998
Citation Context: ...ification of persons with automatic computer interfaces has aroused increasing interest in the computer science community in recent years [2], [16], [44]. Linear discriminant analysis (LDA) [8], [9], [30] is a well-known and popular statistical method in pattern recognition and classification. Its basic idea is to optimize the Fisher discriminant index F [8], [9], [30], defined by F = max_W... |

36 | Where Are Linear Feature Extraction Methods Applicable?
- Martinez, Zhu
Citation Context: ...have been developed. These approaches are straightforward, but some of them suffer from two limitations. First, the FisherFace method might fail [45]. Selection of features is an important issue also [26], [37]. Second, feature vectors in the null space of C_w can still have discriminative power, as shown in [40]. Another direction is to modify the optimization ... |

34 | An optimization criterion for generalized discriminant analysis on undersampled problems
- Ye, Janardan, et al.
Citation Context: ...ployed to reduce the dimension from d to d′, where d′ < d, so that in R^{d′} the within-class scatter matrix is not singular. The FisherFace [1], [19], the most discriminant features [32], QR-decomposition [42], the subspace methods [31], [44], and recursive LDA [39] have been developed. These approaches are straightforward, but some of them suffer from two limitations. First, the FisherFace method might fai... |

32 | Face Recognition with Radial Basis Function (RBF) Neural Networks
- Er, Wu, et al.
- 2002
Citation Context: ...e recognition accuracy of direct LDA [43] over more than ten runs is 90.8%. The uncorrelated discriminant feature [17] with one run is 97.5%. For the radial basis function neural network (RBF+NN) [7], the average recognition rate over six runs is 98.08%. With a fuzzy hybrid learning algorithm [13], the neural network system reaches 99.55% with four runs. With two training images, the uncorrelated o... |

32 | Face recognition based on the uncorrelated discriminant transformation
- Jin, Yang, et al.
- 2001
Citation Context: ...run. The recognition rate of the DCT-based system [14] is 91% for one run. The average recognition accuracy of direct LDA [43] over more than ten runs is 90.8%. The uncorrelated discriminant feature [17] with one run is 97.5%. For the radial basis function neural network (RBF+NN) [7], the average recognition rate over six runs is 98.08%. With a fuzzy hybrid learning algorithm [13], the neural netwo... |

31 | Optimal discriminant plane for a small number of samples and design method of classifier on the plane, Pattern Recognition
- Hong, Yang
- 1991
Citation Context: ...on information, and the data tend to be oversmoothed. Moreover, it should be noted that when the small sample-size problem occurs, the Inf-F index can always be obtained in a null-space approach. In [3], [10], and [23], this idea was used. D. Posterior Error Rate and the Optimal Parameters The purpose of this section is to estimate the two parameters α and β. We proceed in the following steps. 1) Large Fi... |

31 | Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition
- Lu, Venetsanopoulos
- 2005
Citation Context: ...trix makes the Fisher index infinite. Yu and Yang [43] pointed out that the vectors outside the null space still have discriminative power, and therefore they proposed direct LDA. Lu et al. [25] applied regularized LDA in the range of C_b; see also [40]. We find that when the small sample-size problem occurs, optimizing the Fisher index F does not necessarily lead to the best system performan... |

30 | Algebraic Feature Extraction for Image Recognition Based on an Optimal Discrimination Criterion
- Liu, Yang
Citation Context: ...tion rates are not on the α-axis (small β). ...maximize the between-class scatter matrix and to minimize the within-class scatter matrix. Substitutions of these two scatters are very natural [42]. Liu et al. [23] used the total scatter matrix C_t to replace the within-class covariance C_w. The rank of the matrix C_t is, in general, greater than that of the matrix C_w, but it can still be singular. Hence, they soug... |

27 | Eigenspace-based face recognition: a comparative study of different approaches
- Ruiz-del-Solar, Navarrete
- 2005
Citation Context: ...ion from d to d′, where d′ < d, so that in R^{d′} the within-class scatter matrix is not singular. The FisherFace [1], [19], the most discriminant features [32], QR-decomposition [42], the subspace methods [31], [44], and recursive LDA [39] have been developed. These approaches are straightforward, but some of them suffer from two limitations. First, the FisherFace method might fail [45]. Selection of featu... |

26 | Improving kernel Fisher discriminant analysis for face recognition
- Liu, Lu, et al.
- 2004
Citation Context: ...from eigenvectors of the matrix C_w^{-1} C_b. Many algorithms based on LDA have been employed in face recognition technology. However, LDA suffers from a well-known small sample-size problem [11], [20], [24], [28], [29], [34], [36], [38], i.e., the number of samples is small compared with the size of the feature vector. To avoid this difficulty, techniques are employed to reduce the dimension from d to d... |

22 | Face Recognition Using Kernel Methods
- Yang
- 2002
Citation Context: ...hod is 75.6%. If the first three eigenvectors are removed, it increases to 84.7%. The correlation method and linear subspace method give 76.1% and 79.4%, respectively. The FisherFace method [1], [12], [41], which uses PCA for dimension reduction and then applies LDA, gives 93.7%. The kernel Fisher method [41] gives 93.94%. Independent component analysis (ICA) gives 70.91%; if a feature extraction process is ... |

21 | Image Classification by the Foley-Sammon Transform
- Tian, Barbero, et al.
- 1986
Citation Context: ...Tikhonov regularization (T-Reg) [44], the parameter (α, β) is around the origin. The regularization matrix R is R = (C_w + εI_{d×d})^{-1} = Σ_{k=1}^{g} (1/(λ_k + ε)) u_k u_k^T + (1/ε) Σ_{k=g+1}^{d} u_k u_k^T. 2) Pseudoinverse [33]: the parameter (α, β) is on the β-axis. The regularization matrix is R = Σ_{k=1}^{g} (1/λ_k) u_k u_k^T. This scheme completely discards the information in the null space of the covariance matrix C_w, which ... |
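The Tikhonov-regularized inverse R = (C_w + εI)^{-1} quoted in this citation context can be evaluated directly from the eigendecomposition of C_w; a small numerical sketch (the function name is ours):

```python
import numpy as np

def tikhonov_inverse(Cw, eps):
    """R = (Cw + eps*I)^{-1} via the eigendecomposition of symmetric Cw.

    With Cw = sum_k lambda_k u_k u_k^T, this equals
    sum_k u_k u_k^T / (lambda_k + eps); eigenvectors in the null space
    (lambda_k = 0) enter with the large weight 1/eps.
    """
    lam, U = np.linalg.eigh(Cw)      # lam ascending, U orthonormal
    return (U / (lam + eps)) @ U.T   # U diag(1/(lam + eps)) U^T
```

Setting the null-space weight to zero instead of 1/ε recovers the pseudoinverse scheme mentioned in the same context, which keeps only the range-space terms Σ_{k=1}^{g} (1/λ_k) u_k u_k^T.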

18 | Face recognition: A literature survey, UMD CfAR
- Zhao, Chellappa, et al.
- 2000
Citation Context: ...), small sample-size problem. I. INTRODUCTION Identification of persons with automatic computer interfaces has aroused increasing interest in the computer science community in recent years [2], [16], [44]. Linear discriminant analysis (LDA) [8], [9], [30] is a well-known and popular statistical method in pattern recognition and classification. Its basic idea is to optimize the Fisher discriminant inde... |

13 | Selecting discriminant eigenfaces for face recognition
- Wang, Plataniotis, et al.
- 2005
Citation Context: ...een developed. These approaches are straightforward, but some of them suffer from two limitations. First, the FisherFace method might fail [45]. Selection of features is an important issue also [26], [37]. Second, feature vectors in the null space of C_w can still have discriminative power, as shown in [40]. Another direction is to modify the optimization criterion. The Fisher index is a combination of two... |

10 | Solving the small sample size problem in face recognition using generalized discriminant analysis, Pattern Recognition 39
- Howland, Wang, et al.
- 2006
Citation Context: ...s determined from eigenvectors of the matrix C_w^{-1} C_b. Many algorithms based on LDA have been employed in face recognition technology. However, LDA suffers from a well-known small sample-size problem [11], [20], [24], [28], [29], [34], [36], [38], i.e., the number of samples is small compared with the size of the feature vector. To avoid this difficulty, techniques are employed to reduce the dimension... |

9 | A new LDA-based face recognition system which can solve the small sample size problem
- Chen, Liao, et al.
- 2000
Citation Context: ...still be singular. Hence, they sought a transform matrix W such that tr(W^T C_b W) ≠ 0 under the constraint tr(W^T C_t W) = 0. Solutions of this optimization problem are not unique in general. Chen et al. [3] and Belhumeur et al. [1] further proposed to maximize tr(W^T C_b W) in the null space of C_w. The final transformation matrix makes the Fisher index infinite. Yu and Yang [43] pointed out tha... |

8 | Regularized discriminant analysis and its application to face recognition
- Dai, Yuen
- 2003
Citation Context: ...The regularization method was originally proposed by Tikhonov to solve ill-posed operator equations; its application in machine learning is addressed in [35]. We employ the idea of regularization [5], [25], [28] but do not restrict ourselves to small perturbations. Along this line, we propose a two-parameter regularization scheme which will be reported in Section II-B. Moreover, from Fig. 1, we find ... |

8 | A New Covariance Estimate for Bayesian Classifiers in Biometric Recognition
- Thomaz, Gillies, Feitosa
- 2004
Citation Context: ...of the matrix C_w^{-1} C_b. Many algorithms based on LDA have been employed in face recognition technology. However, LDA suffers from a well-known small sample-size problem [11], [20], [24], [28], [29], [34], [36], [38], i.e., the number of samples is small compared with the size of the feature vector. To avoid this difficulty, techniques are employed to reduce the dimension from d to d′, where d′ < d, so... |

8 | Face recognition using recursive Fisher linear discriminant
- Xiang, Fan, et al.
Citation Context: ...d, so that in R^{d′} the within-class scatter matrix is not singular. The FisherFace [1], [19], the most discriminant features [32], QR-decomposition [42], the subspace methods [31], [44], and recursive LDA [39] have been developed. These approaches are straightforward, but some of them suffer from two limitations. First, the FisherFace method might fail [45]. Selection of features is an important issue also... |

7 | Solving the small sample size problem of
- Huang, Liu, et al.
- 2002
Citation Context: ...the recognition rates 74.94% and 85.45%, respectively. Our RDA system gives 97.6%. With the FERET database: Fig. 6 is a comparison of RDA with FisherFace [1], direct LDA [43], and Huang et al.'s method [15]. The data for the last three methods are extracted from [15] directly, where six images of 70 persons were used. When the number of training images is 2 or 3, RDA has the best performance. The over... |

7 | Inverse Fisher discriminant criteria for small sample size problem and its application to face recognition
- Zhuang, Dai
Citation Context: ...the subspace methods [31], [44], and recursive LDA [39] have been developed. These approaches are straightforward, but some of them suffer from two limitations. First, the FisherFace method might fail [45]. Selection of features is an important issue also [26], [37]. Second, feature vectors in the null space of C_w can still have discriminative power, as shown in [40]. Another direction is to modify the op... |

6 | Face Recognition Using Feature Extraction Based on Independent Component Analysis
- Kwak, Choi, et al.
Citation Context: ...ion reduction and then applies LDA, gives 93.7%. The kernel Fisher method [41] gives 93.94%. Independent component analysis (ICA) gives 70.91%; if a feature extraction process is applied, the ICA-FX [21] rate increases to 96.36%. The edge map method and the line edge map method (LEM) [12] give recognition rates of 74.94% and 85.45%, respectively. Our RDA system gives 97.6%. With the FERET database: Fig.... |

5 | Kernel Machine-Based One-Parameter Regularized Fisher Discriminant Method for Face Recognition
- Chen, Yuen, et al.
- 2005
Citation Context: ...f the matrix C_w^{-1} C_b, we solve the eigenproblem (R C_b) W_rda = W_rda D_rda, where D_rda = D_rda(α, β) is a diagonal matrix with nonnegative entries and W_rda = W_rda(α, β) is the regularized Fisher transform. In [4], a one-parameter scheme with a kernel is developed; as we can see from Fig. 1, the optimal domain cannot be described by one parameter. In [28], a perturbation of quadratic discriminant analysis is... |
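The eigenproblem (R C_b) W = W D quoted in this context can be solved with standard dense linear algebra once R is formed. A sketch using a single Tikhonov parameter `eps` in place of the paper's (α, β) pair (the function name is ours):

```python
import numpy as np

def rda_eigenproblem(Cw, Cb, eps):
    """Solve (R Cb) W = W D with R = (Cw + eps*I)^{-1}.

    Columns of W are regularized discriminant directions, sorted by
    decreasing eigenvalue. A single parameter eps stands in for the
    paper's two-parameter (alpha, beta) regularization scheme.
    """
    d = Cw.shape[0]
    R = np.linalg.inv(Cw + eps * np.eye(d))
    D, W = np.linalg.eig(R @ Cb)
    order = np.argsort(-D.real)
    return W[:, order].real, D[order].real
```

Since R is symmetric positive definite and C_b is positive semidefinite, R C_b is similar to a symmetric matrix, so its eigenvalues are real and nonnegative even though `np.linalg.eig` is called on a nonsymmetric product.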

4 | Regularized discriminant analysis for face recognition
- Pima, Aladjem
Citation Context: ...eigenvectors of the matrix C_w^{-1} C_b. Many algorithms based on LDA have been employed in face recognition technology. However, LDA suffers from a well-known small sample-size problem [11], [20], [24], [28], [29], [34], [36], [38], i.e., the number of samples is small compared with the size of the feature vector. To avoid this difficulty, techniques are employed to reduce the dimension from d to d′, wh... |

3 | An improved LDA approach
- Jing, Zhang, et al.
Citation Context: ...feature vector. To avoid this difficulty, techniques are employed to reduce the dimension from d to d′, where d′ < d, so that in R^{d′} the within-class scatter matrix is not singular. The FisherFace [1], [19], the most discriminant features [32], QR-decomposition [42], the subspace methods [31], [44], and recursive LDA [39] have been developed. These approaches are straightforward, but some of them suffer ... |

3 | Bilinear discriminant analysis for face recognition
- Visani, Garcia, et al.
- 2005
Citation Context: ...e matrix C_w^{-1} C_b. Many algorithms based on LDA have been employed in face recognition technology. However, LDA suffers from a well-known small sample-size problem [11], [20], [24], [28], [29], [34], [36], [38], i.e., the number of samples is small compared with the size of the feature vector. To avoid this difficulty, techniques are employed to reduce the dimension from d to d′, where d′ < d, so that in... |

2 | A fuzzy hybrid learning algorithm for radial basis function neural network with application in human face recognition
- Haddadnia, Faez, et al.
- 2003
Citation Context: ...criminant feature [17] with one run is 97.5%. For the radial basis function neural network (RBF+NN) [7], the average recognition rate over six runs is 98.08%. With a fuzzy hybrid learning algorithm [13], the neural network system reaches 99.55% with four runs. With two training images, the uncorrelated optimal discrimination vectors method is 81.25% and the improved LDA ... |

2 | UODV: Improved algorithm and generalized theory - Jing, Zhang, et al. - 2003 |