## An optimization criterion for generalized discriminant analysis on undersampled problems (2004)


Venue: IEEE Trans. Pattern Analysis and Machine Intelligence

Citations: 28 (8 self)

### BibTeX

@ARTICLE{Ye04anoptimization,
  author  = {Jieping Ye and Ravi Janardan and Cheong Hee Park and Haesun Park},
  title   = {An optimization criterion for generalized discriminant analysis on undersampled problems},
  journal = {IEEE Trans. Pattern Analysis and Machine Intelligence},
  year    = {2004},
  volume  = {26},
  pages   = {982--994}
}



### Abstract

An optimization criterion is presented for discriminant analysis. The criterion extends the optimization criteria of the classical Linear Discriminant Analysis (LDA) through the use of the pseudoinverse when the scatter matrices are singular. It is applicable regardless of the relative sizes of the data dimension and sample size, overcoming a limitation of classical LDA. The optimization problem can be solved analytically by applying the Generalized Singular Value Decomposition (GSVD) technique. The pseudoinverse has been suggested and used for undersampled problems in the past, where the data dimension exceeds the number of data points. The criterion proposed in this paper provides a theoretical justification for this procedure. An approximation algorithm for the GSVD-based approach is also presented. It reduces the computational complexity by finding subclusters of each cluster and uses their centroids to capture the structure of each cluster. This reduced problem yields much smaller matrices to which the GSVD can be applied efficiently. Experiments on text data, with up to 7,000 dimensions, show that the approximation algorithm produces results that are close to those produced by the exact algorithm.

Index Terms: Classification, clustering, dimension reduction, generalized singular value decomposition, linear discriminant analysis, text mining.
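As an illustration of the pseudoinverse idea the abstract describes, the sketch below evaluates a pseudoinverse-based scatter criterion, trace(St^+ Sb), on an undersampled data set where the total scatter matrix St is singular. This is a generic Fisher-style criterion for illustration only, not the paper's exact criterion or its GSVD algorithm; all names here are illustrative.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (Sb) and total (St) scatter of the columns of X (dim x n)."""
    c = X.mean(axis=1, keepdims=True)
    St = (X - c) @ (X - c).T
    Sb = np.zeros_like(St)
    for k in np.unique(y):
        Xk = X[:, y == k]
        ck = Xk.mean(axis=1, keepdims=True)
        Sb += Xk.shape[1] * (ck - c) @ (ck - c).T
    return Sb, St

def pinv_criterion(X, y):
    """trace(pinv(St) @ Sb): well-defined even when St is singular."""
    Sb, St = scatter_matrices(X, y)
    return float(np.trace(np.linalg.pinv(St) @ Sb))

# Undersampled case: dimension 100, only 10 samples, so St is singular,
# and the classical trace(inv(St) @ Sb) criterion is undefined.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(3.0, 1.0, (100, 5))])
y = np.array([0] * 5 + [1] * 5)
print(pinv_criterion(X, y))
```

The pseudoinverse simply restricts the inversion to the range space of St, which is why the value stays finite in the undersampled regime.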

### Citations

3122 | Introduction to Modern Information Retrieval - Salton, McGill
Citation Context: ...a, where the dimension of the data points is higher than the number of data points. Such high-dimensional, undersampled problems occur frequently in many applications, including information retrieval [25], facial recognition [3], [28], and microarray analysis [1]. The application area of interest in this paper is vector space-based information retrieval. The dimension of the document vectors is typica...

2718 | Indexing by latent semantic analysis - Deerwester, Dumais, et al. - 1990
Citation Context: ...ment vectors, which best preserves the structure of the original document vectors. 1.1 Prior Related Work Latent Semantic Indexing (LSI) has been widely used for dimension reduction of text data [2], [5]. It is based on lower rank approximation of the term-document matrix from the Singular Value Decomposition (SVD) [11]. Although the SVD provides the optimal reduced rank approximation of the matrix w...
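The lower-rank approximation underlying LSI, as described in the snippet, can be sketched with a generic truncated SVD. This is a minimal illustration of rank-k approximation, not the LSI pipeline of [2], [5]; the matrix and function names are assumptions.

```python
import numpy as np

def lsi_reduce(A, k):
    """Rank-k approximation of a term-document matrix A via truncated SVD.
    Returns the rank-k matrix and the k-dimensional document representation."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    docs_k = np.diag(sk) @ Vtk          # each column: one document in k dims
    return Uk @ np.diag(sk) @ Vtk, docs_k

# Toy 3-term x 3-document matrix.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])
Ak, docs_k = lsi_reduce(A, k=2)
print(np.linalg.norm(A - Ak, 2))  # spectral-norm error of the rank-2 approximation
```

By the Eckart-Young theorem, the spectral-norm error of this truncation equals the first discarded singular value, which is the optimality property the snippet alludes to.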

2649 | Introduction to Statistical Pattern Recognition, 2nd edition - Fukunaga - 1990
Citation Context: ...difficult to determine theoretically. The Linear Discriminant Analysis (LDA) method has been applied for decades for dimension reduction (feature extraction) of clustered data in pattern recognition [10]. It is classically formulated as an optimization problem on scatter matrices. A serious disadvantage of the LDA is that its objective function requires that the total scatter matrix be nonsingular. I...

2147 | Algorithms for Clustering Data - Jain, Dubes - 1988
Citation Context: ..., the input documents are assumed to have been already clustered before the dimension reduction step. When the documents are not clustered, then an efficient clustering algorithm such as K-Means [6], [16] can be applied before the dimension reduction step. We seek a reduced representation of the document vectors, which best preserves the structure of the original document vectors. 1.1 Prior Related Wo...

1921 | Pattern Classification - Duda, Hart, et al. - 2001

1212 | Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer - Salton - 1989
Citation Context: ...; the dimension of the document set is 2,887. Data Set 3 has five clusters, each with 98 elements; the dimension of the document set is 3,759. For all the examples, we use the tf-idf weighting scheme [24], [32] for encoding the document collection with a term-document matrix. 5.2 Experimental Methodology To evaluate the proposed methods in this paper, we compared them with the other three dimension red...
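The tf-idf encoding mentioned in the snippet can be sketched as follows. This implements one common tf * log(N/df) variant; the exact weighting and normalization used in [24], [32] may differ, and the document names are made up.

```python
import math
from collections import Counter

def tfidf(docs):
    """Term-document matrix with tf-idf weights: tf(t, d) * log(N / df(t)).
    Rows = terms (sorted vocabulary), columns = documents."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = {t: sum(t in d for d in docs) for t in vocab}   # document frequency
    cols = []
    for d in docs:
        tf = Counter(d)
        cols.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, [list(row) for row in zip(*cols)]      # transpose to term x doc

docs = [["discriminant", "analysis", "text"],
        ["text", "mining", "text"]]
vocab, A = tfidf(docs)
```

Note that a term appearing in every document (here "text") gets idf log(1) = 0, so its row vanishes, which is the intended down-weighting of uninformative common terms.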

533 | Using linear algebra for intelligent information retrieval - Berry, Dumais, et al. - 1995
Citation Context: ...document vectors, which best preserves the structure of the original document vectors. 1.1 Prior Related Work Latent Semantic Indexing (LSI) has been widely used for dimension reduction of text data [2], [5]. It is based on lower rank approximation of the term-document matrix from the Singular Value Decomposition (SVD) [11]. Although the SVD provides the optimal reduced rank approximation of the mat...

392 | Using discriminant eigenfeatures for image retrieval - Swets, Weng - 1996
Citation Context: ...data points is higher than the number of data points. Such high-dimensional, undersampled problems occur frequently in many applications, including information retrieval [25], facial recognition [3], [28], and microarray analysis [1]. The application area of interest in this paper is vector space-based information retrieval. The dimension of the document vectors is typically very high, due to the larg...

303 | Concept decompositions for large sparse text data using clustering - Dhillon, Modha - 2001
Citation Context: ...paper, the input documents are assumed to have been already clustered before the dimension reduction step. When the documents are not clustered, then an efficient clustering algorithm such as K-Means [6], [16] can be applied before the dimension reduction step. We seek a reduced representation of the document vectors, which best preserves the structure of the original document vectors. 1.1 Prior Rela...
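As a stand-in for the pre-clustering step the snippet mentions, here is a bare-bones Lloyd's K-Means sketch. It is not the algorithm of [6] or [16]; the deterministic farthest-point initialization is an illustrative choice, and all identifiers are assumptions.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain Lloyd's K-Means on the rows of X, with a deterministic
    greedy farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        # next center: the point farthest from all chosen centers
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d2))])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # assign each point to its nearest center, then recompute means
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated clusters of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(5.0, 0.1, (20, 3))])
labels, centers = kmeans(X, k=2)
```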

303 | Regularized discriminant analysis - Friedman - 1989
Citation Context: ...e some useful information. Some relevant references on this approach can be found in [15], [18]. The second approach solves the singularity problem by adding a perturbation to the scatter matrix [4], [9], [18], [33]. The algorithm is known as the regularized LDA, or RLDA for short. A limitation of RLDA is that the amount of perturbation to be used is difficult to determine [4], [18]. The final approa...

212 | An algorithm for suffix stripping, Program - Porter - 1980
Citation Context: ...xperiments, we use three different data sets, summarized in Table 3. For all data sets, we use a stoplist to remove common words, and the words are stemmed using Porter's suffix-stripping algorithm [22]. Moreover, any term that occurs in fewer than two documents was eliminated as in [32]. Data Set 1 is derived from the TREC-5, TREC-6, and TREC-7 collections [30]. It consists of 210 documents in a sp...

156 | Reuters-21578 text categorization test collection, Distribution 1.0, README file (v 1.3) - Lewis - 2004
Citation Context: ...nsists of 210 documents in a space of dimension of 7,454, with 7 clusters. Each cluster has 30 documents. Data Sets 2 and 3 are from Reuters-21578 text categorization test collection Distribution 1.0 [19]. Data Set 2 contains four clusters, with each containing 80 elements; the dimension of the document set is 2,887. Data Set 3 has five clusters, each with 98 elements; the dimension of the document se...

145 | Some distance properties of latent root and vector methods used in multivariate analysis, Biometrika - Gower - 1966
Citation Context: ...etermine [4], [18]. The final approach applies the pseudoinverse to avoid the singularity problem, which is equivalent to approximating the solution using a least-squares solution method. The work in [12] showed that its use led to appropriate calculation of Mahalanobis distance in singular cases. The use of classical LDA with the pseudoinverse has been proposed in the past. The Pseudo Fisher Linear D...

122 | Information retrieval systems: theory and implementation - Kowalski - 1997
Citation Context: ...A = (a_ij), its (i, j)th term, a_ij, represents the weighted frequency of term i in document j. Several weighting schemes have been developed for encoding the document collection in a term-document matrix [17]. An advantage of the vector space-based method is that once the collection of documents is represented as columns of the term-document matrix in a high-dimensional space, the algebraic structure of t...

92 | Towards a generalized singular value decomposition - Paige, Saunders - 1981
Citation Context: ...onstraints on the optimization problem are based on the observations in Section 3.2. 3.1 Generalized Singular Value Decomposition The Generalized Singular Value Decomposition (GSVD) was introduced in [21], [31]. A simple algorithm to compute GSVD can be found in [13], where the algorithm is based on [21]. ...

72 | Small sample learning during multimedia retrieval using BiasMap - Zhou, Huang - 2001
Citation Context: ...ul information. Some relevant references on this approach can be found in [15], [18]. The second approach solves the singularity problem by adding a perturbation to the scatter matrix [4], [9], [18], [33]. The algorithm is known as the regularized LDA, or RLDA for short. A limitation of RLDA is that the amount of perturbation to be used is difficult to determine [4], [18]. The final approach applies t...

55 | Generalizing discriminant analysis using the generalized singular value decomposition - Howland, Park - 2003
Citation Context: ...The first approach applies an intermediate dimension reduction method, such as LSI, Principal Component Analysis (PCA), and Partial Least Squares (PLS), to extract important components for LDA [3], [14], [18], [28]. The algorithm is named two-stage LDA. This approach is straightforward but the intermediate dimension reduction stage may remove some useful information. Some relevant references on this...

55 | Generalizing the singular value decomposition - Van Loan - 1976
Citation Context: ...ints on the optimization problem are based on the observations in Section 3.2. 3.1 Generalized Singular Value Decomposition The Generalized Singular Value Decomposition (GSVD) was introduced in [21], [31]. A simple algorithm to compute GSVD can be found in [13], where the algorithm is based on [21]. ...

51 | DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling - Baldi, Hatfield - 2002
Citation Context: ...number of data points. Such high-dimensional, undersampled problems occur frequently in many applications, including information retrieval [25], facial recognition [3], [28], and microarray analysis [1]. The application area of interest in this paper is vector space-based information retrieval. The dimension of the document vectors is typically very high, due to the large number of terms that appear...

50 | Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection - Belhumeur, Hespanha, et al. - 1997

44 | Discriminant analysis with singular covariance matrices: Methods and applications to spectroscopic data - Krzanowski, Jonathan, et al. - 1995
Citation Context: ...the number of data points. This is known as the undersampled or singularity problem [10], [18]. In recent years, many approaches have been brought to bear on such high-dimensional, undersampled problems. These methods can be roughly grouped into three categories. The first approach applies an...

42 | Expected classification error of the Fisher linear classifier with pseudo-inverse covariance matrix - Raudys, Duin - 1998
Citation Context: ...riate calculation of Mahalanobis distance in singular cases. The use of classical LDA with the pseudoinverse has been proposed in the past. The Pseudo Fisher Linear Discriminant (PFLDA) in [8], [10], [23], [26], [27] is based on the pseudoinverse of the scatter matrices. The generalization error of PFLDA was studied in [8], [26], when the size and dimension of the training data vary. As observed in [8...

39 | Lower dimensional representation of text data based on centroids and least squares - Park, Jeon, et al. - 2003

34 | Text REtrieval Conference, http://trec.nist.gov - TREC - 2001

21 | Stabilizing classifiers for very small sample sizes - Skurichina, Duin - 1996
Citation Context: ...calculation of Mahalanobis distance in singular cases. The use of classical LDA with the pseudoinverse has been proposed in the past. The Pseudo Fisher Linear Discriminant (PFLDA) in [8], [10], [23], [26], [27] is based on the pseudoinverse of the scatter matrices. The generalization error of PFLDA was studied in [8], [26], when the size and dimension of the training data vary. As observed in [8], [26...

21 | Image classification by the Foley-Sammon transform - Tian, Barbero, et al. - 1986
Citation Context: ...er reducing the size of the training data [8], [26] or by increasing the dimension through addition of redundant features [27]. Pseudoinverses of the scatter matrices were also studied in [18], [20], [29]. Experiments in [18] showed that the pseudoinverse-based method achieved comparable performance with RLDA and two-stage LDA. It is worthwhile to note that both RLDA and two-stage LDA involve the esti...

17 | Matrix Computations, 3rd ed. - Golub, Van Loan - 1996
Citation Context: ...s to the unique matrix satisfying the following four conditions: A^+ A A^+ = A^+ (1), A A^+ A = A (2), (A A^+)^T = A A^+ (3), (A^+ A)^T = A^+ A (4). The pseudoinverse A^+ is commonly computed by the SVD [11] as follows: Let A = U [Sigma 0; 0 0] V^T be the SVD of A, where U and V are orthogonal and Sigma is diagonal with positive diagonal entries. Then, the pseudoinverse of A, denoted as A^+, can be computed as A^+ = V [Sigma^-1 0; 0 0...
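The four Penrose conditions and the SVD-based construction quoted above can be checked numerically. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Moore-Penrose pseudoinverse A^+ built from the SVD, as in the snippet:
    A = U [Sigma 0; 0 0] V^T  ==>  A^+ = V [Sigma^-1 0; 0 0] U^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    # invert only the nonzero singular values
    s_plus = [1.0 / x if x > tol else 0.0 for x in s]
    Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
    Sigma_plus[:len(s), :len(s)] = np.diag(s_plus)
    return Vt.T @ Sigma_plus @ U.T

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])   # rank 1, so A has no ordinary inverse
Ap = pinv_via_svd(A)

# verify the four Penrose conditions (1)-(4)
assert np.allclose(Ap @ A @ Ap, Ap)        # (1) A+ A A+ = A+
assert np.allclose(A @ Ap @ A, A)          # (2) A A+ A = A
assert np.allclose((A @ Ap).T, A @ Ap)     # (3) (A A+)^T = A A+
assert np.allclose((Ap @ A).T, Ap @ A)     # (4) (A+ A)^T = A+ A
```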

12 | Small sample size generalization - Duin - 1995
Citation Context: ...d to appropriate calculation of Mahalanobis distance in singular cases. The use of classical LDA with the pseudoinverse has been proposed in the past. The Pseudo Fisher Linear Discriminant (PFLDA) in [8], [10], [23], [26], [27] is based on the pseudoinverse of the scatter matrices. The generalization error of PFLDA was studied in [8], [26], when the size and dimension of the training data vary. As ob...

9 | Eigenfaces versus Fisherfaces: Recognition using class specific linear projection - Belhumeur, Hespanha, et al.
Citation Context: ...the data points is higher than the number of data points. Such high-dimensional, undersampled problems occur frequently in many applications, including information retrieval [25], facial recognition [3], [28], and microarray analysis [1]. The application area of interest in this paper is vector space-based information retrieval. The dimension of the document vectors is typically very high, due to th...

8 | Regularized discriminant analysis and its application to face recognition - Dai, Yuen - 2003
Citation Context: ...remove some useful information. Some relevant references on this approach can be found in [15], [18]. The second approach solves the singularity problem by adding a perturbation to the scatter matrix [4], [9], [18], [33]. The algorithm is known as the regularized LDA, or RLDA for short. A limitation of RLDA is that the amount of perturbation to be used is difficult to determine [4], [18]. The final a...

7 | Cluster structure preserving dimension reduction based on the generalized singular value decomposition - Park, Jeon, et al.
Citation Context: ...However, its theoretical justification has not been studied well in the literature. Recently, a generalization of LDA based on the Generalized Singular Value Decomposition (GSVD) has been developed [13], [14], which is applicable regardless of the data dimension and, therefore, can be used for undersampled problems. The classical LDA solution becomes a special case of this LDA/GSVD method. In [13], ...

7 | Regularisation of linear classifiers by adding redundant features - Skurichina, Duin - 1999

5 | Equivalence of several two-stage methods for linear discriminant analysis - Howland, Park - 2004
Citation Context: ...m is named two-stage LDA. This approach is straightforward but the intermediate dimension reduction stage may remove some useful information. Some relevant references on this approach can be found in [15], [18]. The second approach solves the singularity problem by adding a perturbation to the scatter matrix [4], [9], [18], [33]. The algorithm is known as the regularized LDA, or RLDA for short. A limi...

1 | Regularized Linear Discriminant Analysis of Wavelet Compressed Ion Mobility Spectra - Mehay, Cai, et al. - 2002
Citation Context: ...y either reducing the size of the training data [8], [26] or by increasing the dimension through addition of redundant features [27]. Pseudoinverses of the scatter matrices were also studied in [18], [20], [29]. Experiments in [18] showed that the pseudoinverse-based method achieved comparable performance with RLDA and two-stage LDA. It is worthwhile to note that both RLDA and two-stage LDA involve th...

1 | Criterion Functions for Document Clustering: Experiments and Analysis - Zhao, Karypis - 2001
Citation Context: ...or the pseudoinverse-based LDA. One limitation of the exact algorithm is the high computational complexity of GSVD in handling large matrices. Typically, large document data sets, such as the ones in [32], contain several thousands of documents with dimension in the tens of thousands. We propose an approximation algorithm based on subclustering of clusters to reduce the cost of computing the SVD invol...