Generalizing discriminant analysis using the generalized singular value decomposition
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2004
Cited by 38 (11 self)
Discriminant analysis has been used for decades to extract features that preserve class separability. It is commonly defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular limits its application to data sets with certain relative dimensions. We examine a number of optimization criteria, and extend their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement. The result is a generalization of discriminant analysis that can be applied even when the sample size is smaller than the dimension of the sample data. We use classification results from the reduced representation to compare the effectiveness of this approach with some alternatives, and conclude with a discussion of their relative merits.
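The nonsingularity requirement mentioned in this abstract can be illustrated with a small sketch. The data set below is made up for illustration (4 samples, 2 classes, 5 dimensions), and the function names are hypothetical rather than taken from the paper. It builds the within-class scatter matrix with exact rational arithmetic and checks that its rank is lower than the data dimension, which is what happens whenever the sample size is smaller than the dimension.

```python
from fractions import Fraction

def centroid(rows):
    """Mean of a list of equal-length vectors, as exact fractions."""
    n = len(rows)
    return [sum(col) / Fraction(n) for col in zip(*rows)]

def within_class_scatter(classes):
    """Sum over classes of (x - c)(x - c)^T, where c is the class centroid."""
    dim = len(classes[0][0])
    s_w = [[Fraction(0)] * dim for _ in range(dim)]
    for rows in classes:
        c = centroid(rows)
        for x in rows:
            d = [xi - ci for xi, ci in zip(x, c)]
            for i in range(dim):
                for j in range(dim):
                    s_w[i][j] += d[i] * d[j]
    return s_w

def rank(matrix):
    """Rank via Gaussian elimination; exact because entries are Fractions."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Hypothetical undersampled data: 2 classes, 2 samples each, dimension 5.
classes = [
    [[1, 0, 2, 0, 1], [3, 2, 0, 0, 1]],
    [[0, 1, 1, 4, 0], [2, 1, 3, 0, 2]],
]
s_w = within_class_scatter(classes)
print("dimension:", len(s_w), "rank of S_w:", rank(s_w))
```

Because the 5 x 5 scatter matrix here has rank below 5, it cannot be inverted, so any criterion that needs its inverse is not directly applicable to such data.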
Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems
- Journal of Machine Learning Research
, 2005
Cited by 31 (10 self)
A generalized discriminant analysis based on a new optimization criterion is presented. The criterion extends the optimization criteria of the classical Linear Discriminant Analysis (LDA) when the scatter matrices are singular. An efficient algorithm for the new optimization problem is presented. The solutions to the proposed criterion form a family of algorithms for generalized LDA, which can be characterized in a closed form. We study two specific algorithms, namely Uncorrelated LDA (ULDA) and Orthogonal LDA (OLDA). ULDA was previously proposed for feature extraction and dimension reduction, whereas OLDA is a novel algorithm proposed in this paper. The features in the reduced space of ULDA are uncorrelated, while the discriminant vectors of OLDA are orthogonal to each other. We have conducted a comparative study on a variety of real-world data sets to evaluate ULDA and OLDA in terms of classification accuracy.
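For orientation, one common textbook formulation of the classical LDA criterion that this abstract refers to is sketched below. The notation is an assumption for illustration and is not taken from the paper: $c_i$ is the centroid of class $i$ with $n_i$ samples, $c$ is the global centroid, and $G$ is the transformation matrix. Scaling conventions (for example a factor of $1/n$) vary between authors.

```latex
\[
S_w = \sum_{i=1}^{k} \sum_{x \in \text{class } i} (x - c_i)(x - c_i)^T,
\qquad
S_b = \sum_{i=1}^{k} n_i \,(c_i - c)(c_i - c)^T,
\qquad
S_t = S_w + S_b
\]
\[
G^{*} = \arg\max_{G} \; \operatorname{trace}\!\left( (G^T S_w G)^{-1} \, G^T S_b G \right)
\]
```

The inverse in the criterion is what fails to exist when the scatter matrices are singular, which is the case the generalized criterion in this entry is designed to handle.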
A two-stage linear discriminant analysis via QR-decomposition
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
Cited by 17 (0 self)
Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as image and text classification. An intrinsic limitation of classical LDA is the so-called singularity problems; that is, it fails when all scatter matrices are singular. Many LDA extensions were proposed in the past to overcome the singularity problems. Among these extensions, PCA+LDA, a two-stage method, received relatively more attention. In PCA+LDA, the LDA stage is preceded by an intermediate dimension reduction stage using Principal Component Analysis (PCA). Most previous LDA extensions are computationally expensive, and not scalable, due to the use of Singular Value Decomposition or Generalized Singular Value Decomposition. In this paper, we propose a two-stage LDA method, namely LDA/QR, which aims to overcome the singularity problems of classical LDA, while achieving efficiency and scalability simultaneously. The key difference between LDA/QR and PCA+LDA lies in the first stage, where LDA/QR applies QR decomposition to a small matrix involving the class centroids, while PCA+LDA applies PCA to the total scatter matrix involving all training data points. We further justify the proposed algorithm by showing the relationship among LDA/QR and previous LDA methods. Extensive experiments on face images and text documents are presented to show the effectiveness of the proposed algorithm.
Index Terms: Linear discriminant analysis, dimension reduction, QR decomposition, classification.
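The first stage described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the "small matrix involving the class centroids" is simply the matrix whose columns are the class centroids, it uses modified Gram-Schmidt as a stand-in for a library QR routine, and it omits the second (LDA) stage entirely. The data and all function names are hypothetical.

```python
import math

def class_centroids(samples, labels):
    """Return one centroid per class, ordered by class label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return [[sum(col) / len(rows) for col in zip(*rows)]
            for _, rows in sorted(groups.items())]

def orthonormalize(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for the span of the vectors.

    This plays the role of the Q factor of a QR decomposition of the
    centroid matrix; near-dependent vectors are dropped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def project(x, basis):
    """Coordinates of x in the reduced space spanned by the basis."""
    return [sum(xi * qi for xi, qi in zip(x, q)) for q in basis]

# Hypothetical data: 4 samples in 5 dimensions, 2 classes.
samples = [[1, 0, 2, 0, 1], [3, 2, 0, 0, 1], [0, 1, 1, 4, 0], [2, 1, 3, 0, 2]]
labels = [0, 0, 1, 1]

basis = orthonormalize(class_centroids(samples, labels))
reduced = [project(x, basis) for x in samples]
print("reduced dimension:", len(reduced[0]))
```

The point of the sketch is the cost profile: the orthogonalization touches only one vector per class rather than a scatter matrix built from all training points, and the reduced data has at most as many dimensions as there are classes.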
Dimension reduction in text classification with support vector machines
- Journal of Machine Learning Research
, 2005
Cited by 14 (3 self)
Support vector machines (SVMs) have been recognized as one of the most successful classification methods for many applications including text classification. Even though the learning ability and computational complexity of training in support vector machines may be independent of the dimension of the feature space, reducing computational complexity is an essential issue to efficiently handle a large number of terms in practical applications of text classification. In this paper, we adopt novel dimension reduction methods to reduce the dimension of the document vectors dramatically. We also introduce decision functions for the centroid-based classification algorithm and support vector classifiers to handle the classification problem where a document may belong to multiple classes. Our substantial experimental results show that with several dimension reduction methods that are designed particularly for clustered data, higher efficiency for both training and testing can be achieved without sacrificing prediction accuracy of text classification even when the dimension of the input space is significantly reduced.
Using uncorrelated discriminant analysis for tissue classification with gene expression data
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
, 2004
Cited by 12 (3 self)
The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high (in the thousands) compared to the number of data samples (in the tens or low hundreds); that is, the data dimension is large compared to the number of data points (such data is said to be undersampled). To cope with performance and accuracy problems associated with high dimensionality, it is commonplace to apply a preprocessing step that transforms the data to a space of significantly lower dimension with limited loss of the information present in the original data. Linear Discriminant Analysis (LDA) is a well-known technique for dimension reduction and feature extraction, but it is not applicable for undersampled data due to singularity problems associated with the matrices in the underlying representation. This paper presents a dimension reduction and feature extraction scheme, called Uncorrelated Linear Discriminant Analysis (ULDA), for undersampled problems and illustrates its utility on gene expression data. ULDA employs the Generalized Singular Value Decomposition method to handle undersampled data and the features that it produces in the transformed space are uncorrelated, which makes it attractive for gene expression data. The properties of ULDA are established rigorously and extensive experimental results on gene expression data are presented to illustrate its effectiveness in classifying tissue samples. These results provide a comparative study of various state-of-the-art classification methods on well-known gene expression data sets.
Index Terms: Microarray data analysis, discriminant analysis, generalized singular value decomposition, classification.
Nonlinear feature extraction based on centroids and kernel functions
- Pattern Recognition
, 2002
Cited by 4 (2 self)
A nonlinear feature extraction method is presented which can reduce the data dimension down to the number of clusters, providing dramatic savings in computational costs. The dimension reducing nonlinear transformation is obtained by implicitly mapping the input data into a feature space using a kernel function, and then finding a linear mapping based on an orthonormal basis of centroids in the feature space that maximally separates the between-cluster relationship. The experimental results demonstrate that our method is capable of extracting nonlinear features effectively so that competitive performance of classification can be obtained with linear classifiers in the dimension reduced space.
A relationship between LDA and the generalized minimum squared error solution
- SIAM Journal on Matrix Analysis and Applications
, 2005
Cited by 3 (1 self)
In this paper, a relationship between Linear Discriminant Analysis (LDA) and the generalized Minimum Squared Error (MSE) solution is presented. The generalized MSE solution is shown to be equivalent to applying a certain classification rule in the space defined by LDA. The relationship between the MSE solution and Fisher Discriminant Analysis (FDA) is extended to multi-class problems and also to undersampled problems for which the classical LDA is not applicable due to singularity of the scatter matrices. In addition, an efficient algorithm for LDA is proposed exploiting its relationship with the MSE procedure. Extensive experiments verify the theoretical results.
Fast Linear Discriminant Analysis using QR Decomposition and Regularization
, 2007
Cited by 3 (3 self)
findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect
A relationship between linear discriminant analysis and the generalized minimum squared error solution
, 2005
Cited by 2 (0 self)
In this paper, a relationship between linear discriminant analysis (LDA) and the generalized minimum squared error (MSE) solution is presented. The generalized MSE solution is shown to be equivalent to applying a certain classification rule in the space defined by LDA. The relationship between the MSE solution and Fisher discriminant analysis is extended to multiclass problems and also to undersampled problems for which the classical LDA is not applicable due to singularity of the scatter matrices. In addition, an efficient algorithm for LDA is proposed exploiting its relationship with the MSE procedure. Extensive experiments verify the theoretical results.
Feature Reduction via Generalized Uncorrelated Linear Discriminant Analysis
Cited by 2 (0 self)
High-dimensional data appear in many applications of data mining, machine learning, and bioinformatics. Feature reduction is commonly applied as a preprocessing step to overcome the curse of dimensionality. Uncorrelated Linear Discriminant Analysis (ULDA) was recently proposed for feature reduction. The extracted features via ULDA were shown to be statistically uncorrelated, which is desirable for many applications. In this paper, an algorithm called ULDA/QR is proposed to simplify the previous implementation of ULDA. Then, the ULDA/GSVD algorithm is proposed, based on a novel optimization criterion, to address the singularity problem which occurs in undersampled problems, where the data dimension is larger than the sample size. The criterion used is the regularized version of the one in ULDA/QR. Surprisingly, our theoretical result shows that the solution to ULDA/GSVD is independent of the value of the regularization parameter. Experimental results on various types of data sets are reported to show the effectiveness of the proposed algorithm and to compare it with other commonly used feature reduction algorithms.
Index Terms: Feature reduction, uncorrelated linear discriminant analysis, QR-decomposition, generalized singular value decomposition.

