Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems

by J Ye
Venue: J Mach Learn Res

Results 1 - 10 of 28

Extracting Shared Subspace for Multi-label Classification

by Shuiwang Ji, Lei Tang, Shipeng Yu, Jieping Ye
Cited by 17 (1 self)
Multi-label problems arise in various domains such as multitopic document categorization and protein function prediction. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since the multiple labels share the same input space, and the semantics conveyed by different labels are usually correlated, it is essential to exploit the correlation information contained in different labels. In this paper, we consider a general framework for extracting shared structures in multi-label classification. In this framework, a common subspace is assumed to be shared among multiple labels. We show that the optimal solution to the proposed formulation can be obtained by solving a generalized eigenvalue problem, though the problem is nonconvex. For high-dimensional problems, direct computation of the solution is expensive, and we develop an efficient algorithm for this case. One appealing feature of the proposed framework is that it includes several well-known algorithms as special cases, thus elucidating their intrinsic relationships. We have conducted extensive experiments on eleven multitopic web page categorization tasks, and results demonstrate the effectiveness of the proposed formulation in comparison with several representative algorithms.

Using uncorrelated discriminant analysis for tissue classification with gene expression data

by Jieping Ye, Tao Li, Tao Xiong, Ravi Janardan - IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004
Cited by 12 (3 self)
The classification of tissue samples based on gene expression data is an important problem in the medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high (in the thousands) compared to the number of data samples (in the tens or low hundreds); that is, the data dimension is large compared to the number of data points (such data is said to be undersampled). To cope with performance and accuracy problems associated with high dimensionality, it is commonplace to apply a preprocessing step that transforms the data to a space of significantly lower dimension with limited loss of the information present in the original data. Linear Discriminant Analysis (LDA) is a well-known technique for dimension reduction and feature extraction, but it is not applicable to undersampled data due to singularity problems associated with the matrices in the underlying representation. This paper presents a dimension reduction and feature extraction scheme, called Uncorrelated Linear Discriminant Analysis (ULDA), for undersampled problems and illustrates its utility on gene expression data. ULDA employs the Generalized Singular Value Decomposition method to handle undersampled data, and the features that it produces in the transformed space are uncorrelated, which makes it attractive for gene expression data. The properties of ULDA are established rigorously, and extensive experimental results on gene expression data are presented to illustrate its effectiveness in classifying tissue samples. These results provide a comparative study of various state-of-the-art classification methods on well-known gene expression data sets. Index Terms—Microarray data analysis, discriminant analysis, generalized singular value decomposition, classification.

Multi-class Discriminant Kernel Learning via Convex Programming

by Jieping Ye, Shuiwang Ji, Jianhui Chen, Isabelle Guyon, Amir Saffari
Cited by 11 (0 self)
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least squares problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multi-class case based on a key result established in this paper: the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case. Furthermore, it leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.

SRDA: An Efficient Algorithm for Large-Scale Discriminant Analysis

by Deng Cai, Xiaofei He, Jiawei Han - IEEE Transactions on Knowledge and Data Engineering, 2008
Cited by 7 (1 self)
Linear Discriminant Analysis (LDA) has been a popular method for extracting features that preserve class separability. The projection functions of LDA are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. It has been widely used in many fields of information processing, such as machine learning, data mining, information retrieval, and pattern recognition. However, the computation of LDA involves the eigendecomposition of dense matrices, which can be computationally expensive in both time and memory. Specifically, LDA has $O(mnt + t^3)$ time complexity and requires $O(mn + mt + nt)$ memory, where $m$ is the number of samples, $n$ is the number of features, and $t = \min(m, n)$. When both $m$ and $n$ are large, it is infeasible to apply LDA. In this paper, we propose a novel algorithm for discriminant analysis, called Spectral Regression Discriminant Analysis (SRDA). By using spectral graph analysis, SRDA casts discriminant analysis into a regression framework that facilitates both efficient computation and the use of regularization techniques. Specifically, SRDA only needs to solve a set of regularized least squares problems, and there is no eigenvector computation involved, which saves a great deal of both time and memory. Our theoretical analysis shows that SRDA can be computed with $O(ms)$ time and $O(ms)$ memory, where $s$ $(\le n)$ is the average number of nonzero features in each sample. Extensive experimental results on four real-world data sets demonstrate the effectiveness and efficiency of our algorithm. Index Terms—Linear Discriminant Analysis, spectral regression, dimensionality reduction.
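
As a rough illustration of the regression view described in the abstract above, the following Python sketch computes a discriminant direction for a two-class problem by solving a single regularized least squares problem, with no eigendecomposition. It is a simplified stand-in under stated assumptions, not the SRDA algorithm itself: the function name `ridge_discriminant_direction`, the +1/-1 target coding, the parameter `alpha`, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def ridge_discriminant_direction(X, y, alpha=1.0):
    """Hypothetical helper: two-class discriminant direction via ridge regression.

    X: (n_samples, n_features) data matrix; y: array of 0/1 labels.
    """
    Xc = X - X.mean(axis=0)             # center the features
    t = np.where(y == 1, 1.0, -1.0)     # assumed +1/-1 class targets
    t = t - t.mean()                    # center the targets
    n, d = Xc.shape
    if d > n:
        # Dual form, cheaper when features outnumber samples:
        # w = Xc^T (Xc Xc^T + alpha I)^{-1} t
        return Xc.T @ np.linalg.solve(Xc @ Xc.T + alpha * np.eye(n), t)
    # Primal form: w = (Xc^T Xc + alpha I)^{-1} Xc^T t
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ t)

# Synthetic undersampled data: 10 samples, 50 features, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))
y = np.array([0] * 5 + [1] * 5)
X[y == 1] += 1.0                        # shift class 1 so the classes differ

w = ridge_discriminant_direction(X, y, alpha=1.0)
scores = (X - X.mean(axis=0)) @ w       # 1-D projection of each sample
```

The dual and primal branches return the same vector; the branch is only there to keep the linear system small when the feature dimension exceeds the sample size.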

Null space versus orthogonal linear discriminant analysis

by Jieping Ye, Tao Xiong - Proc. Int’l Conf. Machine Learning, 2006
Cited by 3 (0 self)
Dimensionality reduction is an important pre-processing step for many applications. Linear Discriminant Analysis (LDA) is one of the well-known methods for supervised dimensionality reduction. However, the classical LDA formulation requires the nonsingularity of the scatter matrices involved. For undersampled problems, where the data dimension is much larger than the sample size, all scatter matrices are singular and classical LDA fails. Many extensions, including null space based LDA (NLDA) and orthogonal LDA (OLDA), have been proposed in the past to overcome this problem. In this paper, we present a computational and theoretical analysis of NLDA and OLDA. Our main result shows that under a mild condition which holds in many applications involving high-dimensional data, NLDA is equivalent to OLDA. We have performed extensive experiments on various types of data, and the results are consistent with our theoretical analysis. The presented analysis and experimental results provide further insight into several LDA-based algorithms.
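
The singularity mentioned in the abstract above can be checked numerically. The Python sketch below builds the within-class, between-class, and total scatter matrices for synthetic undersampled data and computes their ranks. The helper name `scatter_matrices` and the random data are assumptions made for this illustration; nothing here is taken from the paper's own experiments.

```python
import numpy as np

def scatter_matrices(X, y):
    """Hypothetical helper: within-class (Sw), between-class (Sb), total (St) scatter.

    X: (n_samples, n_features) data matrix; y: integer class labels.
    """
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)        # spread around the class mean
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)      # spread of class means
    St = (X - mean).T @ (X - mean)           # spread around the global mean
    return Sw, Sb, St

# Undersampled setting: 20 samples in 100 dimensions, 4 classes of 5 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
y = np.repeat(np.arange(4), 5)

Sw, Sb, St = scatter_matrices(X, y)
rank_sw = np.linalg.matrix_rank(Sw)   # at most n - k = 16
rank_sb = np.linalg.matrix_rank(Sb)   # at most k - 1 = 3
rank_st = np.linalg.matrix_rank(St)   # at most n - 1 = 19
print(rank_sw, rank_sb, rank_st)
```

All three ranks are far below the dimension of 100, so none of the matrices can be inverted, which is why classical LDA cannot be applied directly in this regime.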

Rotational Linear Discriminant Analysis Technique for Dimensionality Reduction (under review)

by Alok Sharma, Kuldip K. Paliwal, 2006
Cited by 2 (1 self)
The linear discriminant analysis (LDA) technique is very popular in pattern recognition for dimensionality reduction. It is a supervised learning technique that finds a linear transformation such that the overlap between the classes is minimum for the projected feature vectors in the reduced feature space. This overlap, if present, adversely affects the classification performance. In this paper, we introduce, prior to the dimensionality-reduction transformation, an additional rotational transform that rotates the feature vectors in the original feature space around their respective class centroids in such a way that the overlap between the classes in the reduced feature space is further minimized. As a result, the classification performance significantly improves, which is demonstrated using several data corpora. Index Terms—Rotational linear discriminant analysis, dimensionality reduction, classification error, fixed-point algorithm, probability of error.

Feature Reduction via Generalized Uncorrelated Linear Discriminant Analysis

by Jieping Ye, Ravi Janardan, Qi Li, Haesun Park
Cited by 2 (0 self)
High-dimensional data appear in many applications of data mining, machine learning, and bioinformatics. Feature reduction is commonly applied as a preprocessing step to overcome the curse of dimensionality. Uncorrelated Linear Discriminant Analysis (ULDA) was recently proposed for feature reduction. The features extracted via ULDA were shown to be statistically uncorrelated, which is desirable for many applications. In this paper, an algorithm called ULDA/QR is proposed to simplify the previous implementation of ULDA. Then, the ULDA/GSVD algorithm is proposed, based on a novel optimization criterion, to address the singularity problem which occurs in undersampled problems, where the data dimension is larger than the sample size. The criterion used is the regularized version of the one in ULDA/QR. Surprisingly, our theoretical result shows that the solution to ULDA/GSVD is independent of the value of the regularization parameter. Experimental results on various types of data sets are reported to show the effectiveness of the proposed algorithm and to compare it with other commonly used feature reduction algorithms. Index Terms—Feature reduction, uncorrelated linear discriminant analysis, QR-decomposition, generalized singular value decomposition.

Kernel Uncorrelated Discriminant Analysis for Radar Target Recognition

by Ling Wang, Liefeng Bo, Licheng Jiao
Cited by 1 (0 self)
Kernel Fisher discriminant analysis (KFDA) has received extensive study in recent years as a dimensionality reduction technique. KFDA always encounters an intrinsic singularity of scatter matrices in the feature space, namely the ‘small sample size’ (SSS) problem. Several novel methods have been proposed to cope with this problem. In this paper, kernel uncorrelated discriminant analysis (KUDA) is proposed, which not only handles the SSS problem but also extracts uncorrelated features, a desirable property for many applications. We then conduct a comparative study on the application of KUDA and other variants of KFDA to the radar target recognition problem. The experimental results indicate the effectiveness of KUDA and illustrate the utility of KFDA on the problem.

Identifying Biologically Relevant Genes via Multiple Heterogeneous Data Sources

by Zheng Zhao, Jiangxin Wang, Huan Liu, Jieping Ye, Yung Chang
Cited by 1 (0 self)
Selection of genes that are differentially expressed and critical to a particular biological process has been a major challenge in post-array analysis. Recent developments in bioinformatics have made various data sources available, such as mRNA and miRNA expression profiles, biological pathways, and gene annotations. Efficient and effective integration of multiple data sources helps enrich our knowledge about the involved samples and genes for selecting genes bearing significant biological relevance. In this work, we studied a novel problem of multi-source gene selection: given multiple heterogeneous data sources (or data sets), select genes from expression profiles by integrating information from various data sources. We investigated how to effectively employ information contained in multiple data sources to extract an intrinsic global geometric pattern and use it in covariance analysis for gene selection. We designed and conducted experiments to systematically compare the proposed approach with representative methods in terms of statistical and biological significance, and showed the efficacy and potential of the proposed approach with promising findings.

Spectral Embedded Clustering

by Feiping Nie, Dong Xu, Ivor W. Tsang, Changshui Zhang
Cited by 1 (1 self)
In this paper, we propose a new spectral clustering method, referred to as Spectral Embedded Clustering (SEC), to minimize the normalized cut criterion in spectral clustering as well as control the mismatch between the cluster assignment matrix and the low-dimensional embedded representation of the data. SEC is based on the observation that the cluster assignment matrix of high-dimensional data can be represented by a low-dimensional linear mapping of the data. We also discover the connection between SEC and other clustering methods, such as spectral clustering, clustering with local and global regularization, K-means, and Discriminative K-means. Experiments on many real-world data sets show that SEC significantly outperforms the existing spectral clustering methods as well as K-means clustering related methods.