Results 1–10 of 29
Generalizing discriminant analysis using the generalized singular value decomposition
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 54 (14 self)
Discriminant analysis has been used for decades to extract features that preserve class separability. It is commonly defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular limits its application to data sets with certain relative dimensions. We examine a number of optimization criteria, and extend their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement. The result is a generalization of discriminant analysis that can be applied even when the sample size is smaller than the dimension of the sample data. We use classification results from the reduced representation to compare the effectiveness of this approach with some alternatives, and conclude with a discussion of their relative merits.
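As a minimal illustration of the nonsingularity limitation described above (a sketch on synthetic data, not the paper's method): when the sample size n is smaller than the dimension d, the within-cluster scatter matrix is necessarily singular, so the classical trace criterion involving its inverse is undefined.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 30, 100, 3               # undersampled: fewer samples than dimensions
X = rng.normal(size=(n, d))        # rows are samples
y = np.arange(n) % k               # synthetic cluster labels

c = X.mean(axis=0)                 # global centroid
Sw = np.zeros((d, d))              # within-cluster scatter
Sb = np.zeros((d, d))              # between-cluster scatter
for j in range(k):
    Xj = X[y == j]
    cj = Xj.mean(axis=0)
    Sw += (Xj - cj).T @ (Xj - cj)
    Sb += len(Xj) * np.outer(cj - c, cj - c)

# rank(Sw) <= n - k < d, so Sw has no inverse and the classical criterion
# breaks down; the GSVD-based generalization sidesteps exactly this.
print(np.linalg.matrix_rank(Sw) < d)   # True
```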
Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition
SIAM Journal on Matrix Analysis and Applications, 2003
Cited by 42 (19 self)
In today’s vector space information retrieval systems, dimension reduction is imperative for efficiently manipulating the massive quantity of data. To be useful, this lower-dimensional representation must be a good approximation of the full document set. To that end, we adapt and extend the discriminant analysis projection used in pattern recognition. This projection preserves cluster structure by maximizing the scatter between clusters while minimizing the scatter within clusters. A common limitation of trace optimization in discriminant analysis is that one of the scatter matrices must be nonsingular, which restricts its application to document sets in which the number of terms does not exceed the number of documents. We show that by using the generalized singular value decomposition (GSVD), we can achieve the same goal regardless of the relative dimensions of the term-document matrix. In addition, applying the GSVD allows us to avoid the explicit formation of the scatter matrices in favor of working directly with the data matrix, thus improving the numerical properties of the approach. Finally, we present experimental results that confirm the effectiveness of our approach.
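The key computational point above, working directly with the data matrix instead of forming scatter matrices, can be sketched as follows: build factor matrices Hw and Hb such that Sw = Hw·Hwᵀ and Sb = Hb·Hbᵀ (synthetic data; the variable names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 50, 20, 4                    # terms x documents, k clusters
A = rng.random((d, n))                 # term-document matrix
labels = np.arange(n) % k              # a given column clustering

c = A.mean(axis=1, keepdims=True)      # global centroid
Hw_blocks, Hb_blocks = [], []
for j in range(k):
    Aj = A[:, labels == j]
    cj = Aj.mean(axis=1, keepdims=True)
    Hw_blocks.append(Aj - cj)                         # within-cluster deviations
    Hb_blocks.append(np.sqrt(Aj.shape[1]) * (cj - c)) # weighted centroid offsets
Hw, Hb = np.hstack(Hw_blocks), np.hstack(Hb_blocks)
# Hw @ Hw.T is the within-cluster scatter and Hb @ Hb.T the between-cluster
# scatter, so the GSVD can be applied to the thin factors Hw, Hb directly.
```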
On Scaling Latent Semantic Indexing for Large Peer-to-Peer Systems
Proc. 27th Annual International ACM SIGIR Conference, 2004
Cited by 33 (0 self)
The exponential growth of data demands scalable infrastructures capable of indexing and searching rich content such as text, music, and images. A promising direction is to combine information retrieval with peer-to-peer technology for scalability, fault-tolerance, and low administration cost. One pioneering work along this direction is pSearch [32, 33]. pSearch places documents onto a peer-to-peer overlay network according to semantic vectors produced using Latent Semantic Indexing (LSI). The search cost for a query is reduced since documents related to the query are likely to be co-located on a small number of nodes. Unfortunately, because of its reliance on LSI, pSearch also inherits the limitations of LSI. (1) When the corpus is large and heterogeneous, LSI's retrieval quality is inferior to that of methods such as Okapi. (2) The Singular Value Decomposition (SVD) used in LSI is unscalable in terms of both memory consumption and computation time.
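The LSI step whose cost the abstract calls unscalable can be sketched on a synthetic term-document matrix: a rank-k truncated SVD yields one semantic vector per document, and queries are folded into the same space for ranking.

```python
import numpy as np

rng = np.random.default_rng(2)
terms, docs, k = 500, 200, 20
A = rng.random((terms, docs))              # synthetic term-document matrix

# LSI: rank-k truncated SVD of A; for large corpora this full SVD is the
# memory and compute bottleneck discussed above.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k]

doc_vectors = (np.diag(sk) @ Vtk).T        # one k-dim semantic vector per document
q = rng.random(terms)                      # a query in term space
q_vec = Uk.T @ q                           # folded into the semantic space
scores = doc_vectors @ q_vec               # rank documents by similarity
best = int(np.argmax(scores))              # most relevant document index
```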
An optimization criterion for generalized discriminant analysis on undersampled problems
IEEE Trans. Pattern Analysis and Machine Intelligence, 2004
Cited by 28 (8 self)
An optimization criterion is presented for discriminant analysis. The criterion extends the optimization criteria of classical Linear Discriminant Analysis (LDA) through the use of the pseudoinverse when the scatter matrices are singular. It is applicable regardless of the relative sizes of the data dimension and sample size, overcoming a limitation of classical LDA. The optimization problem can be solved analytically by applying the Generalized Singular Value Decomposition (GSVD) technique. The pseudoinverse has been suggested and used for undersampled problems in the past, where the data dimension exceeds the number of data points; the criterion proposed in this paper provides a theoretical justification for this practice. An approximation algorithm for the GSVD-based approach is also presented. It reduces the computational complexity by finding subclusters of each cluster and using their centroids to capture the structure of each cluster. This reduced problem yields much smaller matrices to which the GSVD can be applied efficiently. Experiments on text data, with up to 7,000 dimensions, show that the approximation algorithm produces results that are close to those produced by the exact algorithm. Index Terms—Classification, clustering, dimension reduction, generalized singular value decomposition, linear discriminant analysis, text mining.
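The pseudoinverse idea can be sketched directly: replace Sw⁻¹ with pinv(Sw) in the trace criterion and take leading eigenvectors of pinv(Sw)·Sb. This illustrates why the criterion remains well defined on singular Sw; it is not the paper's GSVD algorithm.

```python
import numpy as np

def pinv_lda(Sw, Sb, r):
    """Leading r eigenvectors of pinv(Sw) @ Sb (pseudoinverse LDA sketch)."""
    M = np.linalg.pinv(Sw) @ Sb            # defined even when Sw is singular
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)         # keep the largest eigenvalues
    return vecs[:, order[:r]].real         # d x r projection matrix

rng = np.random.default_rng(9)
n, d, k = 6, 10, 2                         # undersampled on purpose: n < d
X = rng.normal(size=(n, d))
y = np.arange(n) % k
c = X.mean(axis=0)
Sw = sum((X[y == j] - X[y == j].mean(axis=0)).T
         @ (X[y == j] - X[y == j].mean(axis=0)) for j in range(k))
Sb = sum(len(X[y == j]) * np.outer(X[y == j].mean(axis=0) - c,
                                   X[y == j].mean(axis=0) - c) for j in range(k))
G = pinv_lda(Sw, Sb, k - 1)                # at most k-1 discriminant directions
```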
Extracting Shared Subspace for Multi-Label Classification
Cited by 28 (1 self)
Multi-label problems arise in various domains such as multi-topic document categorization and protein function prediction. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since the multiple labels share the same input space, and the semantics conveyed by different labels are usually correlated, it is essential to exploit the correlation information contained in different labels. In this paper, we consider a general framework for extracting shared structures in multi-label classification. In this framework, a common subspace is assumed to be shared among multiple labels. We show that the optimal solution to the proposed formulation can be obtained by solving a generalized eigenvalue problem, though the problem is nonconvex. For high-dimensional problems, direct computation of the solution is expensive, and we develop an efficient algorithm for this case. One appealing feature of the proposed framework is that it includes several well-known algorithms as special cases, thus elucidating their intrinsic relationships. We have conducted extensive experiments on eleven multi-topic web page categorization tasks, and the results demonstrate the effectiveness of the proposed formulation in comparison with several representative algorithms.
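The reduction to a generalized eigenvalue problem A·v = λ·B·v can be sketched with illustrative symmetric matrices standing in for the paper's actual formulation; with B positive definite, the top eigenvectors of B⁻¹A span the shared subspace.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 8))
A = X.T @ X                        # symmetric stand-in matrix
B = X.T @ X + np.eye(8)            # symmetric positive definite -> invertible

# Generalized eigenvalue problem A v = lambda B v, solved via B^{-1} A.
vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
order = np.argsort(-vals.real)
W = vecs[:, order[:3]].real        # top-3 eigenvectors span the shared subspace
```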
Dimension reduction in text classification with support vector machines
Journal of Machine Learning Research, 2005
Cited by 22 (3 self)
Support vector machines (SVMs) have been recognized as one of the most successful classification methods for many applications, including text classification. Even though the learning ability and computational complexity of training in support vector machines may be independent of the dimension of the feature space, reducing computational complexity is an essential issue for efficiently handling a large number of terms in practical applications of text classification. In this paper, we adopt novel dimension reduction methods to reduce the dimension of the document vectors dramatically. We also introduce decision functions for the centroid-based classification algorithm and support vector classifiers to handle the classification problem where a document may belong to multiple classes. Our substantial experimental results show that with several dimension reduction methods that are designed particularly for clustered data, higher efficiency for both training and testing can be achieved without sacrificing prediction accuracy of text classification, even when the dimension of the input space is significantly reduced.
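A multi-class-with-multiple-labels decision in a reduced space can be sketched as follows: score a document against every class centroid and keep all classes above a threshold, so one document can receive several labels. This is an illustrative rule, not the paper's exact decision functions.

```python
import numpy as np

def multilabel_predict(z, centroids, threshold=0.0):
    """Assign every class whose centroid similarity exceeds the threshold."""
    sims = centroids @ z                     # one similarity score per class
    return np.flatnonzero(sims > threshold)  # indices of all predicted labels

centroids = np.array([[1.0, 0.0],            # class 0 centroid (reduced space)
                      [0.5, 0.5],            # class 1 centroid
                      [-1.0, 0.0]])          # class 2 centroid
z = np.array([1.0, 0.2])                     # a reduced document vector
print(multilabel_predict(z, centroids))      # [0 1]
```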
CLSI: A flexible approximation scheme from clustered term-document matrices
In SDM, 2005
Cited by 14 (3 self)
We investigate a methodology for matrix approximation and IR. A central feature of these techniques is an initial clustering phase on the columns of the term-document matrix, followed by a partial SVD of the columns constituting each cluster. The extracted information is used to build effective low-rank approximations to the original matrix as well as for IR. The algorithms can be expressed by means of rank reduction formulas. Experiments indicate that these methods can achieve good overall performance for matrix approximation and IR and compete well with existing schemes. Keywords: Low-rank approximations, Clustering, LSI.
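The cluster-then-partial-SVD idea can be sketched like this: given a clustering of the columns, take the leading left singular vectors of each cluster's columns, combine the cluster bases, and project the full matrix onto their span (a synthetic-data sketch of the scheme, not the paper's exact rank reduction formulas).

```python
import numpy as np

rng = np.random.default_rng(5)
terms, docs, k, r = 100, 60, 3, 4
A = rng.random((terms, docs))              # term-document matrix
labels = np.arange(docs) % k               # a given column clustering

bases = []
for j in range(k):
    Uj, _, _ = np.linalg.svd(A[:, labels == j], full_matrices=False)
    bases.append(Uj[:, :r])                # leading left singular vectors
U = np.hstack(bases)                       # combined basis, terms x (k*r)
Q, _ = np.linalg.qr(U)                     # re-orthonormalize across clusters
A_approx = Q @ (Q.T @ A)                   # low-rank approximation of A
rel_err = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
```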
Nonnegative matrix factorization and applications
Bulletin of the International Linear Algebra Society, 2005
Cited by 11 (1 self)
Data analysis is pervasive throughout science, engineering and business applications. Very often the data to be analyzed is nonnegative, and it is very often preferable to take this constraint into account in the analysis process. In this paper we provide a survey of some aspects of nonnegative matrix factorization and its applications to nonnegative matrix data analysis. In general the problem is the following: given a nonnegative
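One standard algorithm for the factorization surveyed above can be sketched in a few lines: approximate a nonnegative matrix A by W·H with W, H ≥ 0 using Lee–Seung multiplicative updates for the Frobenius error (one of several algorithms; the survey covers others).

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.random((30, 20))                    # nonnegative data matrix
r = 5                                       # factorization rank
W = rng.random((30, r)) + 0.1               # positive initial factors
H = rng.random((r, 20)) + 0.1

err0 = np.linalg.norm(A - W @ H)
for _ in range(200):
    # Multiplicative updates: elementwise, so factors stay nonnegative.
    H *= (W.T @ A) / (W.T @ W @ H + 1e-9)
    W *= (A @ H.T) / (W @ H @ H.T + 1e-9)
err1 = np.linalg.norm(A - W @ H)            # error is non-increasing
```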
Nonnegativity Constraints in Numerical Analysis
Cited by 8 (2 self)
A survey of the development of algorithms for enforcing nonnegativity constraints in scientific computation is given. Special emphasis is placed on such constraints in least squares computations in numerical linear algebra and in nonlinear optimization. Techniques involving nonnegative low-rank matrix and tensor factorizations are also emphasized. Details are provided for some important classical and modern applications in science and engineering. For completeness, this report also includes an effort toward a literature survey of the various algorithms and applications of nonnegativity constraints in numerical analysis. Key Words: nonnegativity constraints, nonnegative least squares, matrix and tensor factorizations, image processing, optimization.
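The core constrained least squares problem, min ‖Ax − b‖₂ subject to x ≥ 0, can be sketched with a simple projected gradient iteration (an illustrative solver; the classical method the survey discusses is the Lawson–Hanson active-set algorithm).

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)

L = np.linalg.norm(A.T @ A, 2)             # Lipschitz constant of the gradient
x = np.zeros(10)                           # feasible starting point
for _ in range(500):
    grad = A.T @ (A @ x - b)               # gradient of 0.5 * ||A x - b||^2
    x = np.maximum(0.0, x - grad / L)      # gradient step, then project onto x >= 0
```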
Effective and Efficient Dimensionality Reduction for Large-Scale and Streaming Data Preprocessing
IEEE Transactions on Knowledge and Data Engineering
Cited by 7 (1 self)
Dimensionality reduction is an essential data preprocessing technique for large-scale and streaming data classification tasks. It can be used to improve both the efficiency and the effectiveness of classifiers. Traditional dimensionality reduction approaches fall into two categories: feature extraction and feature selection. Techniques in the feature extraction category are typically more effective than those in the feature selection category. However, they may break down when processing large-scale data sets or data streams due to their high computational complexities. Similarly, the solutions provided by feature selection approaches are mostly obtained by greedy strategies and, hence, are not guaranteed to be optimal with respect to the optimized criteria. In this paper, we give an overview of the popular feature extraction and selection algorithms under a unified framework. Moreover, we propose two novel dimensionality reduction algorithms based on the Orthogonal Centroid algorithm (OC). The first is an Incremental OC (IOC) algorithm for feature extraction. The second is an Orthogonal Centroid Feature Selection (OCFS) method which can provide optimal solutions according to the OC criterion. Both are designed under the same optimization criterion. Experiments on the Reuters Corpus Volume 1 data set and some public large-scale text data sets indicate that the two algorithms are favorable in terms of their effectiveness and efficiency when compared with other state-of-the-art algorithms. Index Terms—Feature extraction, feature selection, orthogonal centroid algorithm.
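The Orthogonal Centroid idea that both proposed algorithms build on can be sketched in a few lines: orthonormalize the d × k centroid matrix with a thin QR factorization and project the data onto that basis (a synthetic-data sketch of OC, not of the IOC or OCFS algorithms themselves).

```python
import numpy as np

def orthogonal_centroid(X, y, k):
    """Orthonormal basis of the span of the k class centroids (OC sketch)."""
    C = np.column_stack([X[y == j].mean(axis=0) for j in range(k)])  # d x k
    Q, _ = np.linalg.qr(C)               # thin QR: orthonormal centroid basis
    return Q

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 1000))         # high-dimensional data, rows = samples
y = np.arange(200) % 10                  # synthetic class labels
Q = orthogonal_centroid(X, y, 10)
Z = X @ Q                                # 1000-dim documents -> 10-dim features
```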