Orthogonal nonnegative matrix tri-factorizations for clustering
 In SIGKDD
, 2006
"... Currently, most research on nonnegative matrix factorization (NMF) focus on 2factor X = FG T factorization. We provide a systematic analysis of 3factor X = FSG T NMF. While unconstrained 3factor NMF is equivalent to unconstrained 2factor NMF, constrained 3factor NMF brings new features to constr ..."
Abstract

Cited by 114 (22 self)
Currently, most research on nonnegative matrix factorization (NMF) focuses on 2-factor X = FG^T factorization. We provide a systematic analysis of 3-factor X = FSG^T NMF. While unconstrained 3-factor NMF is equivalent to unconstrained 2-factor NMF, constrained 3-factor NMF brings new features to constrained 2-factor NMF. We study the orthogonality constraint because it leads to a rigorous clustering interpretation. We provide new rules for updating F, S, G and prove the convergence of these algorithms. Experiments on 5 datasets and a real-world case study are performed to show the capability of bi-orthogonal 3-factor NMF in simultaneously clustering rows and columns of the input data matrix. We provide a new approach to evaluating the quality of clustering on words using class aggregate distribution and multi-peak distribution. We also provide an overview of various NMF extensions and examine their relationships.
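For intuition, the unconstrained 3-factor model X ≈ FSG^T can be fit with generic Lee-Seung-style multiplicative updates, sketched below. This is only an illustration of the factorization form; it omits the paper's orthogonality constraints and its specific update rules for F, S, and G, and the function name and shapes are made up for the example.

```python
import numpy as np

def tri_nmf(X, k1, k2, iters=200, eps=1e-9, seed=0):
    """Unconstrained 3-factor NMF, X ~ F S G^T, via multiplicative updates.

    A generic sketch only -- NOT the paper's orthogonality-constrained
    update rules for F, S, G."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    for _ in range(iters):
        # each update is the gradient's positive part over its negative part
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
    return F, S, G

# Fit a random nonnegative matrix and measure the residual.
X = np.abs(np.random.default_rng(1).normal(size=(30, 20)))
F, S, G = tri_nmf(X, 4, 3)
err = np.linalg.norm(X - F @ S @ G.T)
```

Because every factor stays elementwise nonnegative under these updates, the fit keeps a parts-based interpretation; the paper's orthogonality constraints additionally sharpen F and G into row- and column-cluster indicators.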
Document clustering using locality preserving indexing
 IEEE Transactions on Knowledge and Data Engineering
, 2005
"... Abstract—We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality and clustering in such a high dimensional space is often infeasible due to the curse of dimensionality. By using Locality ..."
Abstract

Cited by 78 (19 self)
We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality, and clustering in such a high-dimensional space is often infeasible due to the curse of dimensionality. By using Locality Preserving Indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which documents related to the same semantics are close to each other. Different from previous document clustering methods based on Latent Semantic Indexing (LSI) or Nonnegative Matrix Factorization (NMF), our method tries to discover both the geometric and discriminating structures of the document space. Theoretical analysis shows that LPI is an unsupervised approximation of the supervised Linear Discriminant Analysis (LDA) method, which gives the intuitive motivation of our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets. Index Terms: document clustering, locality preserving indexing, dimensionality reduction, semantics.
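The core of LPI is an LPP-style generalized eigenproblem over a neighborhood graph of the documents. The rough sketch below builds a kNN affinity graph and solves that eigenproblem with a pseudoinverse; it is illustrative only, omits stages of the paper's pipeline (e.g. any SVD preprocessing), and the function name, graph parameters, and toy data are all assumptions.

```python
import numpy as np

def lpi_embed(X, d, knn=5, sigma=1.0):
    """Rough LPI/LPP sketch: kNN affinity graph over the rows of X, then
    the smallest eigenvectors of pinv(X^T D X) @ (X^T L X).
    Illustrative only; not the paper's full pipeline."""
    n = len(X)
    D2 = ((X[:, None] - X[None]) ** 2).sum(-1)       # pairwise squared dists
    W = np.zeros((n, n))
    nbrs = np.argsort(D2, axis=1)[:, 1:knn + 1]      # k nearest neighbors
    rows = np.repeat(np.arange(n), knn)
    W[rows, nbrs.ravel()] = np.exp(-D2[rows, nbrs.ravel()] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                           # symmetrize the graph
    Dg = np.diag(W.sum(1))
    L = Dg - W                                       # graph Laplacian
    M = np.linalg.pinv(X.T @ Dg @ X) @ (X.T @ L @ X)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[:d]                # smallest eigenvalues
    return vecs.real[:, order]

# Two document-like groups separated along the first feature.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 0.3, (15, 4)),
               rng.normal([5, 0, 0, 0], 0.3, (15, 4))])
A = lpi_embed(X, d=1)
y = (X @ A).ravel()                                  # 1-D embedding
```

The projection keeps neighboring documents close, so documents sharing semantics (here, the two synthetic groups) land in separated regions of the low-dimensional space, after which any clustering algorithm can be applied.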
Adaptive dimension reduction using discriminant analysis and K-means clustering
 In ICML
, 2007
"... We combine linear discriminant analysis (LDA) and Kmeans clustering into a coherent framework to adaptively select the most discriminative subspace. We use Kmeans clustering to generate class labels and use LDA to do subspace selection. The clustering process is thus integrated with the subspace s ..."
Abstract

Cited by 55 (7 self)
We combine linear discriminant analysis (LDA) and K-means clustering into a coherent framework to adaptively select the most discriminative subspace. We use K-means clustering to generate class labels and use LDA to do subspace selection. The clustering process is thus integrated with the subspace selection process, and the data are simultaneously clustered while the feature subspaces are selected. We show the rich structure of the general LDA-Km framework by examining its variants and their relationships to earlier approaches. Extensive experimental results on real-world datasets show the effectiveness of our approach.
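The alternation described above can be sketched in a few dozen lines. Plain Lloyd's K-means and a pseudoinverse-based LDA stand in for the paper's components; function names, iteration counts, and the toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kmeans(X, k, iters=30):
    """Plain Lloyd's K-means with deterministic farthest-point init."""
    C = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in C], axis=0)
        C.append(X[np.argmax(d)])
    C = np.array(C, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels

def lda_subspace(X, labels, d):
    """Top-d discriminant directions via eig(pinv(Sw) @ Sb)."""
    mu = X.mean(0)
    Sw = np.zeros((X.shape[1], X.shape[1]))  # within-class scatter
    Sb = np.zeros_like(Sw)                   # between-class scatter
    for j in np.unique(labels):
        Xj = X[labels == j]
        Sw += (Xj - Xj.mean(0)).T @ (Xj - Xj.mean(0))
        Sb += len(Xj) * np.outer(Xj.mean(0) - mu, Xj.mean(0) - mu)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    return vecs.real[:, np.argsort(-vals.real)[:d]]

def lda_km(X, k, d, rounds=5):
    labels = kmeans(X, k)               # cluster to get pseudo-labels
    for _ in range(rounds):
        W = lda_subspace(X, labels, d)  # LDA on the pseudo-labels
        labels = kmeans(X @ W, k)       # re-cluster in the subspace
    return labels

# Two groups separated along the first dimension, noise elsewhere.
rng = np.random.default_rng(2)
X = np.vstack([np.c_[rng.normal(0, 0.3, (20, 1)), rng.normal(0, 1, (20, 4))],
               np.c_[rng.normal(6, 0.3, (20, 1)), rng.normal(0, 1, (20, 4))]])
labels = lda_km(X, k=2, d=1)
```

Each round, the pseudo-labels sharpen the discriminative subspace and the subspace in turn sharpens the labels, which is the adaptive loop the abstract describes.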
Nonnegative Matrix Factorization for Combinatorial Optimization: Spectral Clustering, Graph Matching, and Clique Finding
 Eighth IEEE International Conference on Data Mining
, 2008
"... Nonnegative matrix factorization (NMF) is a versatile model for data clustering. In this paper, we propose several NMF inspired algorithms to solve different data mining problems. They include (1) multiway normalized cut spectral clustering, (2) graph matching of both undirected and directed graphs ..."
Abstract

Cited by 20 (0 self)
Nonnegative matrix factorization (NMF) is a versatile model for data clustering. In this paper, we propose several NMF-inspired algorithms to solve different data mining problems. They include (1) multi-way normalized cut spectral clustering, (2) graph matching of both undirected and directed graphs, and (3) maximal clique finding on both graphs and bipartite graphs. Key features of these algorithms are that (a) they are extremely simple to implement and (b) they are provably convergent. We conduct experiments to demonstrate the effectiveness of these new algorithms. We also derive a new spectral bound for the size of maximal edge bicliques as a byproduct of our approach.
A Comprehensive Comparison Study of Document Clustering for a Biomedical Digital Library MEDLINE
 In ACM/IEEE Joint Conference on Digital Libraries, Chapel Hill, NC
, 2006
"... www.library.drexel.edu The following item is made available as a courtesy to scholars by the author(s) and Drexel University Library and may contain materials and content, including computer code and tags, artwork, text, graphics, images, and illustrations (Material) which may be protected by copyri ..."
Abstract

Cited by 15 (3 self)
Binary Matrix Factorization with Applications
"... An interesting problem in Nonnegative Matrix Factorization (NMF) is to factorize the matrix X which is of some specific class, for example, binary matrix. In this paper, we extend the standard NMF to Binary Matrix Factorization (BMF for short): given a binary matrix X, we want to factorize X into tw ..."
Abstract

Cited by 13 (1 self)
An interesting problem in Nonnegative Matrix Factorization (NMF) is to factorize a matrix X that belongs to some specific class, for example, a binary matrix. In this paper, we extend standard NMF to Binary Matrix Factorization (BMF for short): given a binary matrix X, we want to factorize X into two binary matrices W, H (thus conserving the most important integer property of the objective matrix X) satisfying X ≈ WH. Two algorithms are studied and compared. These methods rely on a fundamental boundedness property of NMF which we propose and prove. This new property also provides a natural normalization scheme that eliminates the bias of factor matrices. Experiments on both synthetic and real-world datasets are conducted to show the competency and effectiveness of BMF.
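One naive way to see the X ≈ WH idea with binary factors is to run ordinary real-valued NMF and round the rescaled factors to {0, 1}. The sketch below is NOT the paper's two algorithms (those rely on the boundedness property it proves); it is just an illustrative heuristic, and all names and thresholds here are assumptions.

```python
import numpy as np

def nmf(X, k, iters=500, eps=1e-9, seed=0):
    """Standard 2-factor multiplicative-update NMF."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(iters):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H

def bmf_round(X, k):
    """Naive BMF heuristic: real-valued NMF, then rescale and round each
    factor to {0, 1}. NOT the paper's penalty/thresholding algorithms."""
    W, H = nmf(X, k)
    s = W.max(0) + 1e-12
    W, H = W / s, H * s[:, None]     # column-normalize W, absorb scale in H
    Wb = (W > 0.5).astype(int)
    Hb = (H > 0.5 * H.max(1, keepdims=True)).astype(int)
    return Wb, Hb

# A binary block matrix of nonnegative rank 2: two disjoint all-ones blocks.
X = np.kron(np.eye(2), np.ones((3, 3)))
Wb, Hb = bmf_round(X, k=2)
```

On clean block-structured input like this, the rounded factors recover the blocks; the value of the paper's dedicated BMF algorithms is precisely that they keep the factors bounded (and hence meaningfully roundable) on messier data.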
Learning the shared subspace for multi-task clustering and transductive transfer classification
 Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on
, 2009
"... Abstract—There are many clustering tasks which are closely related in the real world, e.g. clustering the web pages of different universities. However, existing clustering approaches neglect the underlying relation and treat these clustering tasks either individually or simply together. In this pape ..."
Abstract

Cited by 13 (2 self)
There are many clustering tasks which are closely related in the real world, e.g., clustering the web pages of different universities. However, existing clustering approaches neglect this underlying relation and treat the clustering tasks either individually or simply together. In this paper, we study a novel clustering paradigm, namely multi-task clustering, which performs multiple related clustering tasks together and utilizes the relation among these tasks to enhance clustering performance. We aim to learn a subspace shared by all the tasks, through which the knowledge of the tasks can be transferred to each other. The objective of our approach consists of two parts: (1) within-task clustering, which clusters the data of each task in its input space individually; and (2) cross-task clustering, which simultaneously learns the shared subspace and clusters the data of all the tasks together. We show that the objective can be solved by alternating minimization, and its convergence is theoretically guaranteed. Furthermore, we show that given the labels of one task, our multi-task clustering method can be extended to transductive transfer classification (a.k.a. cross-domain classification, domain adaptation). Experiments on several cross-domain text data sets demonstrate that the proposed multi-task clustering greatly outperforms traditional single-task clustering methods, and the transductive transfer classification method is comparable to or even better than several existing transductive transfer classification approaches. Keywords: multi-task clustering; transductive transfer classification; multi-task learning; transfer learning; cross-domain classification; domain adaptation.
Spectral Embedded Clustering
"... In this paper, we propose a new spectral clustering method, referred to as Spectral Embedded Clustering (SEC), to minimize the normalized cut criterion in spectral clustering as well as control the mismatch between the cluster assignment matrix and the low dimensional embedded representation of the ..."
Abstract

Cited by 12 (9 self)
In this paper, we propose a new spectral clustering method, referred to as Spectral Embedded Clustering (SEC), to minimize the normalized cut criterion in spectral clustering as well as control the mismatch between the cluster assignment matrix and the low-dimensional embedded representation of the data. SEC is based on the observation that the cluster assignment matrix of high-dimensional data can be represented by a low-dimensional linear mapping of the data. We also discover the connection between SEC and other clustering methods, such as spectral clustering, clustering with local and global regularization, K-means, and Discriminative K-means. Experiments on many real-world data sets show that SEC significantly outperforms the existing spectral clustering methods as well as K-means-related clustering methods.
A unified view on clustering binary data
 Machine Learning
"... Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. Binary data have been occupying a special place in the domain of dat ..."
Abstract

Cited by 12 (3 self)
Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. Binary data occupy a special place in the domain of data analysis. A unified view of binary data clustering is presented by examining the connections among various clustering criteria. Experimental studies are conducted to empirically verify these relationships.
Spectral Embedded Clustering: A Framework for In-Sample and Out-of-Sample Spectral Clustering
"... Abstract — Spectral clustering (SC) methods have been successfully applied to many realworld applications. The success of these SC methods is largely based on the manifold assumption, namely, that two nearby data points in the highdensity region of a lowdimensional data manifold have the same clu ..."
Abstract

Cited by 8 (2 self)
Spectral clustering (SC) methods have been successfully applied to many real-world applications. The success of these SC methods is largely based on the manifold assumption, namely, that two nearby data points in the high-density region of a low-dimensional data manifold have the same cluster label. However, such an assumption might not always hold on high-dimensional data. When the data do not exhibit a clear low-dimensional manifold structure (e.g., high-dimensional and sparse data), the clustering performance of SC will be degraded and may become even worse than K-means clustering. In this paper, motivated by the observation that the true cluster assignment matrix for high-dimensional data can always be embedded in a linear space spanned by the data, we propose the Spectral Embedded Clustering (SEC) framework, in which a linearity regularization is explicitly added to the objective function of SC methods. More importantly, the proposed SEC framework can naturally deal with out-of-sample data. We also present a new Laplacian matrix constructed from a local regression of each pattern and incorporate it into our SEC framework to capture both local and global discriminative information for clustering. Comprehensive experiments on eight real-world high-dimensional datasets demonstrate the effectiveness and advantages of our SEC framework over existing SC methods and K-means-based clustering methods. Our SEC framework significantly outperforms SC using the Nyström algorithm on unseen data. Index Terms: linearity regularization, out-of-sample clustering, spectral clustering, spectral embedded clustering.