Locality Preserving Projections
2002
Cited by 382 (16 self)
Abstract
Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA), a classical linear technique that projects the data along the directions of maximal variance. When the high-dimensional data lie on a low-dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold. As a result, LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding. Yet LPP is linear and, more crucially, is defined everywhere in the ambient space rather than just on the training data points. This is borne out by illustrative examples on some high-dimensional data sets.
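A minimal NumPy sketch of the LPP construction, under our own assumptions: rows of `X` are data points, the neighborhood graph is a symmetric k-nearest-neighbour graph with heat-kernel weights, and a small ridge term `reg` keeps $X^T D X$ invertible (a simplification, not necessarily the paper's preprocessing). The function `lpp` and its parameters `k`, `t`, `reg` are hypothetical. It solves $X^T L X a = \lambda X^T D X a$ and keeps the eigenvectors with the smallest eigenvalues.

```python
import numpy as np

def lpp(X, n_components=2, k=5, t=1.0, reg=1e-6):
    """Hypothetical LPP sketch. X: (n_samples, n_features), rows are points.
    Returns a (n_features, n_components) projection matrix; embed with X @ A."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances, clamped at 0 against round-off.
    sq = np.sum(X**2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    # Symmetric k-NN graph with heat-kernel weights exp(-d^2 / t).
    W = np.zeros((n, n))
    order = np.argsort(dist2, axis=1)
    for i in range(n):
        for j in order[i, 1:k + 1]:  # position 0 is the point itself (no duplicates assumed)
            W[i, j] = W[j, i] = np.exp(-dist2[i, j] / t)
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    # Generalized eigenproblem X^T L X a = lam X^T D X a, reduced via Cholesky.
    A_mat = X.T @ L @ X
    B_mat = X.T @ D @ X + reg * np.eye(X.shape[1])
    C_inv = np.linalg.inv(np.linalg.cholesky(B_mat))  # B = C C^T
    evals, evecs = np.linalg.eigh(C_inv @ A_mat @ C_inv.T)  # ascending order
    return C_inv.T @ evecs[:, :n_components]
```

Because the result is a matrix rather than a lookup table of training embeddings, unseen points are embedded with the same product `X_new @ A`, which is the "defined everywhere in ambient space" property the abstract emphasizes.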
Projected gradient methods for Nonnegative Matrix Factorization
Neural Computation, 2007
Cited by 270 (2 self)
Abstract
Nonnegative matrix factorization (NMF) can be formulated as a minimization problem with bound constraints. Although bound-constrained optimization has been studied extensively in both theory and practice, so far no study has formally applied its techniques to NMF. In this paper, we propose two projected gradient methods for NMF, both of which exhibit strong optimization properties. We discuss efficient implementations and demonstrate that one of the proposed methods converges faster than the popular multiplicative update approach. Simple MATLAB code is also provided.
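A minimal alternating projected-gradient sketch for $\min \|V - WH\|_F^2$ subject to $W, H \ge 0$. For simplicity it takes a fixed step of $1/L$, where $L$ is the Lipschitz constant of each block gradient, instead of the step-size rules analysed in the paper; the function name `nmf_projected_gradient` is hypothetical.

```python
import numpy as np

def nmf_projected_gradient(V, r, n_iter=500, seed=0):
    """Hypothetical sketch: alternating projected gradient for V ~ W H, W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Gradient of 0.5*||V - WH||^2 w.r.t. H is W^T (W H - V).
        WtW = W.T @ W
        step = 1.0 / (np.linalg.norm(WtW, 2) + 1e-12)  # 1 / Lipschitz constant
        H = np.maximum(H - step * (WtW @ H - W.T @ V), 0.0)  # project onto H >= 0
        # Gradient w.r.t. W is (W H - V) H^T.
        HHt = H @ H.T
        step = 1.0 / (np.linalg.norm(HHt, 2) + 1e-12)
        W = np.maximum(W - step * (W @ HHt - V @ H.T), 0.0)  # project onto W >= 0
    return W, H
```

The projection is just an elementwise `max(., 0)`, which is what makes bound-constrained methods cheap for NMF; a step of at most $1/L$ guarantees that each block update does not increase the objective.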
On the equivalence of nonnegative matrix factorization and spectral clustering
In SIAM International Conference on Data Mining, 2005
Cited by 152 (20 self)
Abstract
Current nonnegative matrix factorization (NMF) deals with factorizations of the form $X = FG^T$. We provide a systematic analysis and extensions of NMF to the symmetric form $W = HH^T$ and the weighted form $W = HSH^T$. We show that (1) $W = HH^T$ is equivalent to kernel K-means clustering and to Laplacian-based spectral clustering, and (2) $X = FG^T$ is equivalent to simultaneous clustering of the rows and columns of a bipartite graph. Algorithms are given for computing these symmetric NMFs.
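A minimal sketch of symmetric NMF, $\min \|W - HH^T\|_F^2$ with $H \ge 0$, using a damped multiplicative update $H \leftarrow H \circ (1 - \beta + \beta\, WH / (HH^TH))$. The choice $\beta = 1/2$ and the name `symmetric_nmf` are our assumptions; cluster labels are read off as the largest entry in each row of $H$, which is the clustering view the abstract describes.

```python
import numpy as np

def symmetric_nmf(W, k, n_iter=500, beta=0.5, seed=0, eps=1e-12):
    """Hypothetical sketch of symmetric NMF W ~ H H^T, H >= 0, for W >= 0."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))
    for _ in range(n_iter):
        numer = W @ H
        denom = H @ (H.T @ H) + eps
        # Damped multiplicative step; keeps H nonnegative when W is nonnegative.
        H *= (1.0 - beta) + beta * numer / denom
    return H
```

On a similarity matrix with clear block structure, `symmetric_nmf(W, k).argmax(axis=1)` gives one label per block.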
Orthogonal nonnegative matrix tri-factorizations for clustering
In SIGKDD, 2006
Cited by 110 (21 self)
Abstract
Currently, most research on nonnegative matrix factorization (NMF) focuses on the 2-factor factorization $X = FG^T$. We provide a systematic analysis of the 3-factor NMF $X = FSG^T$. While unconstrained 3-factor NMF is equivalent to unconstrained 2-factor NMF, constrained 3-factor NMF brings new features to constrained 2-factor NMF. We study the orthogonality constraint because it leads to a rigorous clustering interpretation. We provide new rules for updating $F$, $S$, $G$ and prove the convergence of these algorithms. Experiments on 5 datasets and a real-world case study are performed to show the capability of bi-orthogonal 3-factor NMF to simultaneously cluster the rows and columns of the input data matrix. We provide a new approach to evaluating the quality of word clustering using class aggregate distribution and multi-peak distribution. We also provide an overview of various NMF extensions and examine their relationships.
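A sketch of square-root multiplicative updates for the bi-orthogonal tri-factorization $X \approx FSG^T$ with $F, S, G \ge 0$. The update forms below are our transcription of this style of rule and should be checked against the paper before relying on them; the function name `onmtf` and its parameters are hypothetical. Row clusters come from `F.argmax(axis=1)` and column clusters from `G.argmax(axis=1)`.

```python
import numpy as np

def onmtf(X, k_rows, k_cols, n_iter=200, seed=0, eps=1e-12):
    """Hypothetical sketch of X ~ F S G^T with F, S, G >= 0 (X >= 0 assumed)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k_rows))   # row-cluster indicators
    S = rng.random((k_rows, k_cols))
    G = rng.random((n, k_cols))   # column-cluster indicators
    for _ in range(n_iter):
        XtFS = X.T @ F @ S
        G *= np.sqrt(XtFS / (G @ (G.T @ XtFS) + eps))
        XGSt = X @ G @ S.T
        F *= np.sqrt(XGSt / (F @ (F.T @ XGSt) + eps))
        FtXG = F.T @ X @ G
        S *= np.sqrt(FtXG / (F.T @ F @ S @ G.T @ G + eps))
    return F, S, G
```

The $FF^T$ and $GG^T$ terms in the denominators are where the orthogonality constraints enter; without them these would reduce to ordinary multiplicative updates.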
Convex and Semi-Nonnegative Matrix Factorizations
2008
Cited by 102 (9 self)
Abstract
We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form $X = FG^T$, we focus on algorithms in which $G$ is restricted to contain nonnegative entries, but allow the data matrix $X$ to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of $F$ are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and we provide supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.
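A sketch of semi-NMF, where $X$ and $F$ may have mixed signs and only $G \ge 0$. Since $F$ is unconstrained it gets a closed-form least-squares update; $G$ gets a square-root multiplicative update built from the positive and negative parts $A^{+} = (|A| + A)/2$ and $A^{-} = (|A| - A)/2$. This update form is our transcription and an assumption to verify against the paper; the names `pos`, `neg`, and `semi_nmf` are hypothetical.

```python
import numpy as np

def pos(A):
    """Positive part (|A| + A) / 2."""
    return (np.abs(A) + A) / 2.0

def neg(A):
    """Negative part (|A| - A) / 2, so that A = pos(A) - neg(A)."""
    return (np.abs(A) - A) / 2.0

def semi_nmf(X, k, n_iter=200, seed=0, eps=1e-12):
    """Hypothetical sketch of X ~ F G^T, X: (n_features, n_samples), G >= 0."""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], k)) + 0.1
    for _ in range(n_iter):
        F = X @ G @ np.linalg.pinv(G.T @ G)  # least squares in F for fixed G
        XtF = X.T @ F
        FtF = F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) / (neg(XtF) + G @ pos(FtF) + eps))
    F = X @ G @ np.linalg.pinv(G.T @ G)  # final F consistent with final G
    return F, G
```

Because the last step solves least squares in $F$ exactly, the reconstruction error is never worse than that of the trivial factorization $F = 0$.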
Generalized nonnegative matrix approximations with Bregman divergences
In Neural Information Processing Systems, 2005
Cited by 92 (5 self)
Abstract
Nonnegative matrix approximation (NNMA) is a recent technique for dimensionality reduction and data analysis that yields a parts-based, sparse nonnegative representation for nonnegative input data. NNMA has found a wide variety of applications, including text analysis, document clustering, face/image recognition, language modeling, speech processing, and many others. Despite these numerous applications, the algorithmic development for computing the NNMA factors has been relatively deficient. This paper makes algorithmic progress by modeling and solving (using multiplicative updates) new generalized NNMA problems that minimize Bregman divergences between the input matrix and its low-rank approximation. The multiplicative update formulae in the pioneering work by Lee and Seung [11] arise as a special case of our algorithms. In addition, the paper shows how to use penalty functions to incorporate constraints other than nonnegativity into the problem. Further, some interesting extensions to the use of “link” functions for modeling nonlinear relationships are also discussed.
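As a concrete instance of the special case mentioned above, here is a sketch of the Lee and Seung multiplicative updates for the generalized KL (I-)divergence $D(V \| WH) = \sum V \log(V/WH) - V + WH$, which is one member of the Bregman family. This is not the paper's general algorithm; the names `nmf_kl` and `kl_div` and the `eps` guard are our own.

```python
import numpy as np

def nmf_kl(V, r, n_iter=300, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for generalized KL divergence, V ~ W H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + 0.1
    H = rng.random((r, V.shape[1])) + 0.1
    for _ in range(n_iter):
        # H_aj <- H_aj * sum_i W_ia V_ij/(WH)_ij / sum_i W_ia
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        # W_ia <- W_ia * sum_j H_aj V_ij/(WH)_ij / sum_j H_aj
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_div(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH)."""
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)
```

Each update is non-increasing in the divergence, so more iterations from the same starting point should never give a worse fit.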
Graph regularized nonnegative matrix factorization for data representation
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
Cited by 86 (4 self)
Abstract
Matrix factorization techniques have been frequently applied in information retrieval, computer vision, and pattern recognition. Among them, Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. On the other hand, from the geometric perspective, the data are usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. One then hopes to find a compact representation that uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Graph Regularized Nonnegative Matrix Factorization (GNMF), for this purpose. In GNMF, an affinity graph is constructed to encode the geometrical information, and we seek a matrix factorization that respects the graph structure. Our empirical study shows encouraging results for the proposed algorithm in comparison to state-of-the-art algorithms on real-world problems.
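A minimal sketch of graph-regularized NMF with the Frobenius loss, $\min \|X - UV^T\|_F^2 + \lambda\, \mathrm{Tr}(V^T L V)$ with $L = D - W$, where columns of $X$ are samples and $W$ is a 0/1 symmetric k-nearest-neighbour graph. The graph construction, the defaults, and the names `knn_graph` and `gnmf` are assumptions for illustration.

```python
import numpy as np

def knn_graph(X, k=3):
    """Hypothetical 0/1 symmetric k-NN affinity over the columns (samples) of X."""
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    np.fill_diagonal(dist2, np.inf)  # a sample is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dist2[i])[:k]] = 1.0
    return np.maximum(W, W.T)  # symmetrize

def gnmf(X, k, lam=1.0, n_neighbors=3, n_iter=300, seed=0, eps=1e-12):
    """Hypothetical sketch: X ~ U V^T, U, V >= 0, with graph penalty lam*Tr(V^T L V)."""
    rng = np.random.default_rng(seed)
    W = knn_graph(X, n_neighbors)
    D = np.diag(W.sum(axis=1))
    U = rng.random((X.shape[0], k))
    V = rng.random((X.shape[1], k))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # The graph term adds lam*W V to the numerator and lam*D V to the denominator.
        V *= (X.T @ U + lam * W @ V) / (V @ (U.T @ U) + lam * D @ V + eps)
    return U, V
```

Setting `lam=0` recovers plain multiplicative-update NMF, so the graph term is the only change that pulls neighbouring samples toward similar rows of `V`.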
Document clustering using locality preserving indexing
IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 73 (19 self)
Abstract
We propose a novel document clustering method which aims to cluster documents into different semantic classes. The document space is generally of high dimensionality, and clustering in such a high-dimensional space is often infeasible due to the curse of dimensionality. By using Locality Preserving Indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which documents related to the same semantics are close to each other. Unlike previous document clustering methods based on Latent Semantic Indexing (LSI) or Nonnegative Matrix Factorization (NMF), our method tries to discover both the geometric and the discriminating structures of the document space. Theoretical analysis of our method shows that LPI is an unsupervised approximation of the supervised Linear Discriminant Analysis (LDA) method, which provides the intuitive motivation for our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets.
Index Terms: document clustering, locality preserving indexing, dimensionality reduction, semantics.
Name disambiguation in author citations using a K-way spectral clustering method
International Conference on Digital Libraries, 2005
Cited by 66 (7 self)
Abstract
An author may have multiple names, and multiple authors may share the same name, simply due to name abbreviations, identical names, or name misspellings in publications or bibliographies. This can produce name ambiguity, which can affect the performance of document retrieval, web search, and database integration, and may cause improper attribution of credit. Proposed here is an unsupervised learning approach using K-way spectral clustering that disambiguates authors in citations. The approach utilizes three types of citation attributes: coauthor names, paper titles, and publication venue titles. The approach is illustrated with 16 name datasets, with citations collected from the DBLP database bibliography and author home pages, and shows that name disambiguation can be achieved using these citation attributes.
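A sketch of K-way normalized spectral clustering on a symmetric, nonnegative affinity matrix (for example, similarities between citations computed from coauthors, titles, and venues). We use the Ng-Jordan-Weiss style recipe (top-$k$ eigenvectors of $D^{-1/2} A D^{-1/2}$, row normalization, then k-means), which may differ from the exact K-way procedure in the paper; the function name and the farthest-point k-means initialization are our assumptions.

```python
import numpy as np

def spectral_clustering(A, k, n_iter=50):
    """Hypothetical NJW-style K-way spectral clustering of a symmetric affinity A."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(M)                        # ascending eigenvalues
    Y = vecs[:, -k:]                                   # top-k eigenvectors
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    # k-means on the rows of Y, seeded by farthest-point selection.
    centers = [Y[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels
```

Each resulting cluster is then read as one distinct author behind an ambiguous name.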
Combining Content and Link for Classification using Matrix Factorization
2007
Cited by 64 (8 self)
Abstract
The World Wide Web contains rich textual content that is interconnected via complex hyperlinks. This huge database violates the assumption, held by most conventional statistical methods, that each web page is an independent and identically distributed sample. It is thus difficult to apply traditional mining or learning methods to web mining problems, e.g., web page classification, by exploiting both the content and the link structure. Research in this direction has recently received considerable attention but is still at an early stage. Although a few methods exploit both the link structure and the content information, some of them combine only the authority information with the content information, while others first decompose the link structure into hub and authority features and then apply them as additional document features. Aiming for an approach that is practically attractive for its simplicity, this paper designs an algorithm that exploits both the content and linkage information by carrying out a joint factorization of the linkage adjacency matrix and the document-term matrix, and derives a new representation for web pages in a low-dimensional factor space, without explicitly separating them into content, hub, or authority factors. Further analysis can be performed based on this compact representation of web pages. In the experiments, the proposed method is compared with state-of-the-art methods and demonstrates excellent accuracy in hypertext classification on the WebKB and Cora benchmarks.
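A gradient-descent sketch of a joint link/content factorization in which both matrices share the page factor $Z$: $J = \|A - ZUZ^T\|_F^2 + \alpha \|C - ZV^T\|_F^2 + \gamma(\|Z\|_F^2 + \|U\|_F^2 + \|V\|_F^2)$. This objective, the plain fixed-step gradient descent, and all names and defaults (`joint_factorize`, `alpha`, `gamma`, `lr`) are illustrative assumptions; the paper's exact formulation and optimizer may differ.

```python
import numpy as np

def joint_factorize(A, C, k, alpha=1.0, gamma=0.01, lr=0.01, n_iter=500, seed=0):
    """Hypothetical sketch. A: (n, n) link adjacency, C: (n, m) document-term matrix.
    Returns page factors Z (n, k), U (k, k), V (m, k), and the loss history."""
    rng = np.random.default_rng(seed)
    n, m = C.shape
    Z = 0.1 * rng.normal(size=(n, k))
    U = 0.1 * rng.normal(size=(k, k))
    V = 0.1 * rng.normal(size=(m, k))

    def loss():
        return (np.sum((A - Z @ U @ Z.T) ** 2)
                + alpha * np.sum((C - Z @ V.T) ** 2)
                + gamma * (np.sum(Z**2) + np.sum(U**2) + np.sum(V**2)))

    history = [loss()]
    for _ in range(n_iter):
        R1 = Z @ U @ Z.T - A  # link residual (A may be asymmetric)
        R2 = Z @ V.T - C      # content residual
        gZ = 2 * (R1 @ Z @ U.T + R1.T @ Z @ U) + 2 * alpha * R2 @ V + 2 * gamma * Z
        gU = 2 * Z.T @ R1 @ Z + 2 * gamma * U
        gV = 2 * alpha * R2.T @ Z + 2 * gamma * V
        Z, U, V = Z - lr * gZ, U - lr * gU, V - lr * gV  # simultaneous step
        history.append(loss())
    return Z, U, V, history
```

The rows of `Z` serve as the compact page representation that a downstream classifier would consume; because `Z` appears in both terms, links and words shape the same factors rather than being split into separate content, hub, or authority features.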