Results 1  10
of
155
Locality Preserving Projections
, 2002
"... Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data s ..."
Abstract

Cited by 209 (15 self)
 Add to MetaCart
Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA)  a classical linear technique that projects the data along the directions of maximal variance. When the high dimensional data lies on a low dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold. As a result, LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding. Yet LPP is linear and more crucially is defined everywhere in ambient space rather than just on the training data points. This is borne out by illustrative examples on some high dimensional data sets.
Projected gradient methods for Nonnegative Matrix Factorization
 Neural Computation
, 2007
"... Nonnegative matrix factorization (NMF) can be formulated as a minimization problem with bound constraints. Although boundconstrained optimization has been studied extensively in both theory and practice, so far no study has formally applied its techniques to NMF. In this paper, we propose two proj ..."
Abstract

Cited by 134 (2 self)
 Add to MetaCart
Nonnegative matrix factorization (NMF) can be formulated as a minimization problem with bound constraints. Although boundconstrained optimization has been studied extensively in both theory and practice, so far no study has formally applied its techniques to NMF. In this paper, we propose two projected gradient methods for NMF, both of which exhibit strong optimization properties. We discuss efficient implementations and demonstrate that one of the proposed methods converges faster than the popular multiplicative update approach. A simple MATLAB code is also provided. 1
On the equivalence of nonnegative matrix factorization and spectral clustering
 in SIAM International Conference on Data Mining
, 2005
"... Current nonnegative matrix factorization (NMF) deals with X = FG T type. We provide a systematic analysis and extensions of NMF to the symmetric W = HH T, and the weighted W = HSHT. We show that (1) W = HHT is equivalent to Kernel Kmeans clustering and the Laplacianbased spectral clustering. (2) X ..."
Abstract

Cited by 85 (10 self)
 Add to MetaCart
Current nonnegative matrix factorization (NMF) deals with X = FG T type. We provide a systematic analysis and extensions of NMF to the symmetric W = HH T, and the weighted W = HSHT. We show that (1) W = HHT is equivalent to Kernel Kmeans clustering and the Laplacianbased spectral clustering. (2) X = FGT is equivalent to simultaneous clustering of rows and columns of a bipartite graph. Algorithms are given for computing these symmetric NMFs. 1
Orthogonal nonnegative matrix trifactorizations for clustering
 In SIGKDD
, 2006
"... Currently, most research on nonnegative matrix factorization (NMF) focus on 2factor X = FG T factorization. We provide a systematic analysis of 3factor X = FSG T NMF. While unconstrained 3factor NMF is equivalent to unconstrained 2factor NMF, constrained 3factor NMF brings new features to constr ..."
Abstract

Cited by 66 (18 self)
 Add to MetaCart
Currently, most research on nonnegative matrix factorization (NMF) focus on 2factor X = FG T factorization. We provide a systematic analysis of 3factor X = FSG T NMF. While unconstrained 3factor NMF is equivalent to unconstrained 2factor NMF, constrained 3factor NMF brings new features to constrained 2factor NMF. We study the orthogonality constraint because it leads to rigorous clustering interpretation. We provide new rules for updating F,S,G and prove the convergence of these algorithms. Experiments on 5 datasets and a real world case study are performed to show the capability of biorthogonal 3factor NMF on simultaneously clustering rows and columns of the input data matrix. We provide a new approach of evaluating the quality of clustering on words using class aggregate distribution and multipeak distribution. We also provide an overview of various NMF extensions and examine their relationships.
Generalized nonnegative matrix approximations with Bregman divergences
 In: Neural Information Proc. Systems
, 2005
"... Nonnegative matrix approximation (NNMA) is a recent technique for dimensionality reduction and data analysis that yields a parts based, sparse nonnegative representation for nonnegative input data. NNMA has found a wide variety of applications, including text analysis, document clustering, face/imag ..."
Abstract

Cited by 60 (4 self)
 Add to MetaCart
Nonnegative matrix approximation (NNMA) is a recent technique for dimensionality reduction and data analysis that yields a parts based, sparse nonnegative representation for nonnegative input data. NNMA has found a wide variety of applications, including text analysis, document clustering, face/image recognition, language modeling, speech processing and many others. Despite these numerous applications, the algorithmic development for computing the NNMA factors has been relatively deficient. This paper makes algorithmic progress by modeling and solving (using multiplicative updates) new generalized NNMA problems that minimize Bregman divergences between the input matrix and its lowrank approximation. The multiplicative update formulae in the pioneering work by Lee and Seung [11] arise as a special case of our algorithms. In addition, the paper shows how to use penalty functions for incorporating constraints other than nonnegativity into the problem. Further, some interesting extensions to the use of “link ” functions for modeling nonlinear relationships are also discussed. 1
Convex and SemiNonnegative Matrix Factorizations
, 2008
"... We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = F GT, we focus on algorithms in which G is restricted to contain nonnegative entries, but allow the data matrix X to have mixed signs, thus extending the applicable ra ..."
Abstract

Cited by 45 (4 self)
 Add to MetaCart
We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = F GT, we focus on algorithms in which G is restricted to contain nonnegative entries, but allow the data matrix X to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of F are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and we provide supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.
Name disambiguation in author citations using a Kway spectral clustering method
 INTERNATIONAL CONFERENCE ON DIGITAL LIBRARIES
, 2005
"... An author may have multiple names and multiple authors may share the same name simply due to name abbreviations, identical names, or name misspellings in publications or bibliographies 1. This can produce name ambiguity which can affect the performance of document retrieval, web search, and database ..."
Abstract

Cited by 42 (6 self)
 Add to MetaCart
An author may have multiple names and multiple authors may share the same name simply due to name abbreviations, identical names, or name misspellings in publications or bibliographies 1. This can produce name ambiguity which can affect the performance of document retrieval, web search, and database integration, and may cause improper attribution of credit. Proposed here is an unsupervised learning approach using Kway spectral clustering that disambiguates authors in citations. The approach utilizes three types of citation attributes: coauthor names, paper titles, and publication venue titles 2. The approach is illustrated with 16 name datasets with citations collected from the DBLP database bibliography and author home pages and shows that name disambiguation can be achieved using these citation attributes.
Combining Content and Link for Classification using Matrix Factorization
, 2007
"... The world wide web contains rich textual contents that are interconnected via complex hyperlinks. This huge database violates the assumption held by most of conventional statistical methods that each web page is considered as an independent and identical sample. It is thus difficult to apply traditi ..."
Abstract

Cited by 42 (8 self)
 Add to MetaCart
The world wide web contains rich textual contents that are interconnected via complex hyperlinks. This huge database violates the assumption held by most of conventional statistical methods that each web page is considered as an independent and identical sample. It is thus difficult to apply traditional mining or learning methods for solving web mining problems, e.g., web page classification, by exploiting both the content and the link structure. The research in this direction has recently received considerable attention but are still in an early stage. Though a few methods exploit both the link structure or the content information, some of them combine the only authority information with the content information, and the others first decompose the link structure into hub and authority features, then apply them as additional document features. Being practically attractive for its great simplicity, this paper aims to design an algorithm that exploits both the content and linkage information, by carrying out a joint factorization on both the linkage adjacency matrix and the documentterm matrix, and derives a new representation for web pages in a lowdimensional factor space, without explicitly separating them as content, hub or authority factors. Further analysis can be performed based on the compact representation of web pages. In the experiments, the proposed method is compared with stateoftheart methods and demonstrates an excellent accuracy in hypertext classification on the WebKB and Cora benchmarks.
Document clustering using locality preserving indexing
 IEEE Transactions on Knowledge and Data Engineering
, 2005
"... Abstract—We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality and clustering in such a high dimensional space is often infeasible due to the curse of dimensionality. By using Locality ..."
Abstract

Cited by 39 (16 self)
 Add to MetaCart
Abstract—We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality and clustering in such a high dimensional space is often infeasible due to the curse of dimensionality. By using Locality Preserving Indexing (LPI), the documents can be projected into a lowerdimensional semantic space in which the documents related to the same semantics are close to each other. Different from previous document clustering methods based on Latent Semantic Indexing (LSI) or Nonnegative Matrix Factorization (NMF), our method tries to discover both the geometric and discriminating structures of the document space. Theoretical analysis of our method shows that LPI is an unsupervised approximation of the supervised Linear Discriminant Analysis (LDA) method, which gives the intuitive motivation of our method. Extensive experimental evaluations are performed on the Reuters21578 and TDT2 data sets. Index Terms—Document clustering, locality preserving indexing, dimensionality reduction, semantics. æ 1
Fast newtontype methods for the least squares nonnegative matrix approximation problem
 Statistical Analysis and Data Mining
, 2008
"... Nonnegative Matrix Approximation is an effective matrix decomposition technique that has proven to be useful for a wide variety of applications ranging from document analysis and image processing to bioinformatics. There exist a few algorithms for nonnegative matrix approximation (NNMA), for example ..."
Abstract

Cited by 31 (5 self)
 Add to MetaCart
Nonnegative Matrix Approximation is an effective matrix decomposition technique that has proven to be useful for a wide variety of applications ranging from document analysis and image processing to bioinformatics. There exist a few algorithms for nonnegative matrix approximation (NNMA), for example, Lee & Seung’s multiplicative updates, alternating least squares, and certain gradient descent based procedures. All of these procedures suffer from either slow convergence, numerical instabilities, or at worst, theoretical unsoundness. In this paper we present new and improved algorithms for the leastsquares NNMA problem, which are not only theoretically wellfounded, but also overcome many of the deficiencies of other methods. In particular, we use nondiagonal gradient scaling to obtain rapid convergence. Our methods provide numerical results superior to both Lee & Seung’s method as well to the alternating least squares (ALS) heuristic, which is known to work well in some situations but has no theoretical guarantees (Berry et al. 2006). Our approach extends naturally to include regularization and boxconstraints, without sacrificing convergence guarantees. We present experimental results on both synthetic and realworld datasets to demonstrate the superiority of our methods, in terms of better approximations as well as efficiency.