Graph regularized nonnegative matrix factorization for data representation
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
Abstract

Cited by 90 (4 self)
Matrix factorization techniques have been frequently applied in information retrieval, computer vision, and pattern recognition. Among them, Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. On the other hand, from the geometric perspective, the data is usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. One then hopes to find a compact representation which uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Graph Regularized Nonnegative Matrix Factorization (GNMF), for this purpose. In GNMF, an affinity graph is constructed to encode the geometrical information, and we seek a matrix factorization that respects the graph structure. Our empirical study shows encouraging results for the proposed algorithm in comparison to state-of-the-art algorithms on real-world problems.
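The GNMF idea can be illustrated with a small NumPy sketch. It assumes the Frobenius-norm formulation, minimizing ||X - U V^T||_F^2 + lam * Tr(V^T L V) over nonnegative U and V with L = D - W, solved by multiplicative updates; the function names, random initialization, and defaults are illustrative assumptions rather than the authors' code.

```python
import numpy as np

# Sketch under stated assumptions: Frobenius-norm GNMF, hypothetical names and defaults.


def gnmf(X, W, k, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """X: m x n nonnegative data (one sample per column); W: n x n symmetric affinity.

    Returns U (m x k basis) and V (n x k new representation of the samples).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    d = W.sum(axis=1)  # vertex degrees, i.e. the diagonal of D
    for _ in range(n_iter):
        # Multiplicative updates keep U and V nonnegative; eps avoids division by zero.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (d[:, None] * V) + eps)
    return U, V


def gnmf_objective(X, W, U, V, lam):
    """||X - U V^T||_F^2 + lam * Tr(V^T L V) with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)
```

Calling `gnmf(X, W, k, n_iter=0)` returns the random starting point, which makes it easy to check that `gnmf_objective` drops after the updates.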
Nonnegative matrix factorization on manifold
In ICDM, 2008
Abstract

Cited by 45 (7 self)
Recently, Nonnegative Matrix Factorization (NMF) has received a lot of attention in information retrieval, computer vision, and pattern recognition. NMF aims to find two nonnegative matrices whose product can well approximate the original matrix. The sizes of these two matrices are usually smaller than the original matrix. This results in a compressed version of the original data matrix. The solution of NMF yields a natural parts-based representation for the data. When NMF is applied for data representation, a major disadvantage is that it fails to consider the geometric structure in the data. In this paper, we develop a graph-based approach for parts-based data representation in order to overcome this limitation. We construct an affinity graph to encode the geometrical information and seek a matrix factorization which respects the graph structure. We demonstrate the success of this novel algorithm by applying it to real-world problems.
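The affinity graph mentioned above is often built from nearest neighbours. The sketch below assumes a 0-1 weighted p-nearest-neighbour graph over the columns of X, symmetrized so that an edge exists if either point is among the other's neighbours; the function name and the weighting scheme are assumptions, not taken from the paper. The returned Laplacian L = D - W satisfies v^T L v = 1/2 * sum_ij W_ij (v_i - v_j)^2, which is why penalizing it favours representations that vary smoothly over the graph.

```python
import numpy as np

# Sketch under stated assumptions: 0-1 weights, hypothetical function name.


def knn_graph(X, p=5):
    """0-1 weighted p-nearest-neighbour graph over the columns of X (m x n).

    Returns the symmetric affinity W and the graph Laplacian L = D - W.
    """
    n = X.shape[1]
    sq = (X * X).sum(axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                  # a point is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :p]          # p nearest neighbours of each point
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), p), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                          # edge if either point selects the other
    L = np.diag(W.sum(axis=1)) - W
    return W, L
```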
Patch alignment for dimensionality reduction
IEEE Trans. Knowl. Data Eng., 2009
Cited by 37 (12 self)
Graph regularized sparse coding for image representation
IEEE Transactions on Image Processing, 2011
Abstract

Cited by 30 (1 self)
Sparse coding has received an increasing amount of interest in recent years. It is an unsupervised learning algorithm, which finds a basis set capturing high-level semantics in the data and learns sparse coordinates in terms of the basis set. Originally applied to modeling the human visual cortex, sparse coding has been shown useful for many applications. However, most of the existing approaches to sparse coding fail to consider the geometrical structure of the data space. In many real applications, the data is more likely to reside on a low-dimensional submanifold embedded in the high-dimensional ambient space. It has been shown that the geometrical information of the data is important for discrimination. In this paper, we propose a graph-based algorithm, called graph regularized sparse coding, to learn sparse representations that explicitly take into account the local manifold structure of the data. By using the graph Laplacian as a smoothing operator, the obtained sparse representations vary smoothly along the geodesics of the data manifold. Extensive experimental results on image classification and clustering demonstrate the effectiveness of the proposed algorithm.
Index Terms: Image classification, image clustering, manifold learning, sparse coding.
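For a fixed basis B, the graph-regularized sparse coding objective for the codes S can be written as ||X - B S||_F^2 + alpha * Tr(S L S^T) + beta * sum_i ||s_i||_1. The sketch below minimizes it with plain proximal gradient descent (ISTA), swapped in here for simplicity; it is not the optimization procedure used in the paper, and the names and defaults are illustrative assumptions.

```python
import numpy as np

# Sketch under stated assumptions: ISTA solver for the codes only, hypothetical names.


def soft_threshold(Z, t):
    """Proximal operator of t * ||Z||_1 (elementwise shrinkage)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def graph_sparse_codes(X, B, L, alpha=1.0, beta=0.1, n_iter=300):
    """Codes S (k x n) for a fixed basis B (m x k); X is m x n, L is the n x n Laplacian."""
    k, n = B.shape[1], X.shape[1]
    S = np.zeros((k, n))
    # The Lipschitz constant of the smooth part's gradient gives a safe step size.
    lip = 2 * np.linalg.norm(B.T @ B, 2) + 2 * alpha * np.linalg.norm(L, 2)
    step = 1.0 / lip
    for _ in range(n_iter):
        grad = 2 * B.T @ (B @ S - X) + 2 * alpha * S @ L  # uses the symmetry of L
        S = soft_threshold(S - step * grad, step * beta)
    return S


def gsc_objective(X, B, L, S, alpha, beta):
    return (np.linalg.norm(X - B @ S) ** 2
            + alpha * np.trace(S @ L @ S.T)
            + beta * np.abs(S).sum())
```

Larger `beta` drives more coefficients to exactly zero; larger `alpha` pulls the codes of graph neighbours closer together.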
Modeling Hidden Topics on Document Manifold
In Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008
Abstract

Cited by 29 (6 self)
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing (PLSI), which maximizes the joint probability of documents and terms in the corpus. The major disadvantage of PLSI is that it estimates the probability distribution of each document on the hidden topics independently, and the number of parameters in the model grows linearly with the size of the corpus, which leads to serious problems with overfitting. Latent Dirichlet Allocation (LDA) was proposed to overcome this problem by treating the probability distribution of each document over topics as a hidden random variable. Both of these methods discover the hidden topics in the Euclidean space. However, there is no convincing evidence that the document space is Euclidean, or flat. Therefore, it is more natural and reasonable to assume that the document space is a manifold, either linear or nonlinear. In this paper, we consider the problem of topic modeling on the intrinsic document manifold. Specifically, we propose a novel algorithm called Laplacian Probabilistic Latent Semantic Indexing (LapPLSI) for topic modeling. LapPLSI models the document space as a submanifold embedded in the ambient space and directly performs the topic modeling on this document manifold. We compare the proposed LapPLSI approach with PLSI and LDA on three text data sets. Experimental results show that LapPLSI provides better representation in the sense of semantic structure.
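The manifold assumption can be made concrete as a penalty on how much the topic distributions P(z|d) of neighbouring documents differ. The sketch below evaluates a regularized log-likelihood of the form sum n(d,w) log P(w|d) - lam * 1/2 * sum_ij W_ij ||P(z|d_i) - P(z|d_j)||^2, with the penalty written as a Laplacian trace. The paper gives the exact objective and its EM updates; the function name, the dropped P(d) term, and the array layouts here are assumptions.

```python
import numpy as np

# Sketch under stated assumptions: conditional likelihood only, hypothetical name.


def lap_plsi_objective(N, P_w_z, P_z_d, W, lam=1.0):
    """Regularized PLSI log-likelihood.

    N:     n_docs x n_words term counts n(d, w)
    P_w_z: K x n_words, row k is P(w | z_k)
    P_z_d: n_docs x K, row i is P(z | d_i)
    W:     n_docs x n_docs symmetric document affinity
    """
    P_w_d = P_z_d @ P_w_z  # P(w | d) = sum_z P(w | z) P(z | d)
    loglik = (N * np.log(P_w_d)).sum()
    L = np.diag(W.sum(axis=1)) - W
    # Tr(P^T L P) equals 0.5 * sum_ij W_ij * ||P(z|d_i) - P(z|d_j)||^2
    smooth = np.trace(P_z_d.T @ L @ P_z_d)
    return loglik - lam * smooth
```

If every document has the same topic distribution the penalty vanishes, so the regularizer only affects solutions where neighbouring documents disagree.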
Large Scale Spectral Clustering with Landmark-Based Representation
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011
Abstract

Cited by 23 (1 self)
Spectral clustering is one of the most popular clustering approaches. Despite its good performance, it is limited in its applicability to large-scale problems due to its high computational complexity. Recently, many approaches have been proposed to accelerate spectral clustering. Unfortunately, these methods usually sacrifice a considerable amount of information from the original data and thus degrade performance. In this paper, we propose a novel approach, called Landmark-based Spectral Clustering (LSC), for large-scale clustering problems. Specifically, we select p (≪ n) representative data points as the landmarks and represent the original data points as linear combinations of these landmarks. The spectral embedding of the data can then be efficiently computed with the landmark-based representation. The proposed algorithm scales linearly with the problem size. Extensive experiments show the effectiveness and efficiency of our approach compared to state-of-the-art methods.
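The landmark-based pipeline can be sketched in a few NumPy steps: pick p landmarks, express each point through its r nearest landmarks with Gaussian-kernel weights (each column of Z sums to 1), normalize Z by its row sums, take the top right singular vectors as the spectral embedding, and run k-means on it. Random landmark selection, the kernel width `sigma`, and the small farthest-point k-means are assumptions made to keep the example self-contained; they may differ from the paper's settings, and all names are hypothetical.

```python
import numpy as np

# Sketch under stated assumptions: random landmarks, Gaussian kernel, hypothetical names.


def kmeans(Y, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[dist.argmax()])
    C = np.array(centers)
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def lsc(X, k, p=20, r=5, sigma=1.0, seed=0):
    """Landmark-based spectral clustering; X is n x m with one point per row."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = X[rng.choice(n, size=p, replace=False)]              # p randomly chosen landmarks
    d2 = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)  # n x p squared distances
    Z = np.zeros((p, n))
    for i in range(n):
        nn = np.argsort(d2[i])[:r]                           # r nearest landmarks of point i
        w = np.exp(-d2[i, nn] / (2 * sigma ** 2))
        Z[nn, i] = w / w.sum()                               # column i sums to 1
    Z_hat = Z / np.sqrt(Z.sum(axis=1, keepdims=True) + 1e-12)  # D^{-1/2} Z, D = row sums
    _, _, Vt = np.linalg.svd(Z_hat, full_matrices=False)     # cost is linear in n for fixed p
    embedding = Vt[:k].T                                     # n x k spectral embedding
    return kmeans(embedding, k, seed=seed)
```

Because the SVD acts on a p x n matrix, the expensive step grows linearly with the number of points, which is the property the abstract emphasizes.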
Spectral Regression: A Unified Approach for Sparse Subspace Learning
Abstract

Cited by 23 (6 self)
Recently, the problem of dimensionality reduction (or subspace learning) has received considerable interest in many fields of information processing, including data mining, information retrieval, and pattern recognition. Popular methods include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Locality Preserving Projection (LPP). However, a disadvantage of all these approaches is that the learned projective functions are linear combinations of all the original features, so the results are often difficult to interpret. In this paper, we propose a novel dimensionality reduction framework, called Unified Sparse Subspace Learning (USSL), for learning sparse projections. USSL casts the problem of learning the projective functions into a regression framework, which facilitates the use of different kinds of regularizers. By using an L1-norm regularizer (lasso), the sparse projections can be efficiently computed. Experimental results on real-world classification and clustering problems demonstrate the effectiveness of our method.
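The regression view can be sketched in two steps: obtain response vectors y from the graph eigenproblem W y = lam D y (dropping the trivial constant solution), then fit each response with an L1-penalized least-squares problem so that the projection vector is sparse. The sketch assumes a dense, connected affinity W, solves the eigenproblem through the symmetric matrix D^{-1/2} W D^{-1/2}, and uses a hand-written coordinate-descent lasso; all names are hypothetical and the paper's own solver may differ.

```python
import numpy as np

# Sketch under stated assumptions: dense W, hand-written lasso, hypothetical names.


def graph_responses(W, d):
    """Top-d nontrivial solutions y of W y = lam D y, where D is the degree matrix."""
    s = 1.0 / np.sqrt(W.sum(axis=1))
    vals, Z = np.linalg.eigh(s[:, None] * W * s[None, :])  # D^{-1/2} W D^{-1/2}
    order = np.argsort(vals)[::-1]
    return s[:, None] * Z[:, order[1:d + 1]]  # y = D^{-1/2} z, constant solution dropped


def lasso_cd(A, y, alpha, n_iter=200):
    """Coordinate descent for min_a 0.5 * ||y - A a||^2 + alpha * ||a||_1."""
    a = np.zeros(A.shape[1])
    col_sq = (A ** 2).sum(axis=0)
    r = y.copy()  # residual y - A a (a starts at zero)
    for _ in range(n_iter):
        for j in range(A.shape[1]):
            if col_sq[j] == 0:
                continue
            rho = A[:, j] @ r + col_sq[j] * a[j]
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= A[:, j] * (new - a[j])
            a[j] = new
    return a


def sparse_projections(X, W, d, alpha):
    """X: n x m data (one sample per row). Returns an m x d sparse projection matrix."""
    Y = graph_responses(W, d)
    return np.column_stack([lasso_cd(X, Y[:, i], alpha) for i in range(d)])
```

Swapping `lasso_cd` for another penalized regression is what makes the framework "unified": only the second step changes.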
Locally Consistent Concept Factorization for Document Clustering
2011
Abstract

Cited by 22 (4 self)
Previous studies have demonstrated that document clustering performance can be improved significantly in lower-dimensional linear subspaces. Recently, matrix factorization based techniques, such as Nonnegative Matrix Factorization (NMF) and Concept Factorization (CF), have yielded impressive results. However, both of them effectively see only the global Euclidean geometry, whereas the local manifold geometry is not fully considered. In this paper, we propose a new approach to extract document concepts that are consistent with the manifold geometry, such that each concept corresponds to a connected component. Central to our approach is a graph model which captures the local geometry of the document submanifold. Thus, we call it Locally Consistent Concept Factorization (LCCF). By using the graph Laplacian to smooth the document-to-concept mapping, LCCF can extract concepts with respect to the intrinsic manifold structure, and thus documents associated with the same concept can be well clustered. The experimental results on TDT2 and Reuters-21578 show that the proposed approach provides a better representation and achieves better clustering results in terms of accuracy and mutual information.
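Concept factorization approximates the data as X ≈ X W V^T, so each concept (a column of X W) is a nonnegative combination of documents; LCCF adds the graph term lam * Tr(V^T L V). The sketch below uses multiplicative updates written in terms of the Gram matrix K = X^T X, assuming a nonnegative X (for example, term frequencies) so that K is nonnegative and the updates preserve nonnegativity. The names, initialization, and defaults are assumptions for illustration.

```python
import numpy as np

# Sketch under stated assumptions: nonnegative X, linear kernel, hypothetical names.


def lccf(X, S, k, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """X: m x n nonnegative (one document per column); S: n x n symmetric affinity.

    Returns W, V (both n x k, nonnegative) with X approximately X @ W @ V.T.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X  # n x n Gram matrix, nonnegative when X is
    d = S.sum(axis=1)
    W = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W + lam * (S @ V)) / (V @ (W.T @ K @ W) + lam * (d[:, None] * V) + eps)
    return W, V


def lccf_objective(X, S, W, V, lam):
    """||X - X W V^T||_F^2 + lam * Tr(V^T L V) with L = D - S."""
    L = np.diag(S.sum(axis=1)) - S
    return np.linalg.norm(X - X @ W @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)
```

Because the updates only touch X through K, the same sketch would accept any nonnegative kernel matrix in place of X^T X.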
Regularized Locality Preserving Indexing via Spectral Regression
Proc. 16th ACM Int’l Conf. Information and Knowledge Management (CIKM ’07), 2007
Abstract

Cited by 21 (6 self)
We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from Latent Semantic Indexing (LSI), which is optimal in the sense of global Euclidean structure, LPI is optimal in the sense of local manifold structure. However, LPI is not efficient in time and memory, which makes it difficult to apply to very large data sets. Specifically, the computation of LPI involves eigendecompositions of two dense matrices, which is expensive. In this paper, we propose a new algorithm called Regularized Locality Preserving Indexing (RLPI). Benefiting from recent progress in spectral graph analysis, we cast the original LPI algorithm into a regression framework, which enables us to avoid eigendecomposition of dense matrices. Also, with the regression-based framework, different kinds of regularizers can be naturally incorporated into our algorithm, which makes it more flexible. Extensive experimental results show that RLPI obtains similar or better results compared to LPI and is significantly faster, which makes it an efficient and effective data preprocessing method for large-scale text clustering, classification, and retrieval.
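The two-step regression idea can be sketched as follows: solve the graph eigenproblem W y = lam D y for a few response vectors, then fit each response with ridge regression, a = (X X^T + alpha I)^{-1} X y, instead of eigendecomposing dense matrices built from X. This dense NumPy version is only illustrative: it assumes a small, connected affinity graph W and a direct solver, whereas a large-scale implementation would typically exploit the sparsity of W and use an iterative least-squares solver. The function name and parameters are hypothetical.

```python
import numpy as np

# Sketch under stated assumptions: dense W, direct ridge solve, hypothetical name.


def rlpi(X, W, d, alpha=0.1):
    """X: m x n term-document matrix (one document per column); W: n x n affinity.

    Returns an m x d projection matrix A; documents are embedded as A.T @ X.
    """
    s = 1.0 / np.sqrt(W.sum(axis=1))
    # Step 1: W y = lam D y via the symmetric matrix D^{-1/2} W D^{-1/2}.
    vals, Z = np.linalg.eigh(s[:, None] * W * s[None, :])
    top = np.argsort(vals)[::-1][1:d + 1]  # skip the trivial constant solution
    Y = s[:, None] * Z[:, top]             # map back: y = D^{-1/2} z
    # Step 2: ridge regression for every response column at once.
    m = X.shape[0]
    return np.linalg.solve(X @ X.T + alpha * np.eye(m), X @ Y)
```

Raising `alpha` shrinks the projection vectors; the ridge penalty is one instance of the "different kinds of regularizers" the framework admits.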
Locality Preserving Nonnegative Matrix Factorization
Abstract

Cited by 20 (1 self)
Matrix factorization techniques have been frequently applied in information processing tasks. Among them, Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. On the other hand, from the geometric perspective, the data is usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. One then hopes to find a compact representation which uncovers the hidden topics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Locality Preserving Nonnegative Matrix Factorization (LPNMF), for this purpose. For two data points, we use KL-divergence to evaluate their similarity on the hidden topics. The optimal maps are obtained such that the feature values on hidden topics are restricted to be nonnegative and vary smoothly along the geodesics of the data manifold. Our empirical study shows encouraging results for the proposed algorithm in comparison to state-of-the-art algorithms on two large high-dimensional databases.
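The KL-based similarity can be sketched as a symmetric, graph-weighted divergence between the topic representations of neighbouring points. The sketch assumes the generalized KL divergence D(y||z) = sum_k (y_k log(y_k / z_k) - y_k + z_k), which applies to unnormalized nonnegative vectors, and a penalty of 1/2 * sum_ij W_ij (D(v_i||v_j) + D(v_j||v_i)); these definitions and names are assumptions for illustration, not a transcription of the paper.

```python
import numpy as np

# Sketch under stated assumptions: generalized KL, hypothetical names.


def gen_kl(y, z, eps=1e-12):
    """Generalized KL divergence sum(y * log(y / z) - y + z) for nonnegative vectors."""
    y = np.asarray(y, dtype=float) + eps  # eps guards against log(0)
    z = np.asarray(z, dtype=float) + eps
    return float(np.sum(y * np.log(y / z) - y + z))


def kl_graph_penalty(V, W):
    """0.5 * sum_ij W_ij * (D(v_i || v_j) + D(v_j || v_i)) over the rows v_i of V."""
    n = V.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if W[i, j]:
                total += W[i, j] * (gen_kl(V[i], V[j]) + gen_kl(V[j], V[i]))
    return 0.5 * total
```

Symmetrizing matters because the KL divergence itself is not symmetric: `gen_kl(a, b)` and `gen_kl(b, a)` generally differ.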