## Orthogonal nonnegative matrix tri-factorizations for clustering (2006)

Venue: SIGKDD

Citations: 74 (18 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Ding06orthogonalnonnegative,
  author    = {Chris Ding and Tao Li and Wei Peng and Haesun Park},
  title     = {Orthogonal nonnegative matrix tri-factorizations for clustering},
  booktitle = {Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year      = {2006},
  pages     = {126--135},
  publisher = {ACM Press}
}
```


### Abstract

Currently, most research on nonnegative matrix factorization (NMF) focuses on the 2-factor factorization X = FGᵀ. We provide a systematic analysis of the 3-factor factorization X = FSGᵀ. While unconstrained 3-factor NMF is equivalent to unconstrained 2-factor NMF, constrained 3-factor NMF brings new features beyond constrained 2-factor NMF. We study the orthogonality constraint because it leads to a rigorous clustering interpretation. We provide new rules for updating F, S, and G and prove the convergence of these algorithms. Experiments on 5 datasets and a real-world case study are performed to show the capability of bi-orthogonal 3-factor NMF to simultaneously cluster the rows and columns of the input data matrix. We provide a new approach for evaluating the quality of clustering on words using class aggregate distributions and multi-peak distributions. We also provide an overview of various NMF extensions and examine their relationships.
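
The factorization described in the abstract can be illustrated with a small numerical sketch. The updates below are the standard least-squares multiplicative rules for an *unconstrained* 3-factor NMF X ≈ FSGᵀ, not the paper's bi-orthogonal update rules (which additionally keep FᵀF = GᵀG = I); all function and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

def nmtf(X, k, l, n_iter=200, eps=1e-9, seed=0):
    """Sketch of 3-factor NMF: X (m x n) ~ F (m x k) @ S (k x l) @ G.T (l x n).

    Plain multiplicative updates for the least-squares cost ||X - F S G^T||^2;
    the paper's bi-orthogonal variant uses different rules that also keep
    F^T F = I and G^T G = I.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k))
    S = rng.random((k, l))
    G = rng.random((n, l))
    for _ in range(n_iter):
        # Each factor is rescaled by the ratio of the negative and positive
        # parts of its gradient, which preserves nonnegativity.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
    return F, S, G

# Rows of F and G act as (soft) row- and column-cluster indicators.
X = np.random.default_rng(1).random((30, 20))
F, S, G = nmtf(X, k=3, l=3)
err = np.linalg.norm(X - F @ S @ G.T)
```

Each update is the ratio of the two parts of the corresponding gradient, so all three factors stay nonnegative throughout; the reconstruction error is non-increasing under these rules.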

### Citations

1094 | Learning the parts of objects by non-negative matrix factorization
- Lee, Seung
- 1999
Citation Context: ...sions [3, 5, 15, 20, 27, 32]. This is also extended to classification [30]. NMF can be traced back to 1970s (Notes from G. Golub) and is studied extensively by Paatero [27]. The work of Lee and Seung [18, 19] brought much attention to NMF in machine learning and data mining fields. They suggest that NMF factors contain coherent parts of the original data (images). They emphasize the difference between NMF...

574 | Using linear algebra for intelligent information retrieval
- Berry, Dumais, et al.
- 1996
Citation Context: ... using Laplacian matrix has been studied in [7, 35]. NMF is one type of matrix factorizations. There are other types of factorizations [6, 17, 33, 24, 23, 21]. Others include Latent Semantic Indexing [1], scaled PCA [10], generalized SVD [28], etc. Here are some more notations. We often write F = (f_1, ···, f_k) and G = (g_1, ···, g_k). The matrix norm ‖A‖² = ∑_ij a_ij². In the following, we emphasize t...

573 | Probabilistic latent semantic analysis
- Hofmann
- 1999
Citation Context: ...nction we used for computing NMF is the sum of squared errors, ‖X − FGᵀ‖². Another cost function is the KL divergence: J_NMF-KL = ∑_{i=1}^m ∑_{j=1}^n [ X_ij log( X_ij / (FGᵀ)_ij ) − X_ij + (FGᵀ)_ij ] (53). PLSI [14] maximizes the likelihood J_PLSI = ∑_{i=1}^m ∑_{j=1}^n X(w_i, d_j) log P(w_i, d_j) (54), where the joint occurrence probability is factorized (i.e., parameterized or approximated) as P(w_i, d_j) = ∑_k P(w_i|z_k) P(z_k) P(d_j|z_k) (55)...
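
The two cost functions in this excerpt, eqs. (53) and (54), differ only by a constant once FGᵀ is normalized to a joint distribution (the equivalence attributed to [12] further down this page). A quick numerical check, with a matrix P standing in for FGᵀ:

```python
import numpy as np

def j_nmf_kl(X, P):
    """Eq. (53): KL-divergence NMF cost, with P playing the role of F @ G.T."""
    return np.sum(X * np.log(X / P) - X + P)

def j_plsi(X, P):
    """Eq. (54): PLSI log-likelihood, with P_ij = P(w_i, d_j)."""
    return np.sum(X * np.log(P))

rng = np.random.default_rng(0)
X = rng.random((5, 4)) + 0.1          # word-document counts (illustrative)
P = rng.random((5, 4)) + 0.1
P /= P.sum()                          # joint distribution: entries sum to 1

# J_PLSI = -J_NMF-KL + constant, where the constant depends only on X
# (here sum(P) = 1, so it contributes the fixed term +1).
const = np.sum(X * np.log(X)) - X.sum() + 1.0
assert np.isclose(j_plsi(X, P), -j_nmf_kl(X, P) + const)
```

Since the constant does not depend on P, minimizing J_NMF-KL over normalized factorizations is the same as maximizing the PLSI likelihood.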

496 | Objective criteria for the evaluation of clustering methods
- Rand
- 1971
Citation Context: ...is defined as the number of pairs of objects which are both located in the same cluster and the same class, or both in different clusters and different classes, divided by the total number of pairs of objects [29]. The Adjusted Rand Index, which adjusts the Rand Index for chance, lies in [0, 1] [26]. The higher the Adjusted Rand Index, the more resemblance between the clustering results and the labels. 9.3 Document Clusterin...
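
The pair-counting definition in the excerpt translates directly into code. A minimal sketch (the function name is mine, and the Hubert and Arabie adjusted form is assumed for [26]):

```python
from collections import Counter
from math import comb

def rand_and_adjusted_rand(labels_a, labels_b):
    """Pair-counting Rand Index and (Hubert and Arabie) Adjusted Rand Index.

    RI  = agreeing pairs / total pairs;
    ARI = (RI - E[RI]) / (max RI - E[RI]), so identical partitions score 1
    and random labelings score about 0.
    """
    n = len(labels_a)
    pairs = comb(n, 2)
    nij = Counter(zip(labels_a, labels_b))   # contingency-cell counts
    ai = Counter(labels_a)                   # row sums
    bj = Counter(labels_b)                   # column sums
    sum_ij = sum(comb(c, 2) for c in nij.values())
    sum_a = sum(comb(c, 2) for c in ai.values())
    sum_b = sum(comb(c, 2) for c in bj.values())
    # Agreements: same cluster and same class, plus different in both.
    ri = (pairs + 2 * sum_ij - sum_a - sum_b) / pairs
    expected = sum_a * sum_b / pairs
    ari = (sum_ij - expected) / (0.5 * (sum_a + sum_b) - expected)
    return ri, ari

# Identical partitions (up to label names) give RI = ARI = 1.
ri, ari = rand_and_adjusted_rand([0, 0, 1, 1], ["a", "a", "b", "b"])
```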

335 | Co-clustering documents and words using bipartite spectral graph partitioning
- Dhillon
- 2001
Citation Context: ...st cases, we set k = ℓ. This form gives a good framework for simultaneously clustering the rows and columns of X. Simultaneous row and column clustering using the Laplacian matrix has been studied in [7, 35]. NMF is one type of matrix factorizations. There are other types of factorizations [6, 17, 33, 24, 23, 21]. Others include Latent Semantic Indexing [1], scaled PCA [10], generalized SVD [28], etc. He...

327 | Concept decompositions for large sparse text data using clustering
- Dhillon, Modha
- 2001
Citation Context: ...rows and columns of X. Simultaneous row and column clustering using the Laplacian matrix has been studied in [7, 35]. NMF is one type of matrix factorizations. There are other types of factorizations [6, 17, 33, 24, 23, 21]. Others include Latent Semantic Indexing [1], scaled PCA [10], generalized SVD [28], etc. Here are some more notations. We often write F = (f_1, ···, f_k) and G = (g_1, ···, g_k). The matrix norm ‖A‖² = ∑_ij a_ij²...

325 | Non-negative matrix factorization with sparseness constraints
- Hoyer
- 2004
Citation Context: ...sions [3, 5, 15, 20, 27, 32]. This is also extended to classification [30]. NMF can be traced back to 1970s (Notes from G. Golub) and is studied extensively by Paatero [27]. The work of Lee and Seung [18, 19] brought much attent...

298 | Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values
- Paatero, Tapper
- 1994
Citation Context: ...sions [3, 5, 15, 20, 27, 32]. This is also extended to classification [30]. NMF can be traced back to 1970s (Notes from G. Golub) and is studied extensively by Paatero [27]. The work of Lee and Seung [18, 19] brought much attent...

201 | Document clustering based on non-negative matrix factorization
- Xu, Liu, et al.
- 2003
Citation Context: ...sions [3, 5, 15, 20, 27, 32]. This is also extended to classification [30]. NMF can be traced back to 1970s (Notes from G. Golub) and is studied extensively by Paatero [27]. The work of Lee and Seung [18, 19] brought much attent...

163 | Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow

- McCallum

- 1996

Citation Context: ...nization of the posted articles are ignored. In all our experiments, we first select the top 1000 words by mutual information with class labels. The feature selection is done with the rainbow package [25]. 9.2 Evaluation Measures The above document datasets are standard labeled corpora widely used in the information retrieval literature. We view the labels of the datasets as the objective knowledge on...

161 | Spectral relaxation for k-means clustering
- Zha, Ding, et al.
- 2001
Citation Context: ...dentical to K-means clustering. We note that Theorem 1 holds even if X and F are not nonnegative, i.e., X and F have mixed-sign entries. This motivates generalizing NMF to semi-NMF in §11.1. Theorem 2 [8, 34]. The K-means clustering minimizes J = ∑_{k=1}^κ ∑_{i∈C_k} ‖x_i − c_k‖² = ∑_{k=1}^κ ∑_{i=1}^n G_ik ‖x_i − c_k‖², where c_k is the cluster centroid of the k-th cluster. More generally, the Kernel K-means with mapping...
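
The two forms of the K-means objective quoted in this excerpt (per-cluster sums versus the indicator-matrix form with G_ik) can be checked to agree numerically; a sketch with illustrative names:

```python
import numpy as np

def kmeans_objective(X, labels, k):
    """Compute Theorem 2's K-means objective both ways:
    J = sum_k sum_{i in C_k} ||x_i - c_k||^2
      = sum_k sum_i G_ik ||x_i - c_k||^2, with G the cluster indicator matrix.
    """
    n = X.shape[0]
    centroids = np.vstack([X[labels == c].mean(axis=0) for c in range(k)])
    # Form 1: sum of squared distances within each cluster.
    j1 = sum(np.sum((X[labels == c] - centroids[c]) ** 2) for c in range(k))
    # Form 2: indicator matrix G (n x k), G_ik = 1 iff point i is in cluster k.
    G = np.zeros((n, k))
    G[np.arange(n), labels] = 1.0
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    j2 = float(np.sum(G * d2))
    return j1, j2

X = np.arange(12, dtype=float).reshape(6, 2)
labels = np.array([0, 0, 1, 1, 2, 2])
j1, j2 = kmeans_objective(X, labels, k=3)   # the two forms agree
```

The indicator-matrix form is what connects K-means to NMF: relaxing G from binary to nonnegative turns the combinatorial objective into a matrix factorization.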

136 | Learning spatially localized, parts-based representation
- Li, Hou, et al.

100 | Metagenes and molecular pattern discovery using matrix factorization

- Brunet, Tamayo, et al.

92 | Bipartite graph partitioning and data clustering
- Zha, He, et al.
- 2001
Citation Context: ...st cases, we set k = ℓ. This form gives a good framework for simultaneously clustering the rows and columns of X. Simultaneous row and column clustering using the Laplacian matrix has been studied in [7, 35]. NMF is one type of matrix factorizations. There are other types of factorizations [6, 17, 33, 24, 23, 21]. Others include Latent Semantic Indexing [1], scaled PCA [10], generalized SVD [28], etc. He...

90 | On the equivalence of nonnegative matrix factorization and spectral clustering
- Ding, He, et al.
- 2005
Citation Context: ...o not support the coherent part interpretation of NMF. In fact, most applications make use of the clustering aspect of NMF, which is de-emphasized by Lee and Seung [18]. A recent theoretical analysis [9] shows the equivalence between NMF and K-means / spectral clustering. Below we briefly outline NMF, which provides notations and further motivations. In general, NMF factorizes input nonnegative data m...

87 | Empirical and Theoretical Comparison of Selected Criterion Functions for Document Clustering
- Zhao, Karypis
Citation Context: ...dely used in the information retrieval literature. We view the labels of the datasets as the objective knowledge on the structure of the datasets. To measure the clustering performance, we use purity [36] and Adjusted Rand Index (ARI) [26] as our performance measures. We expect these measures would provide us with good insights on how our algorithm works. Purity measures the extent to which each clust...
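
Purity, as used in the excerpt, credits each cluster with its dominant class. A minimal sketch of the common aggregate definition (whether this matches [36]'s exact normalization is an assumption):

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """Each cluster is credited with its most frequent class; the matched
    counts are summed and divided by the total number of points."""
    clusters = {}
    for c, y in zip(cluster_labels, class_labels):
        clusters.setdefault(c, []).append(y)
    hits = sum(Counter(ys).most_common(1)[0][1] for ys in clusters.values())
    return hits / len(class_labels)

# A perfect clustering has purity 1.0; merging classes into one cluster scores lower.
```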

76 | Webace: A web agent for document categorization and exploration
- Han, Boley, et al.
- 1998
Citation Context: ...of the data collection which includes the 10 most frequent categories among the 135 topics, and we call it Reuters-top 10. WebAce. This is from the WebACE project and has been used for document clustering [2, 13]. The dataset contains 2340 documents consisting of news articles from the Reuters news service via the Web in October 1997. These documents are divided into 20 classes. Newsgroups. The 20 newsgroups dataset...

75 | Concept Indexing: a Fast Dimensionality Reduction Algorithm with Applications to Document Retrieval and Categorization
- Karypis, Han
- 2000
Citation Context: ...rows and columns of X. Simultaneous row and column clustering using the Laplacian matrix has been studied in [7, 35]. NMF is one type of matrix factorizations. There are other types of factorizations [6, 17, 33, 24, 23, 21]. Others include Latent Semantic Indexing [1], scaled PCA [10], generalized SVD [28], etc. Here are some more notations. We often write F = (f_1, ···, f_k) and G = (g_1, ···, g_k). The matrix norm ‖A‖² = ∑_ij a_ij²...

74 | A study of the comparability of external criteria for hierarchical cluster analysis
- Milligan, Cooper
- 1986
Citation Context: ...val literature. We view the labels of the datasets as the objective knowledge on the structure of the datasets. To measure the clustering performance, we use purity [36] and Adjusted Rand Index (ARI) [26] as our performance measures. We expect these measures would provide us with good insights on how our algorithm works. Purity measures the extent to which each cluster contained data points from prima...

62 | Multiplicative updates for nonnegative quadratic programming in support vector machines
- Sha, Saul, et al.
- 2002
Citation Context: ...sions [3, 5, 15, 20, 27, 32]. This is also extended to classification [30]. NMF can be traced back to 1970s (Notes from G. Golub) and is studied extensively by Paatero [27]. The work of Lee and Seung [18, 19] brought much attention to NMF in machine learning and data mining...

58 | Generalizing Discriminant Analysis Using the Generalized Singular Value Decomposition
- Howland, Park
- 2004
Citation Context: ...ed in [7, 35]. NMF is one type of matrix factorizations. There are other types of factorizations [6, 17, 33, 24, 23, 21]. Others include Latent Semantic Indexing [1], scaled PCA [10], generalized SVD [28], etc. Here are some more notations. We often write F = (f_1, ···, f_k) and G = (g_1, ···, g_k). The matrix norm ‖A‖² = ∑_ij a_ij². In the following, we emphasize the benefit of orthogonality for the uni...

41 | Principal direction divisive partitioning. Data Mining and Knowledge Discovery
- Boley
- 1998
Citation Context: ...of the data collection which includes the 10 most frequent categories among the 135 topics, and we call it Reuters-top 10. WebAce. This is from the WebACE project and has been used for document clustering [2, 13]. The dataset contains 2340 documents consisting of news articles from the Reuters news service via the Web in October 1997. These documents are divided into 20 classes. Newsgroups. The 20 newsgroups dataset...

39 | Towards Informatic Analysis of Syslogs
- Stearley
Citation Context: ...ng log message files for computing system management provides a successful story of applying the cluster model in real applications. The log messages are relatively short with a large vocabulary size [31]. Hence they are usually represented as sparse high-dimensional vectors. In addition, the log generation mechanisms implicitly create some associations between the terminologies and the situations. Ou...

37 | A general model for clustering binary data
- Li
- 2005
Citation Context: ...rows and columns of X. Simultaneous row and column clustering using the Laplacian matrix has been studied in [7, 35]. NMF is one type of matrix factorizations. There are other types of factorizations [6, 17, 33, 24, 23, 21]. Others include Latent Semantic Indexing [1], scaled PCA [10], generalized SVD [28], etc. Here are some more notations. We often write F = (f_1, ···, f_k) and G = (g_1, ···, g_k). The matrix norm ‖A‖² = ∑_ij a_ij²...

33 | Co-clustering by block value decomposition
- Long, Zhang, et al.
- 2005

29 | Document clustering via adaptive subspace iteration
- Li, Ma, et al.

28 | Summarizing video using nonnegative similarity matrix factorization
- Cooper, Foote
- 2002

20 | Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-square Statistic, and a Hybrid Method
- Ding, Li, et al.
- 2006
Citation Context: ...likelihood J_PLSI = ∑_{i=1}^m ∑_{j=1}^n X(w_i, d_j) log P(w_i, d_j) (54), where the joint occurrence probability is factorized (i.e., parameterized or approximated) as P(w_i, d_j) = ∑_k P(w_i|z_k) P(z_k) P(d_j|z_k) (55). In [12], it is shown that the objective function of PLSI is identical to the objective function of NMF, i.e., J_PLSI = −J_NMF-KL + constant, by setting (FGᵀ)_ij = P(w_i, d_j). Therefore, the NMF update algorithm a...

18 | An integrated framework on mining log files for computing system management
- Li, Liang, et al.
- 2005
Citation Context: ...mponents when problems occur, we need to cluster the log messages with disparate formats to automatically infer the common set of semantic situations and obtain a brief description for each situation [22]. The log files used in our experiments are collected from several different machines with different operating systems using logdump2td (NT data collection tool) developed at IBM T.J. Watson Research...

17 | K-means clustering and principal component analysis
- Ding, He
- 2004
Citation Context: ...dentical to K-means clustering. We note that Theorem 1 holds even if X and F are not nonnegative, i.e., X and F have mixed-sign entries. This motivates generalizing NMF to semi-NMF in §11.1. Theorem 2 [8, 34]. The K-means clustering minimizes J = ∑_{k=1}^κ ∑_{i∈C_k} ‖x_i − c_k‖² = ∑_{k=1}^κ ∑_{i=1}^n G_ik ‖x_i − c_k‖², where c_k is the cluster centroid of the k-th cluster. More generally, the Kernel K-means with mapping...

14 | CLSI: a flexible approximation scheme from clustered term-document matrices
- Zeimpekis, Gallopoulos
- 2004

10 | Unsupervised learning: Self-aggregation in scaled principal component space. LNCS
- Ding, Zha, et al.
- 2002
Citation Context: ...matrix has been studied in [7, 35]. NMF is one type of matrix factorizations. There are other types of factorizations [6, 17, 33, 24, 23, 21]. Others include Latent Semantic Indexing [1], scaled PCA [10], generalized SVD [28], etc. Here are some more notations. We often write F = (f_1, ···, f_k) and G = (g_1, ···, g_k). The matrix norm ‖A‖² = ∑_ij a_ij². In the following, we emphasize the benefit of ort...

10 | Convex and semi-nonnegative matrix factorization for clustering and low-dimension representation
- Ding, Li, et al.
- 2006
Citation Context: ...n if X has mixed signs, we could enforce G to be nonnegative (since G can be interpreted as cluster indicators, as in §3). This is called semi-NMF [11]: X_± ≈ F_± G_+. Theorem 1 provides the basis for this semi-NMF formulation. Both NMF and semi-NMF have clustering capabilities which are generally better than K-means. In fact, PCA is effect...

9 | Algorithms for non-negative matrix factorization
- Lee, Seung
- 2001
Citation Context: ...sions [3, 5, 15, 20, 27, 32]. This is also extended to classification [30]. NMF can be traced back to 1970s (Notes from G. Golub) and is studied extensively by Paatero [27]. The work of Lee and Seung [18, 19] brought much attention to NMF in machine learning and data mining fields. They suggest that NMF factors contain coherent parts of the original data (images). They emphasize the difference between NMF...

7 | Specification: Common base event
- Chessell
- 2003
Citation Context: ...erts [22]. The set of semantic situations includes start, stop, dependency, create, connection, report, request, configuration, and other. The detailed explanations of these situations can be found in [4].

| Algorithms | Purity | Entropy | ARI |
| --- | --- | --- | --- |
| BiOR-NM3F | 0.806 | 0.303 | 0.856 |
| K-means | 0.684 | 0.491 | 0.572 |

Table 5: Clustering Results on System Log Data. We obtain good message clustering results, as shown in Table 5. The...