
## Fast rank-2 nonnegative matrix factorization for hierarchical document clustering (2013)

Venue: KDD '13: Proc. of the 19th ACM Int. Conf. on Knowledge Discovery and Data Mining

Citations: 4 (3 self)

### Citations

1855 | Introduction to Information Retrieval
- Manning, Raghavan, et al.
- 2008
Citation Context: ...ear coefficients that represent xi in the space spanned by the columns of W. Nonnegative data frequently occur in modern data analysis, such as a text corpus represented as a term-document matrix [19]. Because the lower rank factors W and H contain only nonnegative elements which stay in the original domain of the data points, NMF often produces basis vectors that facilitate better interpretation ...

1663 | Learning the parts of objects by non-negative matrix factorization
- Lee, Seung
- 1999
Citation Context: ...Because the lower rank factors W and H contain only nonnegative elements which stay in the original domain of the data points, NMF often produces basis vectors that facilitate better interpretation [16]. NMF has shown excellent performance as a clustering method in numerous applications [24, 10]. When NMF is used as a clustering method, the columns of W are interpreted as k cluster representatives, ...

925 | Solving Least-Squares Problems
- Lawson, Hanson
- 1974
Citation Context: ...the NNLS problem and can be categorized into standard optimization algorithms and active-set-type algorithms. A classical active-set algorithm for NNLS with a single right-hand side was introduced in [15]. In the context of NMF, Lin [18] claimed that it would be expensive to solve NNLS with multiple right-hand sides using the active-set algorithm repeatedly, and proposed a projected gradient descent ...

690 | Cumulated gain-based evaluation of IR techniques
- Järvelin, Kekäläinen
- 2002
Citation Context: ...ts two potential children, L and R. We also expect that N receives a low score if the top words for L and R are almost the same. We utilize the concept of normalized discounted cumulative gain (NDCG) [7] from the information retrieval community. Given a perfect ranked list, NDCG measures the quality of an actual ranked list and always has a value between 0 and 1. A leaf node N in our tree is associat...
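The NDCG measure referenced in this context can be sketched in a few lines. The relevance gains below are illustrative assumptions, not values from the paper; the paper's mNDCG variant further modifies the gain function.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each gain discounted by log2 of its rank."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains):
    """NDCG: DCG of the actual ranking divided by DCG of the ideal
    (descending) ranking, so the result always lies in [0, 1]."""
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

# Illustrative relevance gains for a ranked list (assumed values).
print(ndcg([3, 2, 3, 0, 1]))  # imperfect ordering, strictly below 1
print(ndcg([3, 3, 2, 1, 0]))  # ideal ordering scores exactly 1.0
```

Dividing by the ideal DCG is what bounds the score to [0, 1], matching the excerpt's claim.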

663 | RCV1: A new benchmark collection for text categorization research
- Lewis, Yang, et al.
- 2004
Citation Context: ...tructures in the data. For example, a tree structure often provides a more detailed taxonomy or a better description of natural phenomena than a flat partitioning. In the widely-used text corpus RCV1 [17], a hierarchy of topics was defined, with 103 leaf nodes under four super categories (Corporate/Industrial, Economics, Government/Social, Markets). Online retailers such as Amazon and eBay also mainta...

319 | Document clustering based on non-negative matrix factorization
- Xu, Liu, et al.
- 2003
Citation Context: ...document clustering, nonnegative matrix factorization, rank-2 NMF. 1. INTRODUCTION. Nonnegative matrix factorization (NMF) has received wide recognition in many data mining areas such as text analysis [24]. In NMF, given a nonnegative matrix X ∈ R^{m×n}_+ and ...

286 | Hierarchical topic models and the nested Chinese restaurant process
- Blei, Griffiths, et al.
- 2004
Citation Context: ...document clustering. Our methodology is able to determine both the tree structure and the depth of the tree on-the-fly and detect outliers, in contrast to hierarchical probabilistic modeling methods [2] that require the depth of the tree to be specified by the user. • We present promising empirical results of our methodology in terms of efficiency, clustering quality, as well as semantic quality in the...

275 | Projected gradient methods for nonnegative matrix factorization
- Lin
- 2007
Citation Context: ...norm, and ‖·‖F denotes the Frobenius norm. 2. ALTERNATING NONNEGATIVE LEAST SQUARES FOR NMF. In this paper, we consider the algorithms for NMF that fit into the two-block coordinate descent framework [18, 11, 13, 12] due to their better theoretical convergence guarantees. In this framework, starting from some initialization, the matrices W and H are updated in an iterative manner, until some stopping crite...

246 | Recent trends in hierarchical document clustering: A critical review
- Willett
- 1988
Citation Context: ...therwise. More study is needed to understand the benefits of each method in terms of topic coherence. 7. CONCLUSION. Hierarchical document clustering has a rich history in data analysis and management [23]. In this paper, we considered the divisive approach, which splits a data set in a top-down fashion and offers a more global view of the data set than agglomerative clustering methods. In divisive...

208 | Automating the construction of internet portals with machine learning
- McCallum, Nigam, et al.
- 2000
Citation Context: ...as a defined hierarchy of 3 levels. Unlike previous indexing, we observed that many articles have duplicated paragraphs due to cross-referencing. We discarded cited paragraphs and signatures. 3. Cora [20] is a collection of research papers in computer science, from which we extracted the title, abstract, and reference contexts. Although this data set defines a topic hierarchy of 3 levels, we observed t...

172 | CLUTO - a clustering toolkit
- Karypis
- 2002
Citation Context: ...as follows³: ¹http://www.daviddlewis.com/resources/testcollections/reuters21578/ ²http://qwone.com/~jason/20Newsgroups/ ³We also compared our method with the off-the-shelf clustering software CLUTO [9]. In most cases, our method is faster than CLUTO configured by default, with comparable clustering quality. When both methods are terminated after 10 iterations, our method costs 104 seconds on RCV1-f...

171 | Metagenes and molecular pattern discovery using matrix factorization
- Brunet, Tamayo, et al.
- 2004
Citation Context: ...bedding of the data points. Selecting the number of clusters k is an important and difficult issue in practice. Though model selection methods for selecting k have been proposed in the context of NMF [4, 8], it is expensive to compute solutions of NMF for each k in general [12]. In the NMF-based hierarchical approach we propose in this paper, a data set is recursively divided into small subsets and the...

89 | Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis
- Kim, Park
- 2007
Citation Context: ...e original domain of the data points, NMF often produces basis vectors that facilitate better interpretation [16]. NMF has shown excellent performance as a clustering method in numerous applications [24, 10]. When NMF is used as a clustering method, the columns of W are interpreted as k cluster representatives, and the i-th column of H contains fractional assignment values of the i-th data point for the...

84 | On the convergence of the block nonlinear Gauss–Seidel method under convex constraints
- Grippo, Sciandrone
- 2000
Citation Context: ...min_{W≥0} ‖HᵀWᵀ − Xᵀ‖²_F (2) and min_{H≥0} ‖WH − X‖²_F (3). When an optimal solution is obtained for each subproblem in each iteration, this iterative procedure is guaranteed to converge to a stationary point [6], which is a good convergence property for nonconvex problems such as (1). Each subproblem is a nonnegative least squares problem (NNLS) with multiple right-hand sides. Consider the following generic ...
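The two-block coordinate descent scheme in subproblems (2) and (3) can be sketched with SciPy's single-right-hand-side NNLS solver, applied column by column. This is an illustrative loop under assumed data, not the active-set-type implementation the paper builds on.

```python
import numpy as np
from scipy.optimize import nnls

def anls_nmf(X, k, iters=30, seed=0):
    """Alternating nonnegative least squares for min_{W,H>=0} ||X - WH||_F^2,
    solving subproblems (2) and (3) exactly, one right-hand side at a time."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # (2): fix H, solve min_{W>=0} ||H^T W^T - X^T||_F^2, row by row of W.
        W = np.column_stack([nnls(H.T, X[i, :])[0] for i in range(m)]).T
        # (3): fix W, solve min_{H>=0} ||W H - X||_F^2, column by column of H.
        H = np.column_stack([nnls(W, X[:, j])[0] for j in range(n)])
    return W, H

# Tiny nonnegative test matrix (illustrative).
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0]])
W, H = anls_nmf(X, k=2)
print(np.linalg.norm(X - W @ H))  # rank-2 residual, well below ||X||_F
```

Because each subproblem is solved optimally, this loop has the stationary-point convergence guarantee the excerpt cites from [6]; solving each right-hand side independently is exactly what the grouping trick discussed later accelerates.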

83 | Non-negative matrix factorization based on alternating non-negativity constrained least squares and active set method
- Kim, Park
- 2008
Citation Context: ...k ≤ min(m, n); X is approximated by a product of two nonnegative matrices W ∈ R^{m×k}_+ and H ∈ R^{k×n}_+. A common way to define NMF is to use the Frobenius norm to measure the difference between X and WH [11]: min_{W,H≥0} ‖X − WH‖²_F (1), where ‖·‖F denotes the Frobenius norm. The columns of X = [x1, · · · , xn] represent n nonnegative data points in the m-dimensional space. Typically k ≪ min(m, n), i.e. th...

80 | Optimizing semantic coherence in topic models.
- Mimno, Wallach, et al.
- 2011
Citation Context: ...el. For hierarchical clustering following Algorithm 3, we treat all the outliers as one separate cluster for fair evaluation. 2. Coherence: This is a measure of intra-topic similarity in topic models [21, 1]. Given the top words f1, · · · , fK for a topic, coherence is computed as coherence = Σ_{i=1}^{K} Σ_{j=i}^{K} log((D(fi, fj) + µ) / D(fi)) (18), where D(fi) is the document frequency of fi and D(fi, fj) is the nu...
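The coherence score in Eq. (18) can be sketched directly from its definition. The toy corpus, word lists, and smoothing constant µ below are illustrative assumptions; the sketch also assumes every top word occurs in at least one document, so D(fi) > 0.

```python
import math

def coherence(top_words, docs, mu=1.0):
    """Topic coherence per Eq. (18): sum over word pairs (j >= i) of
    log((D(fi, fj) + mu) / D(fi)), where D(...) counts the documents
    containing all of its argument words."""
    def D(*words):
        return sum(1 for d in docs if all(w in d for w in words))
    K = len(top_words)
    total = 0.0
    for i in range(K):
        for j in range(i, K):
            fi, fj = top_words[i], top_words[j]
            total += math.log((D(fi, fj) + mu) / D(fi))
    return total

# Illustrative corpus: each document is its set of distinct words.
docs = [{"matrix", "factorization"}, {"matrix", "cluster"},
        {"cluster", "document"}, {"matrix", "factorization", "cluster"}]
print(coherence(["matrix", "cluster"], docs))
```

Higher values indicate that a topic's top words tend to co-occur in the same documents, which is the intra-topic similarity the excerpt describes.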

45 | A Practical Algorithm for Topic Modeling with Provable Guarantees.
- Arora, Ge, et al.
- 2013
Citation Context: ...n the same way as the standard NDCG measure, but with a modified gain function. Also note that ĝi instead of g(fi) is used in computing the ideal mDCG (mIDCG) so that mNDCG always has a value in the [0, 1] interval. Finally, the score of the leaf node N is computed as: score(N) = mNDCG(fL) × mNDCG(fR) (17). To illustrate the effectiveness of this scoring function, let us consider some typical cases. 1. ...

39 | Cluster merging and splitting in hierarchical clustering algorithms
- Ding, He
- 2002
Citation Context: ...-separated clusters based on the two basis vectors generated by rank-2 NMF before deciding which one to split. Compared to existing strategies that rely on an n × n document-document similarity matrix [5], our methodology never generates a large dense matrix and is thus more time/space-efficient. Although the rank-2 NMF computation on any leaf node in the final tree is wasted, our methodology is still ver...

39 | Fast algorithm for the solution of large-scale nonnegativity-constrained least squares problems
- Benthem, Keenan
- 2004
Citation Context: ...ency of solving NNLS with multiple right-hand sides (4), the columns of G with the same active set pattern are grouped together for lower computational complexity and more cache-efficient computation [11, 22], and the grouping of columns changes when the active set is re-identified in each iteration of NNLS. Practically, the grouping step is implemented as a sorting of the columns of G, with complexity O(...
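The column-grouping trick described in this context can be sketched by sorting the columns of G by their active-set pattern; columns with identical patterns can then share a single unconstrained least-squares solve. The matrix G and the boolean pattern encoding are illustrative assumptions.

```python
import numpy as np

def group_by_active_set(G):
    """Group columns of G that share the same active set (their zero
    entries). A lexicographic sort of the boolean patterns mirrors the
    O(n log n) sorting step the excerpt describes; the dict then collects
    columns with identical patterns into one group."""
    patterns = G > 0                 # passive-set indicator per column
    order = np.lexsort(patterns)     # sort column indices by pattern
    groups = {}
    for j in order:
        groups.setdefault(tuple(patterns[:, j]), []).append(j)
    return groups

# Illustrative G: columns 0 and 2 share a pattern, as do columns 1 and 3.
G = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 5.0, 0.0]])
groups = group_by_active_set(G)
print(groups)  # two groups: {0, 2} and {1, 3}
```

After sorting, columns with the same pattern are adjacent, which is what enables the cache-efficient batched solves mentioned in the excerpt.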

38 | Toward faster nonnegative matrix factorization: a new algorithm and comparisons
- Kim, Park
- 2008
Citation Context: ...We will exploit the special properties of NMF with k = 2, and propose a very fast algorithm. We will study a particular type of existing algorithms for standard NMF, namely active-set-type algorithms [11, 13], and show that when k = 2, active-set-type algorithms can be reduced to a simple and efficient algorithm for rank-2 NMF, which has additional benefits when implemented on parallel platforms due to "n...

22 | A decision criterion for the optimal number of clusters in hierarchical clustering
- Jung, Park, et al.
- 2003
Citation Context: ...bedding of the data points. Selecting the number of clusters k is an important and difficult issue in practice. Though model selection methods for selecting k have been proposed in the context of NMF [4, 8], it is expensive to compute solutions of NMF for each k in general [12]. In the NMF-based hierarchical approach we propose in this paper, a data set is recursively divided into small subsets and the...

21 | Symmetric nonnegative matrix factorization for graph clustering
- Kuang, Ding, et al.
- 2012
Citation Context: ...s to the largest element in the i-th column of H. This clustering scheme has been shown to achieve superior clustering quality, and many variations such as constrained clustering and graph clustering [10, 14] have been proposed. The standard NMF (1) has been shown to perform especially well as a document clustering method, where the columns of W can be interpreted as k topics extracted from a corpus [24, ...
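The hard-clustering rule this context describes (assign point i to the row of the largest entry in the i-th column of H) is a one-liner once H is computed; the coefficient matrix below is an illustrative assumption.

```python
import numpy as np

# Each column of H holds one data point's fractional assignments to the
# k clusters; the hard label is the row index of the column's maximum.
H = np.array([[0.9, 0.2, 0.1],
              [0.1, 0.8, 0.9]])
labels = H.argmax(axis=0)
print(labels)  # point 0 -> cluster 0, points 1 and 2 -> cluster 1
```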

17 | Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework
- Kim, He, et al.
- 2014